Training AIs to help us align AIs: If we can accurately recognize good performance on alignment, we could elicit lots of useful alignment work from our models, even if they're playing the training game.
Playing the training game: We're creating incentives for AI systems to make their behavior look as desirable as possible to us while intentionally disregarding human intent whenever that conflicts with maximizing reward.
Situational awareness: AI systems that have a precise understanding of how they'll be evaluated and what behavior we want them to display will earn more reward than AI systems that don't.
"Aligned" shouldn't be a synonym for "good" Perfect alignment just means that AI systems won’t want to deliberately disregard their designers' intent; it's not enough to ensure AI is good for the world.
What we're doing here We’re trying to think ahead to a possible future in which AI is making all the most important decisions.