Thursday, March 3, 2022

What is AI alignment about?

There are people who believe that sooner or later artificially intelligent systems will have human-level intelligence. Some of those who believe this also fear that such systems might very well go rogue and act in ways harmful to humans, and perhaps even to humanity as a whole. That, more or less, is what is meant by the AI alignment problem. While I’ve known about this for some time, I don’t follow those discussions because I’m skeptical about the eventual development or, as the case may be, emergence of such systems.

But, on general principles, I do dip into those discussions from time to time. One such discussion is taking place at LessWrong: Late 2021 MIRI Conversations: AMA / Discussion. It comes at the end of a series of conversations that has been going on since late 2021. I’ve read bits and pieces of those conversations, but no more. Anyhow, deep into this particular discussion Rob Bensinger compiled a list of “important questions people in the field seem to disagree a lot about.” I’m parking it here for future reference.

  • Alignment
    • How hard is alignment? What are the central obstacles? What kind of difficulty is it? (Is it hard like 'building a secure OS that works on the first try'? Hard like 'the engineering/logistics/implementation portion of the Manhattan Project'? Both? Some other option? Etc.)
    • What alignment research directions are potentially useful, and what plans for developing aligned AGI [artificial general intelligence] have a chance of working?
  • Deployment
    • What should the first AGI systems be aligned to do?
    • To what extent should we be thinking of "large disruptive act that upends the gameboard", versus "slow moderate roll-out of regulations and agreements across a few large actors"?
  • Information spread
    • How important is research closure and opsec for capabilities-synergistic ideas? (Now, later, in the endgame, etc.)
  • Path to AGI
    • Is AGI just "current SotA systems like GPT-3, but scaled up", or are we missing key insights?
    • More broadly, what's the relationship between current approaches and AGI?
    • How software- and/or hardware-bottlenecked are we on AGI?
    • How compute- and/or data-efficient will AGI systems be?
    • How far off is AGI? How possible is it to time future tech developments? How continuous is progress likely to be?
    • How likely is it that AGI is in-paradigm for deep learning?
    • If AGI comes from a new paradigm, how likely is it that it arises late in the paradigm (when the relevant approach is deployed at scale in large corporations) versus early (when a few fringe people are playing with the idea)?
    • Should we expect warning shots? Would warning shots make a difference, and if so, would they be helpful or harmful?
    • To what extent are there meaningfully different paths to AGI, versus just one path? How possible (and how desirable) is it to change which path humanity follows to get to AGI?
  • Actors
    • How likely is it that AGI is first developed by a large established org, versus a small startup-y org, versus an academic group, versus a government?
    • How likely is it that governments play a role at all? What role would be desirable, if any? How tractable is it to try to get governments to play a good role (rather than a bad role), and/or to try to get governments to play a role at all (rather than no role)?
