Ethan Mollick has an interesting post at One Useful Thing, "Superhuman?" Here's his third paragraph:
No matter what happens next, today, as anyone who uses AI knows, we do not have an AI that does every task better than a human, or even most tasks. But that doesn’t mean that AI hasn’t achieved superhuman levels of performance in some surprisingly complex jobs, at least if we define superhuman as better than most humans, or even most experts.
He goes on to review various tests and benchmarks, concluding in aggregate:
So there does seem to be some underlying ability of AI captured in many different measures, and when you combine those measures over time, you see a similar pattern - everything is moving up and to the right, approaching, often exceeding human level performance.
Zoom out, and the pattern is clear. Across a wide range of benchmarks, as flawed as they are, AI ability gains have been rapid, quickly exceeding human-level performance.
The following remarks are from the section "Alien vs. Human":
The increasing ability of AI to beat humans across a range of benchmarks is a sign of superhuman ability, but also requires some cautious interpretation. AIs are very good at some tasks, and very bad at others. When they can do something well - including very complex tasks like diagnosing disease, persuading a human in a debate, or parsing a legal contract - they are likely to increase rapidly in ability to reach superhuman levels. But related tasks that human lawyers and doctors perform may be completely outside of the abilities of LLMs. The right analogy for AI is not humans, but an alien intelligence with a distinct set of capabilities and limitations. Just because it exceeds human ability at one task doesn’t mean it can do all related work at human level. Although AIs and humans can perform some similar tasks, the underlying “cognitive” processes are fundamentally different.
What this suggests is that the AGI standard of “a machine that can do any task better than a human” may both blind us to areas where AI is already better than a human, and also make humans seem more replaceable than we are. Until LLMs get much better, having a human working as a co-intelligence with AI is going to be necessary in many cases. We might want to think of the development of AGI in tiers:
Tier 1: AGI: “a machine that can do any task better than a human.”
Tier 2: Weak AGI: at this level, a machine beats an average human expert at all the tasks in their job, but only for some jobs. There is no current Weak AGI system in the wild but keep your eyes on some aspects of legal work, some types of coaching, and customer service.
Tier 3: Artificial Focused Intelligence: AIs beat an average human expert at a clearly defined, important, and intellectually challenging task. Once AI reaches this level, you would rather consult an AI to get help with this matter than a random expert, though the best performing humans would still exceed an AI. We are likely already here for aspects of medicine, writing, law, consulting, and a variety of other fields. The problem is that a lack of clear specialized benchmarks and studies means that we don’t have good comparisons with humans to base our assessments of AI on.
Tier 4: Co-Intelligence: Humans working with AI often exceed the best performance of either alone. When used properly, AI is a tool, our first general-purpose way of improving intellectual performance. It can directly help us come up with new strategies and approaches, or just provide a sounding board for our thoughts. I suspect that there are very few cognitively demanding jobs where AI cannot be of some use, even if it just to bounce ideas off of.
I think Tiers 3 and 4 are where we're headed. He goes on to remark:
Even though tests and benchmarks are flawed, they still show us the rapid improvement in AI abilities. I do not know how long co-intelligence will dominate over AI agents working independently, because in some areas, like diagnosing complex diseases, it appears that adding human judgement actually lowers decision-making ability relative to AI alone. We need expert-established benchmarks across fields (not just coding) to get a better understanding of how these AI abilities are evolving. I would love to see large-scale efforts to measure AI abilities across academic and professional disciplines, because that may be the only way to get a sense of when we are approaching AGI.
Here's a comment I made in response to the post:
For what it's worth, I strongly suspect AI experts are, shall we say, a bit naive in how they think about human ability, putting far too much stock in all those benchmarks originally designed to gauge it. Those tests were designed to differentiate between humans in a way that's easy to measure, and that's not necessarily a way to probe human ability deeply. Rodney Brooks, in The Seven Deadly Sins of Predicting the Future of AI, has some interesting remarks on performance and competence that are germane.
I've written an article in which I express skepticism about the ability of AI "experts" to gauge human ability: Aye Aye, Cap’n! Investing in AI is like buying shares in a whaling voyage captained by a man who knows all about ships and little about whales.
More recently, I've taken a look at analogical reasoning, which Geoffrey Hinton seems to think will confer some advantage on AIs because they know so much more than we do. And, yes, there's an obvious and important way in which they DO know so much more than individual humans. But identifying and explicating intellectually fruitful analogies is something else. That's what I explore here: Intelligence, A.I. and analogy: Jaws & Girard, kumquats & MiGs, double-entry bookkeeping & supply and demand.
And here's a comment I'm about to post:
On competence "across academic and professional disciplines," I've been interesting in ChatGPT's ability at interpreting films. Here's a piece where I manage to prompt it to a high-school level interpretation of a film: Conversing with ChatGPT about Jaws, Mimetic Desire, and Sacrifice. I hazard to guess at what will be required for an AI to reach professional level, and if we're talking about film interpretation rather than interpreting novels and poems, well, that will require an AI that can actually watch and understand what's happening in a film. I have no idea what that will require.
More recently I've considered the question of using an AI to determine the formal structure of literary texts. It's not rocket science, but it's tricky because it requires a kind of "free-floating" analytic awareness. I'm not sure how well an AI can approximate that, even with lots of compute.