Tuesday, January 28, 2025

The DeepSeek breakthrough – What’s it mean? [On the difference between engineering and science]

Frankly, at the moment I’m inclined to think it means that Silicon Valley just got handed its lunch. Strutting around about AGI this, $500 billion that... Voilà! We’re Masters of the Universe. They may or may not be “masters of their domain” (likely not), in the Seinfeldian sense, but Masters of the Universe they are NOT.

Engineering automobiles, or rockets – that’s one thing. Engineering artificial minds is something else entirely. What’s going on is unprincipled hacking. Throw enough person-hours, compute, and money at it and, sure, you’ll do something....

Mutt: “Slow down, son, you’re ranting!”

Jeff: “OK, OK, I’ll slow down.”

Full disclosure: My priors

I was trained in computational semantics back in the mid-1970s by David Hays, who had been a first-generation researcher in what was originally called machine translation (MT) but got rebranded as computational linguistics (CL) in the mid-1960s, when the field was defunded for failing to deliver practical benefit to the US military. At the time I worked with him, Hays had shifted his attention to semantics. In keeping with that focus, he read a great deal about cognitive and perceptual psychology and neuroscience. He wanted his models to have some grounding in scientific fact.

He tended to think of AI researchers as unprincipled hackers. If the program worked, that was all that mattered. They didn’t seem very interested in possible psychological reality.

And that’s what the current regime of work on deep learning looks like to me: hacking. To be sure, it’s brilliant, perhaps inspired, hacking. If I didn’t believe that, I wouldn’t have spent a great deal of my time over the last two years working with ChatGPT, and now Claude 3.5, trying to tease out clues about what’s going on under the hood.

That’s a problem: no one actually knows how these models work. Oh, there’s interesting research on that problem, some of it under the rubric of mechanistic interpretability. But that doesn’t seem to be a priority. Instead, the emphasis is on scaling up: more data, more compute, more parameters, more more more! (Slow down, son!)

The impact of DeepSeek

Given that scaling up has worked in the past, and in the absence of any deep insight into how these things work, the scaling hypothesis, as it is sometimes called, had a certain superficial validity. The Chinese have just blown a big hole in the scaling hypothesis. Here’s Kevin Roose in The New York Times:

The first is the assumption that in order to build cutting-edge A.I. models, you need to spend huge amounts of money on powerful chips and data centers.

It’s hard to overstate how foundational this dogma has become. Companies like Microsoft, Meta and Google have already spent tens of billions of dollars building out the infrastructure they thought was needed to build and run next-generation A.I. models. They plan to spend tens of billions more — or, in the case of OpenAI, as much as $500 billion through a joint venture with Oracle and SoftBank that was announced last week.

DeepSeek appears to have spent a small fraction of that building R1. [...] But even if R1 cost 10 times more to train than DeepSeek claims, and even if you factor in other costs they may have excluded, like engineer salaries or the costs of doing basic research, it would still be orders of magnitude less than what American A.I. companies are spending to develop their most capable models. [...]

But DeepSeek’s breakthrough on cost challenges the “bigger is better” narrative that has driven the A.I. arms race in recent years by showing that relatively small models, when trained properly, can match or exceed the performance of much bigger models.

What that means is that the industry’s intuitive understanding of what the late Dan Dennett liked to call the design space for AI is wrong. Yes, bigger does sometimes/often get you more performance. But if smaller can yield comparable performance, then something else is going on, something we can’t identify.

Were the Chinese just lucky? Or do they know something, something deep, that we don’t? In the absence of any further information, I’d guess that it’s both.

Can we figure out what they did and do it ourselves? Sure, no problem. DeepSeek’s R1 is an open-source model and the researchers have released good documentation. We’ll replicate what they’ve done and perhaps improve on it, as they will also do. That’s not the issue.

The issue is understanding. We already knew that these models are black boxes. And we guessed/hoped that building a bigger box would make it better. We now know that that’s not necessarily true. What else don’t we know? How are we going to find out?

Let me once again trot out my favorite analogy: The current so-called “AI Revolution” seems like a whaling venture where the crew and captain know all there is to know about their ship and how to sail, but they don’t know much about whales. So, when they fail to find and kill any whales, what do they do? They try to figure out how to get better performance out of their ship. They don’t seem to understand that, if you’re going to hunt whales, you need to understand how whales behave. Yes, a good ship is important, and so is seamanship. But they’re not worth much without a knowledge of whales and their behavior. [ChatGPT and I have fun with that analogy in this post on benchmarks.]

Engineering and science are very different

Both may be highly technical these days, but their goals and methods are different. Engineers design and build things. Scientists seek understanding of how things work. Engineers need to know how things work, up to a point, but they don’t necessarily need to have deep understanding.

Look at all the mega-engineering projects of the ancient world: the pyramids, the aqueducts, the walled cities, and so forth. The engineers who designed and built those things had no theory of gravity. Didn’t need one.

Now consider automobiles. We know a great deal about how to design and construct automobiles. We know what makes them go faster, how to conserve fuel, how to stop quickly, how to make sharp turns, and so forth and so on. We know about the trade-offs that exist among various design objectives. And we understand the basic science underlying the operation and performance of automobiles, the physics, chemistry, and material science.

The same is nearly true for sending vehicles into space. We’ve launched all kinds of satellites into orbit, sent probes to the outer planets and on toward the stars. We’ve landed humans on the moon, and landed various vehicles on Mars. We know the chemistry and physics of getting humans to Mars as well, though we’re still a bit sketchy on the biology and medical issues.

The situation is very different with deep learning. It is different because we don’t know what happens under the hood. Engineering tells us how to build models, but it doesn’t tell us how those models operate once they’ve been built. And engineering isn’t going to give us that information, because that’s NOT the kind of thing engineering is for. For that we need science. More science. Instead of dropping $500 billion on building power plants, how about cutting that back to $400 billion and putting $100 billion into scientific research on how these models work, research that will necessarily involve work on human cognition and brain operation as well? Why? Because the human system is what we’re trying to emulate.

Once we learn more about how these models operate, then and only then will we be able to engineer new and more effective architectures. If we don’t learn more about how these models operate, we’ll just be throwing good money after bad. It’s time to put more science into “computer science.”
