Tyler Cowen has just posted a section from his current Bloomberg column: Why AI will not create unimaginable fortunes. I responded:
Interesting. I wonder what the lifetime is going to be for LLMs? Didn't I just read a tweet stream that suggested it might cost a billion dollars to train one in the not-so-distant future?* That doesn't strike me as being very sustainable. Geoffrey Hinton has speculated that we'll have neuromorphic computers in the future. They'll take much less power: "It'll be used for putting something like GPT-3 in your toaster for one dollar, so running on a few watts, you can have a conversation with your toaster." I don't recall him offering a time horizon, though I do think he is more or less right about that. So, what comes first: neuromorphic GPT-3 in your toaster or a $10B training regime for GPT-42? What about a neuromorphic gardener?
I think Tyler's right about this: "AI services will enter almost everyone’s workflow and percolate through the entire economy. Everyone will be wealthier, most of all the workers and consumers who use the thing." At least about the percolate. As for wealth, who knows? Of course, "wealth" is a capacious concept. So again, who knows?
Andreessen has speculated that AI will migrate from being a feature bolted onto a product (like Sydney and Bing – soon to be a situation comedy, "The Honeymooners") to being the foundation of products. I think that's right, and I speculated in that direction over a decade ago. What about the operating system? But who wants an AI that someone else owns to control the operating system of their computer? Maybe it'll be one of Hinton's autonomous neuromorphic AIs.
In the end Gary Marcus is surely going to see a robust symbolic component integrated into these so-called foundation models. What's the time course on that going to be?
I mean, in a way, I guess the question I'm posing is something like this: Have we just entered a civilizational singularity, in von Neumann's phrase:
centered on the accelerating progress of technology and changes in the mode of human life, which gives the appearance of approaching some essential singularity in the history of the race beyond which human affairs, as we know them, could not continue.
To what extent is Tyler's sense of the future, or mine, or yours, or anyone else's, dominated by the pre-singularity world in which we've grown up? If the world is changing fundamentally, how can we possibly guesstimate what's coming down the pike? And we're ALL – Andreessen, Sam Altman, Geoffrey Hinton, Eliezer Yudkowsky, the whole kit and caboodle – in the same antique boat.
Unless AI is severely constrained or even wiped out, our successors will be living in a world that is very different from ours. Everything will be changed: business, governance, work and leisure, everything. We’re going to see new institutional forms.
And so forth.
But it’s all limited by how fast humans can change: What about the ability of one generation to raise a cohort whose sense of the world is fundamentally different from theirs? What does generational interaction over the last half-century tell us about that, if anything? An old post, The Demise of Deconstruction, speaks to that in the intellectual sphere. Another post, The Direction of Cultural Evolution, Macroanalysis at 3 Quarks Daily, looks at generational succession in the world of novels.
See also my recent post, What are the 10–20 year prospects for AI? Three paragraphs from the beginning:
My friend in venture capital, Sean O’Sullivan (who was my boss at MapInfo in the ancient days), tells me there are three time-horizons: 3 months, 12 months, and three years. So, in talking about 10-20 years I’m way out over the end of my skis. That’s fine. But let’s begin by looking at those near-term prospects, the ones on which money is ventured – and lost or gained. If we set the clock at November 30, 2022, when ChatGPT was released to the public, then we are over 2/3 of the way into the first time-horizon. What has happened?
WOOSH!!! That’s what.
The public at large is more aware of AI than ever before. In particular, the number of people who have been able to interact directly with an advanced AI (as opposed to Siri, Alexa, and the like) has gone up dramatically, though, at more than 30 million users worldwide, that is still less than 10% of the population of the United States. And that’s a lot.
Ever onward.
* Regarding the high cost of compute for training, here we go:
Regarding parameter count growth, the industry is already reaching the limits of current hardware with dense models – a 1 trillion parameter model costs ~$300 million to train. With 100,000 A100s across 12,500 HGX / DGX systems, this would take ~3 months to train. This is certainly within the realm of feasibility with current hardware for the largest tech companies. The cluster hardware costs would be a few billion dollars, which fits within the datacenter Capex budgets of titans like Meta, Microsoft, Amazon, Oracle, Google, Baidu, Tencent, and Alibaba.
Another order of magnitude scaling would take us to 10 trillion parameters. The training costs using hourly rates would scale to ~$30 billion. Even with 1 million A100s across 125,000 HGX / DGX systems, training this model would take over two years. Accelerator systems and networking alone would exceed the power generated by a nuclear reactor. If the goal were to train this model in ~3 months, the total server Capex required for this system would be hundreds of billions of dollars with current hardware.
This is not practical, and it is also likely that models cannot scale to this size, given current error rates and quantization estimates.
The practical limit for a Chinchilla optimally trained dense transformer with current hardware is between ~1 trillion and ~10 trillion parameters for compute costs. With future reports, we will discuss this band more for both dense vs. sparse models and the cost competitiveness of Google’s TPUv4, TPUv5, Nvidia A100, H100, and AMD MI300. Data is another problem that we can cover in the future.
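Those figures hang together arithmetically. Here is a rough back-of-the-envelope sketch in Python – my own reconstruction, not anything from the quoted report – assuming Chinchilla-optimal data (about 20 tokens per parameter), the standard 6ND estimate for training FLOPs, an A100 peak of 312 TFLOPS (BF16) at roughly 40% utilization, and ~$1.10 per GPU-hour. All of those constants are my assumptions.

# Back-of-the-envelope check of the quoted training figures.
# Assumptions are mine: Chinchilla-optimal data (20 tokens per parameter),
# compute = 6 * N * D FLOPs, A100 peak of 312 TFLOPS (BF16) at 40%
# utilization, and ~$1.10 per A100-hour.

A100_PEAK_FLOPS = 312e12      # BF16 peak throughput per A100, FLOP/s
UTILIZATION = 0.40            # assumed hardware utilization during training
DOLLARS_PER_GPU_HOUR = 1.10   # assumed effective hourly cost per GPU

def training_estimate(n_params, n_gpus):
    """Return (days of training, total cost in USD) for a dense model."""
    tokens = 20 * n_params                  # Chinchilla-optimal token count
    flops = 6 * n_params * tokens           # standard 6ND training estimate
    throughput = n_gpus * A100_PEAK_FLOPS * UTILIZATION
    seconds = flops / throughput
    gpu_hours = n_gpus * seconds / 3600
    return seconds / 86400, gpu_hours * DOLLARS_PER_GPU_HOUR

for n_params, n_gpus in [(1e12, 100_000), (1e13, 1_000_000)]:
    days, cost = training_estimate(n_params, n_gpus)
    print(f"{n_params:.0e} params on {n_gpus:,} A100s: "
          f"~{days:.0f} days, ~${cost / 1e9:.1f}B")

Run as written, this gives roughly 111 days and ~$0.3 billion for the 1-trillion-parameter case, and roughly 1,100 days and ~$29 billion for the 10-trillion case – in line with the quoted ~3 months and ~$300 million, and the ~$30 billion and over two years.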