Friday, October 4, 2024

How might LLMs store facts? [Multilayer Perceptrons, MLP]

Timestamps:

0:00 - Where facts in LLMs live
2:15 - Quick refresher on transformers
4:39 - Assumptions for our toy example
6:07 - Inside a multilayer perceptron
15:38 - Counting parameters
17:04 - Superposition
21:37 - Up next
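
The heart of the video is the MLP block itself. As a rough companion to the "Inside a multilayer perceptron" and "Counting parameters" segments, here is a minimal sketch of one such block. The dimensions, the ReLU nonlinearity, and all the names are my own illustrative choices, not the video's code:

```python
import numpy as np

d_model = 768            # embedding width; illustrative, roughly GPT-2-small
d_hidden = 4 * d_model   # the conventional 4x expansion inside the MLP

rng = np.random.default_rng(0)
W_up = rng.normal(scale=0.02, size=(d_model, d_hidden))    # up-projection
b_up = np.zeros(d_hidden)
W_down = rng.normal(scale=0.02, size=(d_hidden, d_model))  # down-projection
b_down = np.zeros(d_model)

def mlp(x):
    """One MLP block: project up, apply a nonlinearity, project back down."""
    h = np.maximum(0.0, x @ W_up + b_up)  # ReLU here; GPT models use GELU
    return h @ W_down + b_down

# "Counting parameters" (15:38): the two weight matrices dominate,
# giving roughly 8 * d_model**2 parameters per MLP block.
n_params = W_up.size + b_up.size + W_down.size + b_down.size
print(f"parameters in one MLP block: {n_params:,}")  # 4,722,432 here

x = rng.normal(size=(1, d_model))  # a single token's embedding vector
print(mlp(x).shape)                # (1, 768): same shape out as in
```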

Thursday, October 3, 2024

On the dockworkers strike, labor on the rise

Sohrab Ahmari, In Praise of the Dockworkers Shutting Down Our Ports, The Free Press, October 2, 2024.

The International Longshoremen’s Association, whose strike is crippling U.S. ports from the Gulf Coast to New England, may not seem like the wretched of the Earth. They’re asking for a 77 percent pay increase on top of the $39 per hour those on the top tiers already make. The union’s president, Harold Daggett, earns $728,000 a year and once owned a 76-foot boat. With major disruptions looming, no wonder even some of those Americans ordinarily sympathetic to organized labor might be thinking, Okay, this is going too far. The less sympathetic are already calling for the Marines to suppress the strike.

But here’s the hard truth: The militancy showcased by the ILA is exactly what is needed to restore a fairer, more balanced economy—the kind that created the middle class in the postwar decades and allowed your grandparents to access reliable healthcare, take vacations, and enjoy disposable incomes. Those who complain that today’s left has come to privilege boutique identity politics over bread-and-butter concerns should cheer the longshoremen. There is nothing “woke” about their exercise of economic power to win material gains for themselves and their industrial brethren.

The longshoremen are striking for familiar reasons: better wages and benefits, and to prevent automation from decimating their livelihoods. [...]

Some critics argue that the ILA’s demand that no automation take place at the ports is unreasonably rigid. It’s certainly audacious, but it’s called an opening gambit for a reason. I suspect we will see concessions on both sides leading to a reasonable settlement, as in the case of SAG. The rest—gripes about how much the ILA president earns or how longshoremen are already well-compensated—is the tired propaganda of the C-suite class. [...]

The ILA strike is a rare reminder of working people’s power to shut it all down. [...] Real progress in market societies results from precisely this dynamic tension between labor and capital. For too long, however, one side of the equation—labor—has been torpid, not to say dormant. The asset-rich had it so good over the past few decades—capturing the lion’s share of the upside from de-unionization, financialization, and offshoring, as wages stagnated for the bottom half—that they all but forgot what labor militancy can look and sound like. How much it can sting.

Now, the labor movement is on the move. Since the pandemic, workers across a wide range of industries have joined arms to form new unions or to secure better wages and working conditions under existing collective-bargaining agreements. Last year, some 539,000 workers were involved in 470 strikes and walkouts, according to Cornell researchers, up from 140,000 workers mounting 279 strikes in 2021. This ferment—what one labor scholar has called a “strike wave”—comes after the union share of the private-economy workforce has declined from its peak of one-third in 1945 to 6 percent today.

There’s more at the link.

Problems with so-called AI scaling laws

Arvind Narayanan and Sayash Kapoor, AI Scaling Myths, AI Snake Oil, June 27, 2024. The introduction:

So far, bigger and bigger language models have proven more and more capable. But does the past predict the future?

One popular view is that we should expect the trends that have held so far to continue for many more orders of magnitude, and that it will potentially get us to artificial general intelligence, or AGI.

This view rests on a series of myths and misconceptions. The seeming predictability of scaling is a misunderstanding of what research has shown. Besides, there are signs that LLM developers are already at the limit of high-quality training data. And the industry is seeing strong downward pressure on model size. While we can't predict exactly how far AI will advance through scaling, we think there’s virtually no chance that scaling alone will lead to AGI.

Under the heading "Scaling “laws” are often misunderstood", they note:

Scaling laws only quantify the decrease in perplexity, that is, improvement in how well models can predict the next word in a sequence. Of course, perplexity is more or less irrelevant to end users — what matters is “emergent abilities”, that is, models’ tendency to acquire new capabilities as size increases.

Emergence is not governed by any law-like behavior. It is true that so far, increases in scale have brought new capabilities. But there is no empirical regularity that gives us confidence that this will continue indefinitely.

Why might emergence not continue indefinitely? This gets at one of the core debates about LLM capabilities — are they capable of extrapolation or do they only learn tasks represented in the training data? The evidence is incomplete and there is a wide range of reasonable ways to interpret it. But we lean toward the skeptical view.
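
A quick gloss on the metric they are talking about: perplexity is just the exponential of the average next-token cross-entropy, i.e., the model's effective "branching factor" when predicting the next token. Here's a minimal sketch, my own illustration with invented probabilities, not anything from the article:

```python
import math

def perplexity(token_probs):
    """token_probs: the model's probability for each actual next token."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model that gives the true next token probability 0.25 every time has
# perplexity 4: it is as uncertain as a fair choice among four tokens.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # ≈ 4.0
print(perplexity([0.9, 0.8, 0.95, 0.85]))    # ≈ 1.15, a much better model
```

Scaling laws predict steady declines in this number; whether new capabilities keep arriving along the way is the separate, law-less question the authors press on.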

There is much more under the following headings:

• Trend extrapolation is baseless speculation
• Synthetic data is not magic
• Models have been getting smaller but are being trained for longer
• The ladder of generality

These remarks are from the section on models getting smaller:

In other words, there are many applications that are possible to build with current LLM capabilities but aren’t being built or adopted due to cost, among other reasons. This is especially true for “agentic” workflows which might invoke LLMs tens or hundreds of times to complete a task, such as code generation.

In the past year, much of the development effort has gone into producing smaller models at a given capability level. Frontier model developers no longer reveal model sizes, so we can’t be sure of this, but we can make educated guesses by using API pricing as a rough proxy for size. GPT-4o costs only 25% as much as GPT-4 does, while being similar or better in capabilities. We see the same pattern with Anthropic and Google. Claude 3 Opus is the most expensive (and presumably biggest) model in the Claude family, but the more recent Claude 3.5 Sonnet is both 5x cheaper and more capable. Similarly, Gemini 1.5 Pro is both cheaper and more capable than Gemini 1.0 Ultra. So with all three developers, the biggest model isn’t the most capable!

Training compute, on the other hand, will probably continue to scale for the time being. Paradoxically, smaller models require more training to reach the same level of performance. So the downward pressure on model size is putting upward pressure on training compute.
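
That closing paradox has a concrete shape. The Chinchilla scaling paper (Hoffmann et al., 2022) fit loss as L(N, D) = E + A/N^alpha + B/D^beta over parameter count N and training tokens D; plugging in its published constants shows how halving a model's size demands substantially more tokens to hold loss constant. A sketch, my gloss rather than Narayanan and Kapoor's own analysis:

```python
# Constants are the published Approach-3 fit from Hoffmann et al. (2022),
# used purely for illustration.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    """Chinchilla-style predicted loss for N parameters and D tokens."""
    return E + A / N**alpha + B / D**beta

def tokens_to_match(target, N):
    """Tokens D an N-parameter model needs to reach a target loss."""
    gap = target - E - A / N**alpha  # the share of loss D must close
    return (B / gap) ** (1 / beta)

baseline = loss(70e9, 1.4e12)  # roughly Chinchilla: 70B params, 1.4T tokens
print(f"loss at 70B params / 1.4T tokens: {baseline:.3f}")  # ~1.94
# Halve the parameter count and ask what training budget matches it:
print(f"tokens for a 35B model to match: {tokens_to_match(baseline, 35e9):.3g}")
# ~2.4e+12 -- half the parameters needs ~70% more training tokens
```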

Check out the newsletter, AI Snake Oil, and the book of the same title.

OpenAI’s $6.6B raise: What were they thinking?

Cory Weinberg, The Briefing: The Cynic’s Guide to OpenAI’s Megaround, The Information, Oct. 2, 2024:

The biggest question is: Will OpenAI ever be a good business? It’s debatable right now. At least on a sales multiple basis (13 to 14 times next year’s forecasted $11.6 billion revenue), some investors can justify it without embarrassment.

But investors in the latest round probably need OpenAI to eventually become a roughly $1 trillion company to get a strong return. That means at some point the startup will have to become a cash flow machine rather than a cash incinerator.

Of the seven companies with over $1 trillion in market cap currently, the median free cash flow from the past year was $57 billion. In that regard, OpenAI, which is chasing growth and spending heavily on computing capacity, has quite a way to go. (For what it’s worth, Fidelity investing in the latest round should mean we get a regular check-in on how OpenAI’s valuation is shifting, at least in the opinion of Fidelity, which needs to make its startup valuations public.)

To be sure, even if OpenAI’s latest benefactors don’t believe it can get to $1 trillion, many of them have all sorts of ulterior, strategic reasons to back the startup.
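
For a sense of the arithmetic behind Weinberg's skepticism, here is a back-of-the-envelope sketch. The revenue forecast, sales multiple, and $1 trillion target come from the excerpt above; the calculation itself is mine:

```python
revenue_forecast = 11.6e9          # next year's forecasted revenue
multiple_lo, multiple_hi = 13, 14  # sales multiple cited in the piece
target_cap = 1e12                  # the ~$1T outcome investors may need

val_lo = multiple_lo * revenue_forecast
val_hi = multiple_hi * revenue_forecast
print(f"implied valuation: ${val_lo / 1e9:.0f}B to ${val_hi / 1e9:.0f}B")
# implied valuation: $151B to $162B

# Getting from there to $1T means roughly a 6x-7x appreciation:
print(f"required appreciation: {target_cap / val_hi:.1f}x "
      f"to {target_cap / val_lo:.1f}x")
```

The widely reported $157 billion valuation for the round sits squarely in that implied range.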