Sunday, July 21, 2024

A touch of yellow

Nine-year-old chess prodigy hopes to become the youngest grandmaster ever

Isabella Kwai, At 5, She Picked Up Chess as a Pandemic Hobby. At 9, She’s a Prodigy. NYTimes, July 21, 2024.

Since learning chess during a pandemic lockdown, Bodhana Sivanandan has won a European title in the game, qualified for this year’s prestigious Chess Olympiad tournament, and established herself as one of England’s best players.

She also turned 9 in March. That makes Bodhana, a prodigy from the London borough of Harrow, the youngest player to represent England at such an elite level in chess, and quite possibly the youngest in any international sporting competition.

“I was happy and I was ready to play,” Bodhana said in a phone interview, two days after she learned that she had been selected for this year’s Olympiad, an international competition considered to be the game’s version of the Olympics.

The fourth-grader, who learned chess four years ago when she stumbled across a board her father was planning to discard, knows exactly what she wants to accomplish next. “I’m trying to become the youngest grandmaster in the world,” she said, “and also one of the greatest players of all time.”

Chess is one of the arenas in which prodigies emerge. Music and math are others. Why? I assume it has something to do with their brains. What?

I note also that playing chess has been central to the development of AI and that it is the first arena in which computers equaled and then surpassed the best human performance. What can we make of that? I don’t know, but surely there’s something to be discovered here.

* * * * *

Note the final paragraph of this post, On the significance of human language to the problem of intelligence (& superintelligence).

The question of machine superintelligence would then become:

Will there ever come a time when we have problem-solving networks where there exists at least one node that is assigned to a non-routine task, a creative task, if you will, that only a computer can perform?

That’s an interesting question. I specify a non-routine task because we have all kinds of computing systems that are more effective at various tasks than humans are, from simple arithmetic calculations to such things as solving the structure of a protein from its sequence. I fully expect that more and more systems will evolve that are capable of solving such sophisticated, but ultimately routine, problems. But it’s not at all obvious to me that computational systems will eventually usurp all problem-solving tasks.

Human Go players learn from superhuman AIs

There are more links in the thread.

* * * * *

So: "Last year, we found superhuman Go AIs are vulnerable to “cyclic attacks”. This adversarial strategy was discovered by AI but replicable by humans."

Superhuman Go AIs discover a new region of the Go search space. That's one thing. The fact that, once discovered, humans are able to exploit this region against a superhuman Go AI is just as interesting.

One question we can ask about superintelligence is whether or not so-called superintelligent AIs can do things that are inherently and forever beyond human capacity. In this particular case, we have humans learning things initially discovered by AIs.

Saturday, July 20, 2024

Coffee and cream

Masha Gessen: Are we on the edge of an autocratic breakthrough?

Masha Gessen, Biden and Trump Have Succeeded in Breaking Reality, NYTimes, July 20, 2024.

The last three paragraphs:

As for Trump, despite the gestures he made in his speech on Thursday night toward national reconciliation, tolerance and unity, the convention reflected the ultimate consolidation of his power. If he is elected, a second Trump administration seems likely to bring what the Hungarian sociologist Balint Magyar has termed an “autocratic breakthrough” — structural political change that is impossible to reverse by electoral means. But if we are in an environment in which nothing is believable, in which imagined secrets inspire more trust than the public statements of any authority, then we are already living in an autocratic reality, described by another of Arendt’s famous phrases: “Nothing is true and everything is possible.”

It’s tempting to say that Trump’s autocratic movement has spread like an infection. The truth is, the seeds of this disaster have been sprouting in American politics for decades: the dumbing down of conversation, the ever-growing role of money in political campaigns, the disappearance of local news media and local civic engagement and the consequent transformation of national politics into a set of abstracted images and stories, the inescapable understanding of presidential races as personality contests.

None of this made the Trump presidency inevitable, but it made it possible — and then the Trump presidency pushed us over the edge into the uncanny valley of politics. If Trump loses this year — if we are lucky, that is — it will not end this period; it will merely bring an opportunity to undertake the hard work of recovery.

Nahre Sol talks with Tigran Hamasyan

From the Wikipedia entry for Hamasyan:

Tigran Hamasyan (Armenian: Տիգրան Համասյան; born July 17, 1987) is an Armenian jazz pianist and composer. He plays mostly original compositions, strongly influenced by the Armenian folk tradition, often using its scales and modalities. In addition to this folk influence, Hamasyan is influenced by American jazz traditions and, to some extent, as on his album Red Hail, by progressive rock. His solo album A Fable is most strongly influenced by Armenian folk music. Even in his most overt jazz compositions and renditions of well-known jazz pieces, his improvisations often contain embellishments based on scales from Middle Eastern/Southwest Asian traditions.

Friday, July 19, 2024

Friday Fotos: Ever more flowers

Refactoring training data to make smaller LLMs

Tentatively, this is the most interesting thing I've seen come out of the AI world in a year or so.

A tweet by Andrej Karpathy:

LLM model size competition is intensifying… backwards!

My bet is that we'll see models that "think" very well and reliably that are very very small. There is most likely a setting even of GPT-2 parameters for which most people will consider GPT-2 "smart". The reason current models are so large is because we're still being very wasteful during training - we're asking them to memorize the internet and, remarkably, they do and can e.g. recite SHA hashes of common numbers, or recall really esoteric facts. (Actually LLMs are really good at memorization, qualitatively a lot better than humans, sometimes needing just a single update to remember a lot of detail for a long time). But imagine if you were going to be tested, closed book, on reciting arbitrary passages of the internet given the first few words. This is the standard (pre)training objective for models today. The reason doing better is hard is because demonstrations of thinking are "entangled" with knowledge, in the training data.

Therefore, the models have to first get larger before they can get smaller, because we need their (automated) help to refactor and mold the training data into ideal, synthetic formats.

It's a staircase of improvement - of one model helping to generate the training data for next, until we're left with "perfect training set". When you train GPT-2 on it, it will be a really strong / smart model by today's standards. Maybe the MMLU will be a bit lower because it won't remember all of its chemistry perfectly. Maybe it needs to look something up once in a while to make sure.
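Here is a minimal sketch of the staircase as I read it: a big teacher model refactors raw web text into cleaner, more reasoning-dense training examples, and a much smaller student model is then trained on that synthetic corpus. The teacher and student objects and their methods are hypothetical placeholders, not any real API.

```python
# Minimal sketch of the "staircase" idea: a large teacher model rewrites raw
# web text into more "thinking-dense" training examples, and a much smaller
# student model is then trained on the refactored corpus.
# The teacher/student interfaces here are hypothetical placeholders.

from typing import Iterable, List


def refactor_corpus(teacher, raw_documents: Iterable[str]) -> List[str]:
    """Ask the (hypothetical) teacher model to rewrite each raw document as a
    concise, self-contained explanation, stripping rote facts the student
    could instead look up."""
    prompt_template = (
        "Rewrite the following text as a short, clear explanation of the "
        "reasoning it contains, omitting incidental trivia:\n\n{doc}"
    )
    return [teacher.generate(prompt_template.format(doc=doc)) for doc in raw_documents]


def train_student(student, synthetic_corpus: List[str], epochs: int = 1):
    """Ordinary next-token training loop, but over the refactored corpus
    rather than raw internet text."""
    for _ in range(epochs):
        for text in synthetic_corpus:
            student.train_step(text)  # placeholder for a real optimizer step
    return student
```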

That tweet is linked to this:

Wednesday, July 17, 2024

AI made up half of VC investment last quarter

A view from the window

What’s it mean to understand how LLMs work?

I don’t think we know. What bothers me is that people in machine learning seem to think of word meanings as Platonic ideals. No, that’s not what they’d say, but some such belief seems implicit in what they’re doing. Let me explain.

I’ve been looking through two Anthropic papers on interpretability: Towards Monosemanticity: Decomposing Language Models With Dictionary Learning, and Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. They’re quite interesting. In some respects they involve technical issues that are a bit beyond me. But, setting that aside, they also involve masses of detail that you just have to slog through in order to get a sense of what’s going on.

As you may know, the work centers on things that they call features, a common term in this business. I gather that:

  • features are not to be identified with individual neurons or even well-defined groups of neurons, which is fine with me,
  • nor are features to be closely identified with particular tokens. A wide range of tokens can be associated with any given feature.

There is a proposal that these features are some kind of computational intermediate.

We’ve got neurons, features, and tokens. I believe that the number of token types is on the order of 50K or so. The number of neurons varies depending on the size of the model, but will be 3 or 4 orders of magnitude larger than the number of token types. The weights on those neurons characterize all possible texts that can be constructed with those tokens. Features are some kind of intermediate between neurons and texts.
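For concreteness, here is a minimal sketch of the dictionary-learning setup those papers describe: a sparse autoencoder trained to reconstruct a model's activation vectors as sparse combinations of many more "feature" directions than there are neurons. The dimensions and loss coefficient below are illustrative, not the papers' actual configuration.

```python
# Minimal sketch of a sparse autoencoder of the kind used in the Anthropic
# "features" work: an activation vector is mapped onto a much larger set of
# feature activations, most of which are zero for any given input.

import numpy as np

rng = np.random.default_rng(0)

d_model = 512      # width of the activation vector taken from the LLM (illustrative)
n_features = 4096  # size of the learned dictionary; features outnumber neurons

W_enc = rng.normal(scale=0.02, size=(d_model, n_features))
b_enc = np.zeros(n_features)
W_dec = rng.normal(scale=0.02, size=(n_features, d_model))


def encode(activation: np.ndarray) -> np.ndarray:
    """Feature activations: a sparse, non-negative code for one activation vector."""
    return np.maximum(activation @ W_enc + b_enc, 0.0)  # ReLU keeps most features at zero


def decode(features: np.ndarray) -> np.ndarray:
    """Reconstruct the activation as a weighted sum of dictionary directions."""
    return features @ W_dec


def sae_loss(activation: np.ndarray, l1_coeff: float = 1e-3) -> float:
    """Reconstruction error plus an L1 penalty that pushes the code to be sparse."""
    f = encode(activation)
    recon = decode(f)
    return float(np.sum((activation - recon) ** 2) + l1_coeff * np.sum(np.abs(f)))
```

Training adjusts the encoder and decoder weights to minimize that loss over many activations; the learned dictionary directions are the "features."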

The question that keeps posing itself to me is this: What are we looking for here? What would an account of model mechanics, if you will, look like?

A month or so ago Lex Fridman posted a discussion with Ted Gibson, an MIT psycholinguist, which I’ve excerpted here at New Savanna. Here’s an excerpt:

LEX FRIDMAN: (01:30:35) Well, let’s take a stroll there. You wrote that the best current theories of human language are arguably large language models, so this has to do with form.

EDWARD GIBSON: (01:30:43) It’s a kind of a big theory, but the reason it’s arguably the best is that it does the best at predicting what’s English, for instance. It’s incredibly good, better than any other theory, but there’s not enough detail.

LEX FRIDMAN: (01:31:01) Well, it’s opaque. You don’t know what’s going on.

EDWARD GIBSON: (01:31:03) You don’t know what’s going on.

LEX FRIDMAN: (01:31:05) Black box.

EDWARD GIBSON: (01:31:06) It’s in a black box. But I think it is a theory.

LEX FRIDMAN: (01:31:08) What’s your definition of a theory? Because it’s a gigantic black box with a very large number of parameters controlling it. To me, theory usually requires a simplicity, right?

EDWARD GIBSON: (01:31:20) Well, I don’t know, maybe I’m just being loose there. I think it’s not a great theory, but it’s a theory. It’s a good theory in one sense in that it covers all the data. Anything you want to say in English, it does. And so that’s how it’s arguably the best, is that no other theory is as good as a large language model in predicting exactly what’s good and what’s bad in English. Now, you’re saying is it a good theory? Well, probably not because I want a smaller theory than that. It’s too big, I agree.

It's that smaller theory that interests me. Do we even know what such a theory would look like?

Classically, linguists have been looking for grammars, a finite set of rules that characterizes all the sentences in a language. When I was working with David Hays back in the 1970s, we were looking for a model of natural language semantics. We chose to express that model as a directed graph. Others were doing that as well. Perhaps the central question we faced was this: what collection of node types and what collection of arc types did we need to express all of natural language semantics? Even more crudely, what collection of basic building blocks did we need in order to construct all possible texts?
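To make that concrete, here is a toy version of such a network in code. The particular node and arc types are invented for illustration; the real problem was deciding what inventory of types would suffice.

```python
# A toy semantic network: a directed graph whose nodes and arcs each carry a
# type drawn from a small, fixed inventory. The type names are hypothetical.

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

NODE_TYPES = {"entity", "event", "property"}   # hypothetical inventory
ARC_TYPES = {"agent", "object", "attribute"}   # hypothetical inventory


@dataclass
class SemanticNetwork:
    nodes: Dict[str, str] = field(default_factory=dict)             # name -> node type
    arcs: List[Tuple[str, str, str]] = field(default_factory=list)  # (head, arc type, tail)

    def add_node(self, name: str, node_type: str) -> None:
        assert node_type in NODE_TYPES, f"unknown node type: {node_type}"
        self.nodes[name] = node_type

    def add_arc(self, head: str, arc_type: str, tail: str) -> None:
        assert arc_type in ARC_TYPES, f"unknown arc type: {arc_type}"
        assert head in self.nodes and tail in self.nodes
        self.arcs.append((head, arc_type, tail))


# "John opened the door": one event node linked to two entity nodes.
net = SemanticNetwork()
net.add_node("John", "entity")
net.add_node("door", "entity")
net.add_node("open#1", "event")
net.add_arc("open#1", "agent", "John")
net.add_arc("open#1", "object", "door")
```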

These machine learning people seem to be operating under the assumption that they can figure it out by an empirical bottom-up procedure. That strikes me as being a bit like trying to understand the principles governing the construction of temples by examining the materials from which they’re constructed, the properties of rocks and mortar, etc. You can’t get there from here. Now, I’ve some ideas about how natural language semantics works, which puts me a step ahead of them. But I’m not sure how far that gets us.

What if the operating principles of these models can’t be stated in any existing conceptual framework? The implicit assumption behind all this work is that, if we keep at it with the proper tools, sooner or later the model is going to turn out to be an example of something we already understand. To be sure, it may be an extreme, obscure, and extraordinarily complicated example, but in the end, it’s something we already understand.

Imagine that some UFO crashes in a field somewhere and we are able to recover it, more or less intact. Let us imagine, for the sake of argument, that the pilots have disappeared, so all we’ve got is the machine. Would we be able to figure out how it works? Imagine that somehow a modern digital computer were transported back in time and ended up in the laboratory of, say, Nikola Tesla. Would he have been able to figure out what it is and how it works?

Let’s run another variation on the problem. Imagine that some superintelligent but benevolent aliens were to land, examine our LLMs, and present us with documents explaining how they work. Would we be able to read and understand those documents? Remember, these are benevolent aliens, so they’re doing their best to help us. I can imagine three possibilities:

  1. Yes, perhaps with a bit of study, we can understand the documents.
  2. We can’t understand them right away, but the aliens establish a learning program that teaches us what we need to know to understand those documents.
  3. The documents are forever beyond us.

I don’t believe 3. Why not? Because I don’t believe our brains limit us to current modes of thought. In the past we’ve invented new ways of thinking; there’s no reason why we couldn’t continue doing so, or learn new methods under the tutelage of benevolent aliens.

That leaves us with 1 and 2. Which is it? At the moment I’m leaning toward 2. But of course those superintelligent aliens don’t exist. We’re going to have to figure it out for ourselves.

Sunday, July 14, 2024

Mood-congruent memory revisited

Faul, L., & LaBar, K. S. (2023). Mood-congruent memory revisited. Psychological Review, 130(6), 1421–1456. https://doi.org/10.1037/rev0000394 (ungated version: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10076454/)

Abstract: Affective experiences are commonly represented by either transient emotional reactions to discrete events or longer term, sustained mood states that are characterized by a more diffuse and global nature. While both have considerable influence in shaping memory, their interaction can produce mood-congruent memory (MCM), a psychological phenomenon where emotional memory is biased toward content affectively congruent with a past or current mood. The study of MCM has direct implications for understanding how memory biases form in daily life, as well as debilitating negative memory schemas that contribute to mood disorders such as depression. To elucidate the factors that influence the presence and strength of MCM, here we systematically review the literature for studies that assessed MCM by inducing mood in healthy participants. We observe that MCM is often reported as enhanced accuracy for previously encoded mood-congruent content or preferential recall for mood-congruent autobiographical events, but may also manifest as false memory for mood-congruent lures. We discuss the relevant conditions that shape these effects, as well as instances of mood-incongruent recall that facilitate mood repair. Further, we provide guiding methodological and theoretical considerations, emphasizing the limited neuroimaging research in this area and the need for a renewed focus on memory consolidation. Accordingly, we propose a theoretical framework for studying the neural basis of MCM based on the neurobiological underpinnings of mood and emotion. In doing so, we review evidence for associative network models of spreading activation, while also considering alternative models informed by the cognitive neuroscience literature of emotional memory bias. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
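For readers unfamiliar with spreading activation, here is a toy sketch of the idea the abstract invokes at the end: activating a mood node pushes activation along weighted associative links, leaving mood-congruent memories more accessible than incongruent ones. The nodes, link weights, and parameters are all invented for illustration.

```python
# Toy spreading-activation network: a "sad mood" node sends activation along
# weighted associative links, so mood-congruent memories gain accessibility.
# All nodes, weights, and parameters are illustrative.

import numpy as np

nodes = ["sad_mood", "loss_memory", "argument_memory", "vacation_memory"]
# weights[i, j]: strength of the associative link from node i to node j
weights = np.array([
    [0.0, 0.8, 0.6, 0.1],   # sad_mood -> memories
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])

activation = np.array([1.0, 0.0, 0.0, 0.0])   # induce a sad mood
spread_rate = 0.5

for step in range(3):                          # let activation spread for a few steps
    activation = activation + spread_rate * (activation @ weights)

for name, value in zip(nodes, activation):
    print(f"{name}: {value:.2f}")              # loss/argument end up more active than vacation
```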

Clouds over Jersey City, Hoboken, and the Hudson River

Economic growth in Roman Britain over four centuries

Scott G. Ortman, José Lobo, Lisa Lodwick, Rob Wiseman, Olivia Bulik, Victoria Harbison, and Luís M. A. Bettencourt, Identification and measurement of intensive economic growth in a Roman imperial province, Science Advances, 5 Jul 2024, Vol 10, Issue 27, DOI: 10.1126/sciadv.adk5517

Abstract: A key question in economic history is the degree to which preindustrial economies could generate sustained increases in per capita productivity. Previous studies suggest that, in many preindustrial contexts, growth was primarily a consequence of agglomeration. Here, we examine evidence for three different socioeconomic rates that are available from the archaeological record for Roman Britain. We find that all three measures show increasing returns to scale with settlement population, with a common elasticity that is consistent with the expectation from settlement scaling theory. We also identify a pattern of increase in baseline rates, similar to that observed in contemporary societies, suggesting that this economy did generate modest levels of per capita productivity growth over a four-century period. Last, we suggest that the observed growth is attributable to changes in transportation costs and to institutions and technologies related to socioeconomic interchange. These findings reinforce the view that differences between ancient and contemporary economies are more a matter of degree than kind.
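The "increasing returns to scale" at issue is the standard settlement-scaling relation, in which a socioeconomic output Y grows with settlement population N as Y = Y0 * N^beta, with the elasticity beta somewhat greater than 1 (roughly 7/6 in the theory). Here is a small illustrative sketch of how such an elasticity would be estimated from a log-log fit; the numbers are fabricated and are not the paper's data.

```python
# Illustrative estimate of a scaling elasticity beta from Y = Y0 * N**beta.
# Population sizes and outputs below are invented for demonstration only.

import numpy as np

population = np.array([200, 500, 1_000, 5_000, 20_000])   # invented settlement sizes
output = 2.0 * population ** (7 / 6)                       # invented outputs with beta = 7/6

beta, log_y0 = np.polyfit(np.log(population), np.log(output), 1)  # slope = elasticity
print(f"estimated elasticity beta = {beta:.3f}")                   # ~1.167 on these data
```

A fitted beta above 1 is what "increasing returns to scale" means: larger settlements produce disproportionately more per capita.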

H/t Tyler Cowen.