Saturday, April 20, 2024

The irises are blooming early this year

Melissa Cody: Navajo weaver beyond infinity and back

Patricia Leigh Brown, A Millennial Weaver Carries a Centuries-Old Craft Forward, NYTimes, April 18, 2024.

Spiders are weavers. The Navajo artist and weaver Melissa Cody knows this palpably. As she sits cross-legged on sheepskins at her loom, on one of the wooden platforms that boost her higher as her stack of monumental tapestries grows, the sacred knowledge of Spider Woman and Spider Man, who brought the gift of looms and weaving to the Diné, or Navajo, is right there in her studio with her.

It also infuses “Melissa Cody: Webbed Skies,” the first major solo exhibition of the artist’s work, which is on view at MoMA PS1 through Sept. 9, in a co-production with the São Paulo Museum of Art in Brazil (known as MASP).

Re-weaving, re-mixing, re-creating:

“Hundreds of years ago, Navajo weaving played with illusion, creating 3-D effects with the overlapping and overlay of motifs,” said Ann Lane Hedlund, a cultural anthropologist and retired curator who works with artists. “Melissa has taken that to a new realm.”

She has mastered a slow art in a fast world.

Cody’s vibrant Germantown Revival color palette emerged from a dark era: the devastating 1863-1866 U.S. government campaign to annihilate the Diné by burning villages, killing herds and removing more than 10,000 Navajo from their homelands. In a forced march, the Navajo walked for hundreds of miles to Bosque Redondo at Fort Sumner, in present-day New Mexico, where they were incarcerated. There, in a creative act of resistance, women unraveled government-issued synthetically dyed wool blankets made in Germantown, Pa., and rewove them in their own designs, surmounting trauma and loss through sheer perseverance and beauty.

Precision and reflection:

She credits her mother, whose loom was in the living room, with “instilling independence in what I created.”

“She taught me a heightened, technically precise level of work, without a lot of negative space and every inch filled with geometric patterning,” she explained. “When I asked her about colors and if she liked them, she’d say, ‘Do you like them? What do you think about it?’ So there was a lot of self-reflection.”

Cody’s years perfecting traditional techniques gave her the confidence to experiment and create more personal work. “It’s ‘What emotion am I trying to convey?’” she said. “What’s the thesis behind it?”

Some of her most ambitious pieces have been responses to personal crises.

Local order and global vision:

To a non-weaver, one of the most extraordinary aspects of Navajo weaving is its largely spontaneous quality, accomplished with nary a sketch. “We’re graphing it out in a mental image — maybe a texture out in nature or the feel of a city, or a color, and then replicating it in woven form,” Cody said. “It’s a slow-moving fluidity, with everything calculated down to each individual string.” A large-scale weaving takes six months or more to complete.

Her mother visits frequently to help out, following her daughter’s lead as they lay the warp strings out on the floor. The studio is definitely a family affair, the loom built by her brother Kevin and the platforms by her partner, Giovanni McDonald Sanchez.

There's more at the link, including wonderful photos.

Friday, April 19, 2024

ChatGPT chooses 42

I gave it a try myself:

I wonder how the other LLM-based chatbots would respond to the same prompt? Would the Llamas, Geminis and the Claudes agree with one another in their replies, whichever one they pick? Is "42" the number favored by 7 out of 10 chatbots?

Friday Fotos: Hoboken and the Hudson

Will Team LLM ever catch up to Team Atlas? [+ escape from a maze as a facilitating analogy]

I don't know when I first saw a video of Atlas. But whenever it was, I'm sure I was astounded. As astounded as I was with ChatGPT? I don't remember. And of course, I couldn't play around with Atlas. In any case, that would have been much more difficult than playing around with ChatGPT.

The thing is, the researchers who built Atlas had to develop a profound understanding of the dynamics of humanoid motion. In contrast, the researchers who work on LLMs don't need to know much of anything about language. That's worth thinking about.

Consider, for example, these remarks that a pioneering computational linguist, Martin Kay, made a while back:

Symbolic language processing is highly nondeterministic and often delivers large numbers of alternative results because it has no means of resolving the ambiguities that characterize ordinary language. This is for the clear and obvious reason that the resolution of ambiguities is not a linguistic matter. After a responsible job has been done of linguistic analysis, what remain are questions about the world. They are questions of what would be a reasonable thing to say under the given circumstances, what it would be reasonable to believe, suspect, fear, or desire in the given situation. [...] What we are doing is to allow statistics over words that occur very close to one another in a string to stand in for the world construed widely, so as to include myths, and beliefs, and cultures, and truths and lies and so forth. As a stop-gap for the time being, this may be as good as we can do, but we should clearly have only the most limited expectations of it because, for the purpose it is intended to serve, it is clearly pathetically inadequate. The statistics are standing in for a vast number of things for which we have no computer model. They are therefore what I call an “ignorance model.”

LLMs did not exist in 2005, when Kay made those remarks. As he died in 2021, before the release of ChatGPT, I don't know how he would have reacted to it. I see little reason to believe that he would alter those remarks in a fundamental way. Perhaps he would remove the word “clearly,” and maybe “pathetically” as well.

LLMs, however, are still inadequate models of human linguistic behavior. The industry’s current infatuation with them is perhaps an ironic testament to the cliché that ignorance is bliss. Martin Kay also remarked that, in resting content with a statistical view of language, “one turns one’s back on the scientific achievements of the ages and foreswears the opportunity that computers offer to carry that enterprise forward.” I agree, though perhaps not in the way Kay meant those words. For I believe that LLMs have an important role in developing a detailed understanding of how language works. The intellectual monoculture that has grown up around LLMs seems unable or unwilling to appreciate that – a profound failure of the imagination.

Language is grounded in the operations of the human brain. Our ability to probe and manipulate the brain is quite limited. That is not the case with LLMs. Here’s a facilitating analogy I am working on for a report I am preparing about my work with ChatGPT over the last year. I’m talking about using ChatGPT to generate stories:

The model is structured such that, when it starts generating a text from a certain location in its activation space, it will have created a coherent text (a story, in this case), word by word, by the time it exits that region of the space.

As a crude analogy, consider what is called a simply connected maze, one without any loops. If you are lost somewhere in such a maze, no matter how large and convoluted it may be, there is a simple procedure you can follow that will take you out of the maze. You don’t need to have a map of the maze; that is, you don’t need to know its structure. Simply place either your left or your right hand in contact with a wall and then start walking. As long as you maintain contact with the wall, you will find an exit. The structure of the maze is such that this local rule will take you out.
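The wall-following rule is easy to state as code. Here's a minimal sketch, assuming a hypothetical grid maze with no loops ('#' for walls, 'S' for the start, 'E' for the exit). The walker keeps its right hand on the wall by trying, at every step, to turn right, then go straight, then turn left, and only then turn back. It never consults a map. The names `MAZE`, `find`, and `right_hand_walk` are my own illustrations.

```python
# Hypothetical loop-free grid maze: '#' wall, '.' open, 'S' start, 'E' exit.
MAZE = [
    "#########",
    "#S#.....#",
    "#.#.###.#",
    "#...#...#",
    "###.#.#.#",
    "#...#.#.E",
    "#########",
]

# Headings in clockwise order, so heading + 1 is a right turn.
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # north, east, south, west


def find(maze, ch):
    """Return (row, col) of the first occurrence of ch."""
    for r, row in enumerate(maze):
        c = row.find(ch)
        if c != -1:
            return r, c
    raise ValueError(f"{ch!r} not found")


def open_cell(maze, r, c):
    """True if (r, c) is inside the grid and not a wall."""
    return 0 <= r < len(maze) and 0 <= c < len(maze[r]) and maze[r][c] != "#"


def right_hand_walk(maze, max_steps=10_000):
    """Follow the right-hand wall from 'S' until reaching 'E'; return the cells visited."""
    r, c = find(maze, "S")
    heading = 1  # start facing east (an arbitrary choice)
    path = [(r, c)]
    for _ in range(max_steps):
        if maze[r][c] == "E":
            return path
        # Purely local decision: right, straight, left, back.
        for turn in (1, 0, -1, 2):
            d = (heading + turn) % 4
            dr, dc = DIRS[d]
            if open_cell(maze, r + dr, c + dc):
                heading = d
                r, c = r + dr, c + dc
                path.append((r, c))
                break
        else:
            return None  # walled in on every side
    return None


if __name__ == "__main__":
    path = right_hand_walk(MAZE)
    print(len(path) - 1, "moves:", path)
```

The walk wanders into both dead ends before it reaches the exit. The local rule doesn't avoid wrong turns; the loop-free structure of the maze simply guarantees that they get undone.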

“Produce the next word” is certainly a local rule. The structure of LLMs is such that, given the appropriate context (a prompt asking for a story), following that rule will produce a coherent story. Given a different context, that is to say, a different prompt, the same simple rule will produce a different kind of text.
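Here is a deliberately crude illustration of a local next-word rule, assuming a toy bigram table built from two hypothetical one-line texts. It is far more local than an LLM, which conditions each word on the whole preceding context rather than on the previous word alone. But it shows the same division of labor: the rule never changes, and the prompt determines which region of the table the generation runs through.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; each line ends with an explicit end marker.
CORPUS = [
    "once upon a time there lived one fox who outwitted the crow <end>",
    "stocks closed higher today as investors cheered earnings <end>",
]


def train_bigrams(lines):
    """For each word, count which words follow it."""
    follows = defaultdict(Counter)
    for line in lines:
        words = line.split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def generate(follows, prompt, max_words=50):
    """Apply one local rule over and over: append the most frequent next word."""
    words = prompt.split()
    while len(words) < max_words:
        candidates = follows.get(words[-1])  # .get does not create new entries
        if not candidates:
            break
        nxt = candidates.most_common(1)[0][0]
        if nxt == "<end>":
            break
        words.append(nxt)
    return " ".join(words)


if __name__ == "__main__":
    table = train_bigrams(CORPUS)
    print(generate(table, "once"))
    print(generate(table, "stocks"))
```

Same rule, different starting point, different kind of text: the "once" prompt runs through the story line and the "stocks" prompt runs through the news line.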

Now, let’s push the analogy to the breaking point: We may not know the structure of LLMs, but we do know a lot about the structure of texts, from phrases and sentences to extended texts of various kinds. In particular, the structure of stories has been investigated by students of several disciplines, including folklore, anthropology, literary criticism, linguistics, and symbolic artificial intelligence. Think of the structures proposed by those disciplines as something like a map of the maze in our analogy.

Unfortunately, students of those various disciplines have not reached a consensus on how to characterize those structures. Linguists are entertaining a variety of proposals about the nature of sentence-level syntax and students of those other disciplines haven’t converged on a way to describe story structure. Still, we have a starting point for constructing our story maps, even if it is somewhat confused and ambiguous.

If we are to exploit LLMs in the ways that analogy suggests, then we are going to have to use symbolic models to do it. First we propose a symbolic model for some aspect of the structure we know how to probe or manipulate. Then we see whether an appropriately prepared LLM behaves the way our model predicts it should. If it doesn’t, we revise and repeat.



and again,

and again.... 

* * * * *

Addendum: The NYTimes has published an article about Atlas, noting that it will retire to a museum of decommissioned robots in the Boston Dynamics lobby.

Thursday, April 18, 2024

More pansies

Another Crazy Interview: Mark Zuckerberg

YouTube copy:

Apr 18, 2024 Dwarkesh Podcast
Zuck on:

- Llama 3
- open sourcing towards AGI
- custom silicon, synthetic data, & energy constraints on scaling
- Caesar Augustus, intelligence explosion, bioweapons, $10b models, & much more



00:00:00 Llama 3
00:09:15 Coding on path to AGI
00:26:07 Energy bottlenecks
00:34:03 Is AI the most important technology ever?
00:38:04 Dangers of open source
00:54:40 Caesar Augustus and metaverse
01:05:36 Open sourcing the $10b model & custom silicon
01:16:02 Zuck as CEO of Google+

I don’t know what to make of this. Zuckerberg’s a smart guy. As founder and CEO of Meta (formerly Facebook) he’s also rich and powerful. I take it as self-evident that there’s some kind of connection between being a smart guy and whatever/however he became rich and powerful. I also take it as self-evident that the path that led to being rich and powerful was touched by more than a little luck.

When he talks about energy bottlenecks on the way to more and more compute for whatever, I figure he more or less knows what he’s talking about. That’s a thinkable problem and he’s got smart staff who can dig into all the details and advise him.

And when he talks about AI being the most important technology ever, now things get tricky. He obviously thinks it is. Lots of people think that, or something close to it. I’m one of them. But beyond that, just why that’s the case and what it means for the future, who knows? But some people have thought about that thing more deeply than others, much more deeply.

How deeply has Zuckerberg thought about it? How deeply could he have possibly thought about it? He dropped out of Harvard in his sophomore year to run his company. He’s been running it ever since. That doesn’t give him much time to read deeply in a wide range of subjects: philosophy, cultural evolution, cognitive science, the history of science and technology, anthropology, and so forth and so on. I believe at one time he set out to visit all 50 states in America, so he spent a lot of time traveling. He probably had some time to read. Did he read up on everything that’s relevant to thinking about the history of humankind? But how much could he possibly have read?

Nor is it a matter of just reading. You have to think about it. And to really think you need to write and discuss. How much of that has he done on those kinds of subjects?

And yet now he’s having a conversation with Dwarkesh Patel on really Big Picture Issues. It sounds to me like he’s mostly just making stuff up. If he were an A.I. we’d say he’s hallucinating, confabulating. But what else can he do?

Note that I say this, not in a spirit of criticizing Zuckerberg, or, for that matter, Patel. I’m writing in a spirit of observation. THAT’s what they’re doing.

Do they HAVE to do it? Well, Dwarkesh has more leeway than Zuckerberg. Dwarkesh is just a podcaster. He’s got to get clicks, and he’s in a position to attract interviews that bring him clicks. Nothing much depends on his interviews in any direct way. 

But a great deal depends on the decisions Zuckerberg makes about Meta, over which he seems to have extraordinary control. And the nature of Meta’s business is such that those Big Picture Issues bear on how Meta utilizes its resources. He may not have had time to think about those issues very deeply, but he has no choice but to make decisions of that kind.

That’s crazy.

Twigs and crane with flag

Peter Thiel on the Western canon and on A.I.

Tyler Cowen interviews Peter Thiel on various topics. Here’s the introduction:

In this conversation recorded live in Miami, Tyler and Peter Thiel dive deep into the complexities of political theology, including why it’s a concept we still need today, why Peter’s against Calvinism (and rationalism), whether the Old Testament should lead us to be woke, why Carl Schmitt is enjoying a resurgence, whether we’re entering a new age of millenarian thought, the one existential risk Peter thinks we’re overlooking, why everyone just muddling through leads to disaster, the role of the katechon, the political vision in Shakespeare, how AI will affect the influence of wordcels, Straussian messages in the Bible, what worries Peter about Miami, and more.

On the Western canon

COWEN: Are there other holy books besides the Bible that you draw ideas and inspiration from? And what would those be?

THIEL: I think in some sense, it’s all the great books. They’re not quite at the scale of these holy books, but there was a way that we treated Shakespeare or Cervantes or Goethe as these almost semi-divine writers, and I think that’s the attitude one has to have to read any of these books appropriately and seriously.

COWEN: So, the Western canon would be your answer, so to speak?

THIEL: Something like the Western canon. I don’t think the great books are quite as holy as the Bible, and as a result, I probably don’t read enough of them, but yes, that’s the closest approximation.

COWEN: And it includes science fiction — yes or no?

THIEL: I read a lot as a kid. I read so little of that nowadays. It’s all too depressing.

That’s certainly how Harold Bloom thought about the canonical authors, as being “almost semi-divine.” We need to rethink that. Though it goes beyond rethinking. We have to rework the mechanisms of culture. How do we use these books most effectively? That’s tricky, since we don’t know what those mechanisms are or how they work. There’s no engineering solution.

Silicon Valley’s not asking the right questions about A.I.

COWEN: For our last segment, let’s turn to artificial intelligence. As you know, large language models are already quite powerful. They’re only going to get better. In this world to come, will the wordcels just lose their influence? People who write, people who play around with ideas, pundits — are they just toast? What’s this going to look like? Are they going to give up power peacefully? Are they going to go down with the ship? Are they going to set off nuclear bombs?

THIEL: I’ll say the AI thing broadly, the LLMs — it’s a big breakthrough. It’s very important, and it’s striking to me how bad Silicon Valley is at talking about these sorts of things. The questions are either way too narrow, where it’s something like, is the next transformer model going to improve by 20 percent on the last one or something like this? Or they’re maybe too cosmic, where it’s like from there we go straight to the simulation theory of the universe. Surely there are a lot of in-between questions one could ask. Let me try to answer yours.

My intuition would be it’s going to be quite the opposite, where it seems much worse for the math people than the word people. What people have told me is that they think within three to five years, the AI models will be able to solve all the US Math Olympiad problems. That would shift things quite a bit.

There’s a longer history I always have on the math versus verbal riff. If you ask, “When did our society bias to testing people more for math ability?” I believe it was during the French Revolution because it was believed that verbal ability ran in families. Math ability was distributed in this idiot savant way throughout the population.

If we prioritized math ability, it had this meritocratic but also egalitarian effect on society. Then, I think, by the time you get to the Soviet Union, Soviet Communism in the 20th century, where you give a number theorist or chess grandmaster a medal — which was always a part I was somewhat sympathetic to in the Soviet Union — maybe it’s actually just a control mechanism, where the math people are singularly clueless. They don’t understand anything, but if we put them on a pedestal, and we tell everyone else you need to be like the math person, then it’s actually a way to control. Or the chess grandmaster doesn’t understand anything about the world. That’s a way to really control things.

If I fast-forwarded to, let’s say, Silicon Valley in the early 21st century, it’s way too biased toward the math people. I don’t know if it’s a French Revolution thing or a Russian-Straussian, secret-cabal, control thing where you have to prioritize it. That’s the thing that seems deeply unstable, and that’s what I would bet on getting reversed, where it’s like the place where math ability — it’s the thing that’s the test for everything.

It’s like if you want to go to medical school, okay, we weed people out through physics and calculus, and I’m not sure that’s really correlated with your dexterity as a neurosurgeon. I don’t really want someone operating on my brain to be doing prime number factorizations in their head while they’re operating on my brain, or something like that.

In the late ’80s, early ’90s, I had a chess bias because I was a pretty good chess player. And so my chess bias was, you should just test everyone on chess ability, and that should be the gating factor. Why even do math? Why not just chess? That got undermined by the computers in 1997. Isn’t that what’s going to happen to math? And isn’t that a long-overdue rebalancing of our society?

Thiel’s right about Silicon Valley’s questions about A.I. As Thiel says, their questions are too narrowly technical or too “cosmic.” But it’s not at all clear to me how we’re going to get the rebalancing Thiel mentions.

There’s much more in the interview.

Wednesday, April 17, 2024

Rich people live there, yet the sky is angry for them as well

The problem with the AGI concept

Pink sakura

The Seven Year Itch, Mad Men, Sex and the City [Media Notes 117]

I was, say, half a season into re-watching Sex and the City, which originally aired from 1998 to 2004, when I watched The Seven Year Itch, a Billy Wilder romantic comedy from 1955. Remember that iconic shot of Marilyn Monroe standing on a subway grate wearing a body-hugging white dress with the skirt billowing up around her? That’s from this film. Anyhow, “what an interesting juxtaposition,” thought I to myself, “both about sex in New York City, but separated by forty years.” Perhaps I should write something about that.

As I was thinking about that, it occurred to me that it would be interesting to throw Mad Men into the mix. It aired between 2007 and 2015, but is set mostly in New York City in the 1960s. That gives us a series set in a time not long after The Seven Year Itch, but produced from a sensibility much closer to that of Sex and the City.

That would be a very interesting and, alas, complicated three-way comparison, certainly much more than I can mount in a single blog post. But then, merely suggesting the comparison is, say, a third of the job. And some broad comparisons are quickly sketched out.

We have to set aside the fact that Itch is a 105-minute movie depicting events that take place over two days, while the other titles are multi-season series running for several years each and depicting several years of elapsed fictional time. But all three involve sex and romance, though Mad Men is more broadly about American life and (upper) middle-class culture and business, while Sex and the City is pretty much focused on the sexual lives of four single professional women.

The obvious contrasts: First, sexual mores were more open and freer in turn-of-the-millennium New York City than they were in the mid-20th century, though that was beginning to change in the world depicted in Mad Men. Second, financially independent single professional women like the quartet in Sex and the City were rare in mid-century America. Those stories couldn’t have happened back then. One of the major story lines in Mad Men centered on how Peggy Olson moved from the secretarial pool in the first season into copywriting and eventually into management. That is to say, she moves into the kind of positions all of the Sex and the City women have, but without the sexual latitude. They pretty much assumed they belonged in such jobs.

Third: Contemporary mores allow for a much franker depiction of sexuality than was possible in 1955. It’s worth noting that there was a one-night stand in the play that The Seven Year Itch was derived from, but that was eliminated in the movie version. In both versions Tom Ewell plays Richard Sherman, a publishing executive in New York City. His wife and young son have left to spend the summer in Maine, where it’s much cooler. An unnamed young woman, played by Marilyn Monroe, sublets the apartment above Sherman’s. He fantasizes about having an adulterous affair with her; they even go to see a movie (that’s when Monroe stands over a subway grate). Nothing comes of those fantasies (which are accompanied by Rachmaninoff’s Second Piano Concerto, a very 1950s touch) in the movie. I assume that the difference between the play and the movie on this matter is the difference between a (somewhat sophisticated) New York audience and a national audience. By contrast, adultery was a central motif in the world of Mad Men, which was not very far removed from the world of The Seven Year Itch. But the 21st-century storytelling conventions of Mad Men permitted, even required, a sexual frankness that had to be suppressed for national audiences in the mid-1950s. Adultery also shows up in the sexually freer world depicted in Sex and the City, though not so centrally as in Mad Men.

While there’s much more that could be said, I’ll content myself with two observations. The first concerns the way Marilyn Monroe was presented, which is much like the way Christina Hendricks was presented as Joan Holloway in Mad Men. Just as Monroe was the sex goddess of mid-century America, so Holloway was the sex goddess of Mad Men.

Second, and lastly, John Slattery, who played Roger Sterling, one of the central characters in Mad Men, shows up in a minor role in season three of Sex and the City. He’s one of the lovers of Carrie Bradshaw (played by Sarah Jessica Parker). Episode 2 of season three, “Politically Erect,” begins with this line in voice-over:

I had been dating a politician, Bill Kelley, for three weeks now. Since most of my time with him was spent on the campaign trail, I decided to dress the part. I found some vintage Halston and did a spin on Jackie Kennedy. The early years. [...] I figured we made a good match. I was adept at fashion. He was adept at politics.

Perfect. Fashion certainly existed in the 1950s, but I can’t imagine it being put front and center back then the way it is in Sex and the City.

Tuesday, April 16, 2024

To the Moon! [She's becoming a young woman]

The song was written in 1954 as "In Other Words." It was recorded by various artists and in 1963 Peggy Lee convinced the composer, Bart Howard, to change the name to "Fly Me to the Moon." An instrumental version by Joe Harnell became a hit in 1963 and won a Grammy, but it's the 1964 recording by Frank Sinatra that I'm most familiar with. Sinatra was backed by Count Basie, with an arrangement by Quincy Jones. That version subsequently became associated with the Apollo space program.

The only thing I can tell you about this version is that the trumpeter is young Kwak DaKyung, whom I've been following for a number of years. The male vocalist owes a debt to Sinatra, though I don't think Sinatra ever performed in shades. The female vocalist is an excellent scat singer.

From the Wikipedia article:

By 1995, the song had been recorded more than 300 times. According to a poll conducted by Japanese music magazine CD&DL Data in 2016 about the most representative songs associated with the Moon, the cover versions by Claire Littley and Yoko Takahashi ranked 7th by 6,203 respondents. The Claire cover version won the Planning Award of Heisei Anisong Grand Prize among the anime theme songs from 1989 to 1999, following its appearance in the end credits of Neon Genesis Evangelion.

From Sinatra in 1964, to the moon a couple years later, to Japanese anime at the end of the century.

Now, sit back for a moment and ponder the fact that that very American song from the mid-20th century is being performed in 2024 by a Korean jazz combo featuring a 14-year-old trumpet player. I'm waiting for her to cut loose. I wonder when that will be.

Spots of color on the Hudson

AI, Chess, and Language 3.3: Chess and language as models, a philosophical interlude

I now want to take a look at AI, chess, and language from a Piagetian point of view. While he is best known for his work in developmental psychology, Piaget was also interested in the development of concepts over historical time, which he called genetic epistemology, and more generally, in the construction of mental mechanisms. He was particularly interested in abstraction and in something he called reflective abstraction. The concept is a slippery one. Ernst von Glasersfeld has a useful account (Abstraction, Re-Presentation, and Reflection: An Interpretation of Experience and of Piaget’s Approach) from which I abstract, if you will, a perhaps over-simplified idea: major steps in cognitive evolution involve taking a mechanism that is operative at one level and making it an object that is operated on by mechanisms at a new and higher level.

Let us take chess as an example. We know that all possible chess games can be arranged as a tree, something we examined earlier in “Search! What enables us to entertain the idea that chess is a paradigmatic case of cultural evolution?” But we have only known that since the mathematician Ernst Zermelo published “Über eine Anwendung der Mengenlehre auf die Theorie des Schachspiels” (“On an application of set theory to the theory of chess”) in 1913. Ever since the game emerged, players have been exploring that tree, but without explicitly knowing it. It was only after Zermelo published the idea that the chess tree became an object that could be explicitly examined and explored as such.

I don’t know when that idea crossed into the chess world. In a quick search I found that Alexander Kotov used it in a book, Think Like a Grandmaster, which was translated into English in 1971. Kotov wrote of building an “analysis tree.” I assume that chess players became aware of the idea sooner than that, perhaps not long after Zermelo’s paper was published. In any event, for my present purposes, the date is irrelevant. What is important is simply that it happened. The tree structure has been central to all work in computer chess.
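The chess tree is far too large to write out, but the idea usually credited to Zermelo, that in a finite two-player game of perfect information every position has a determined value, can be shown on a toy game. In this sketch (my own example, not Zermelo's) two players alternately remove 1 or 2 counters from a pile, and whoever takes the last counter wins. `value` computes each position's value by backward induction over the whole tree, and `count_games` counts complete games, which shows how quickly even a tiny tree branches.

```python
from functools import lru_cache

MOVES = (1, 2)  # toy rules (hypothetical): remove 1 or 2 counters per turn


@lru_cache(maxsize=None)
def value(pile: int) -> int:
    """Return +1 if the player to move can force a win, -1 if not.

    Backward induction: a position is winning if at least one move
    leads to a position that is losing for the opponent.
    """
    if pile == 0:
        return -1  # the opponent just took the last counter, so the mover has lost
    return max(-value(pile - m) for m in MOVES if m <= pile)


@lru_cache(maxsize=None)
def count_games(pile: int) -> int:
    """Number of complete games, i.e. root-to-leaf paths in the game tree."""
    if pile == 0:
        return 1
    return sum(count_games(pile - m) for m in MOVES if m <= pile)


if __name__ == "__main__":
    for pile in range(10):
        outcome = "win" if value(pile) > 0 else "loss"
        print(pile, outcome, count_games(pile))
```

Every pile that is a multiple of 3 comes out as a loss for the player to move. Once the tree is an explicit object, that regularity can be read off it rather than discovered one game at a time.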

The tree structure is central to the activity of search. But there is more to chess than searching for possible moves. The moves must be evaluated and a strategy has to be executed. Various means have been developed to do those things with the result that computers can now play chess better than any human. Chess is a “solved” problem. And the components of various solutions are objects for explicit examination and design.
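The split between search and evaluation can be sketched on a toy game as well. This is plain negamax without the alpha-beta pruning real engines use, applied to a hypothetical game in which players alternately remove 1 or 2 counters and whoever takes the last one wins. The search looks a fixed number of moves ahead and then hands the position to an evaluation function. The two evaluators, `know_nothing` and `mod_three`, are my own illustrations: one knows nothing, the other encodes a rule about the game.

```python
from typing import Callable

Evaluator = Callable[[int], float]
MOVES = (1, 2)  # hypothetical toy rules


def negamax(pile: int, depth: int, evaluate: Evaluator) -> float:
    """Score for the player to move, searching `depth` plies before guessing."""
    if pile == 0:
        return -1.0  # exact result: the opponent took the last counter
    if depth == 0:
        return evaluate(pile)  # search horizon reached: fall back on the heuristic
    return max(-negamax(pile - m, depth - 1, evaluate) for m in MOVES if m <= pile)


def best_move(pile: int, depth: int, evaluate: Evaluator) -> int:
    """Pick the move whose resulting position scores worst for the opponent.

    Ties go to the first move listed in MOVES.
    """
    return max((m for m in MOVES if m <= pile),
               key=lambda m: -negamax(pile - m, depth - 1, evaluate))


def know_nothing(pile: int) -> float:
    return 0.0  # every unsearched position looks equally uncertain


def mod_three(pile: int) -> float:
    return -1.0 if pile % 3 == 0 else 1.0  # multiples of 3 are bad for the mover


if __name__ == "__main__":
    print(best_move(11, 1, know_nothing))   # shallow search, no knowledge
    print(best_move(11, 20, know_nothing))  # search to the end of the game
    print(best_move(11, 1, mod_three))      # shallow search, good evaluation
```

With one ply of lookahead and no knowledge, the search can't tell its two options apart and takes the first one, 1, which is the losing choice. Searching to the end of the game, or searching one ply with a good evaluator, both find 2, leaving the opponent a multiple of 3. That is the trade-off between deeper search and better evaluation in miniature.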


Unlike earlier chess programs, which were built entirely on symbolic technology, some of the most recent chess programs, such as AlphaZero, use neural nets for the evaluation function. What those nets are doing is opaque. We know how to build them, but we don’t know what they do.

And that brings us to language.

Language has been investigated for centuries. More specifically, it has been subject to formal analysis for the last three quarters of a century, but cognitive scientists have come to little agreement about syntax, much less semantics. Nonetheless, large language models are now capable of very impressive language performance. Like all neural network models, however, these models are opaque. But what if we could figure out how they work internally?

Consider the following diagram, which I discussed in my paper on GPT-3, “GPT-3: Waterloo or Rubicon? Here be Dragons”:

Texts are one product of the interaction of the human mind and the world. LLMs are trained on large bodies of these texts. It follows that the internal structure of these models must somehow, we don’t know how, reflect the nature of that interaction. If we could understand the internal structure of these models, wouldn’t that be a reflective abstraction over the processes of the human mind in the same way that the chess tree is a reflective abstraction over the human mind as it is engaged in the game of chess?

Yes, the chess tree is not all of chess, but only a part. And we know how to augment that part with evaluation functions. Figuring out how LLMs work would not be equivalent to knowing how the mind works, but might be a start. To know and be able to manipulate the conceptual structure that is latent in LLMs, that would be a major intellectual accomplishment.

Mimi and Eunice ride again

First iris of the year

I've been photographing irises since 2011. Having any show up in mid-April is unusual. I have an iris photo from April 30, 2021, and three photos from April 29, 2022. That's it. Otherwise, the earliest ones are in May.

Monday, April 15, 2024

The AI industry lacks useful ways of measuring performance [the boastful leading the blind]

Kevin Roose, A.I. Has a Measurement Problem, NYTimes, April 15, 2024.

There’s a problem with leading artificial intelligence tools like ChatGPT, Gemini and Claude: We don’t really know how smart they are.

That’s because, unlike companies that make cars or drugs or baby formula, A.I. companies aren’t required to submit their products for testing before releasing them to the public. There’s no Good Housekeeping seal for A.I. chatbots, and few independent groups are putting these tools through their paces in a rigorous way.

Instead, we’re left to rely on the claims of A.I. companies, which often use vague, fuzzy phrases like “improved capabilities” to describe how their models differ from one version to the next. And while there are some standard tests given to A.I. models to assess how good they are at, say, math or logical reasoning, many experts have doubts about how reliable those tests really are.

Safety risk:

Shoddy measurement also creates a safety risk. Without better tests for A.I. models, it’s hard to know which capabilities are improving faster than expected, or which products might pose real threats of harm.

In this year’s A.I. Index — a big annual report put out by Stanford University’s Institute for Human-Centered Artificial Intelligence — the authors describe poor measurement as one of the biggest challenges facing A.I. researchers.

“The lack of standardized evaluation makes it extremely challenging to systematically compare the limitations and risks of various A.I. models,” the report’s editor in chief, Nestor Maslej, told me.

Massive Multitask Language Understanding:

The MMLU, which was released in 2020, consists of a collection of roughly 16,000 multiple-choice questions covering dozens of academic subjects, ranging from abstract algebra to law and medicine. It’s supposed to be a kind of general intelligence test — the more of these questions a chatbot answers correctly, the smarter it is.

It has become the gold standard for A.I. companies competing for dominance. (When Google released its most advanced A.I. model, Gemini Ultra, earlier this year, it boasted that it had scored 90 percent on the MMLU — the highest score ever recorded.)

Dan Hendrycks, an A.I. safety researcher who helped develop the MMLU while in graduate school at the University of California, Berkeley, told me that the test was never supposed to be used for bragging rights. He was alarmed by how quickly A.I. systems were improving, and wanted to encourage researchers to take it more seriously.

Mr. Hendrycks said that while he thought MMLU “probably has another year or two of shelf life,” it will soon need to be replaced by different, harder tests. A.I. systems are getting too smart for the tests we have now, and it’s getting more difficult to design new ones.
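Scoring a multiple-choice benchmark like the MMLU is simple arithmetic, which is part of why small procedural differences matter. Here is a minimal sketch with made-up results, not real MMLU questions or model outputs. It shows that averaging over all questions, versus averaging within each subject first, gives different headline numbers from exactly the same answers.

```python
from collections import defaultdict

# Each record: (subject, index of the correct option, index the model picked).
# These are hypothetical results for illustration only.
results = [
    ("abstract_algebra", 2, 2),
    ("abstract_algebra", 0, 3),
    ("law", 1, 1),
    ("law", 3, 3),
    ("law", 0, 0),
    ("medicine", 2, 1),
]


def micro_accuracy(results):
    """Fraction of all questions answered correctly."""
    return sum(gold == pred for _, gold, pred in results) / len(results)


def macro_accuracy(results):
    """Average of per-subject accuracies, so each subject counts equally."""
    per_subject = defaultdict(list)
    for subject, gold, pred in results:
        per_subject[subject].append(gold == pred)
    return sum(sum(v) / len(v) for v in per_subject.values()) / len(per_subject)


print(round(micro_accuracy(results), 3))  # 0.667 (4 of 6 questions)
print(round(macro_accuracy(results), 3))  # 0.5 (mean of 0.5, 1.0, 0.0)
```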


There may also be problems with the tests themselves. Several researchers I spoke to warned that the process for administering benchmark tests like MMLU varies slightly from company to company, and that various models’ scores might not be directly comparable.

There is a problem known as “data contamination,” when the questions and answers for benchmark tests are included in an A.I. model’s training data, essentially allowing it to cheat. And there is no independent testing or auditing process for these models, meaning that A.I. companies are essentially grading their own homework.

In short, A.I. measurement is a mess — a tangle of sloppy tests, apples-to-oranges comparisons and self-serving hype that has left users, regulators and A.I. developers themselves grasping in the dark.

There's more at the link.
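The article doesn't say how anyone checks for "data contamination." One widely used heuristic is to look for long word sequences (n-grams) that appear verbatim in both a benchmark item and the training text. Here is a minimal sketch, assuming simple lowercase word tokenization and made-up strings. Real checks run over enormous corpora and have to choose n and a flagging threshold with care.

```python
import re


def ngrams(text, n):
    """Lowercase, keep alphanumeric tokens, return the set of word n-grams."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(benchmark_item, training_text, n=8):
    """Fraction of the benchmark item's n-grams also found in the training text."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0
    return len(item_grams & ngrams(training_text, n)) / len(item_grams)


# Hypothetical benchmark question and two hypothetical training snippets.
question = "Which of the following is a subgroup of the integers under addition?"
leaked = "Exam prep! Which of the following is a subgroup of the integers under addition? Answer: 2Z"
clean = "Bonobos live in the forests south of the Congo River."

print(overlap_score(question, leaked, n=5))  # 1.0: every 5-gram appears verbatim
print(overlap_score(question, clean, n=5))   # 0.0: nothing shared
```

A high score only flags possible leakage; paraphrased or translated copies of a question slip past an exact-match check like this one.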

Color me "not at all surprised." Not only does the field lack a sound theoretical basis, as far as I can tell, it doesn't even realize that, Hey!, a theory might be useful at a time like this. I don't have a theory to hand over, though I have a thought or three about how one might go about developing one, but then I'm not making (unfounded) performance claims either.

Without a coherent way of measuring performance, how can you guide the development of your products? Are we in Bruegel-land, with the blind leading the blind?

From yesterday's flower walk

Sunday, April 14, 2024

Paul McCartney Let it Be Hollywood Bowl Jimmy Buffet Tribute 4-11-24 Live [tears in your throat]

Listen closely to Sir Paul's voice as he sings. He's singing for a departed friend, Jimmy Buffett, and is skating at the edge of breaking down in tears. But he manages to hold it together. See this post, Paul McCartney on Emotion While Performing.

I discuss this in my book on music, Beethoven's Anvil, pp. 97-98. In the opening paragraph I'm referring to a performance in which Bette Midler, at the edge of tears, sang "One for My Baby" for Johnny Carson:

Let us recall Bette Midler’s performance for Johnny Carson’s farewell. Instead of thinking about how that performance affected us, we might wonder what it was like for her. I recall that her eyes were teared over. That suggests that she may have been skating on the edge of an impulse to cry.

I have had similar experiences while performing on my horn. Tears would well up in my eyes and I could feel a lump in my throat. If I gave in to the impulse I would be unable to continue playing. But if I tried to suppress it completely, the magic would be gone and my playing would become ordinary. I learned to bear down in my chest and abdomen “just so” and skate on the edge. The feeling didn’t disappear, but I could continue playing my instrument.

We’ve all had similar experiences quite independently of music. Imagine you are in some public place and you receive bad news, perhaps about the death of a loved one. You are stricken with grief and feel a strong impulse to cry. At the same time you feel a contrary impulse to remain reserved in public, to suppress the sobbing and the tears. Later on you are called upon to deliver a eulogy at the funeral. Once again you are torn. In order to speak intelligibly you must remain in control of your vocal apparatus. But you are speaking of your dead friend and so are also moved by a grief that wants to commandeer the same muscles in the service of crying out.

Mist on the Hudson

A Quick Note on Software, Civilization, and AI

Just around the corner, at Strange Loop Canon, Rohit Krishnan has an interesting “Ode to software.” Some bits:

Total man hours invested

Software is also incredibly complex. Far more so than anything tangible we produce. The great Cheops pyramid took around a decade and half with a peak workforce of 40k workers, maybe 1 billion man hours. Parthenon around 36 million. The Colosseum maybe 100 million, same as Notre Dame, that gorgeous cathedral.

However, Facebook, how many man hours do you think that took? It’s the same app, there are some percentage of employees who helped actually write the software, and this changed across the company’s life, and yes there are rewrites but the rewrites are part of how you learn what you should have written in the first place. And there’s a hell of a lot of maintenance, and continuous development, because both the product and the very environment it’s created in constantly changes. The rather crazy answer is that it probably took close to 200 million man hours. Even Uber is close to a hundred million. Microsoft? Partly because of the number of iterations, probably it’s a billion by now!
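Krishnan's pyramid figure can be checked on the back of an envelope. Assuming, for illustration only, about 2,000 working hours per worker per year (my assumption, not his), and a workforce held at its 40,000 peak for the full 15 years:

$$
\underbrace{4\times10^{4}}_{\text{workers (peak)}}\times\underbrace{15}_{\text{years}}\times\underbrace{2\times10^{3}}_{\text{hours/year (assumed)}} = 1.2\times10^{9}\ \text{man-hours}
$$

Since 40,000 is the peak rather than the average workforce, that is an upper bound, and "maybe 1 billion man hours" is the right order of magnitude.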

Complexity & Maintenance

The things we’re making with software are far more intricate, interlinked to other pieces of software and hardware and user input and more, than any we’ve created before. The complexity of a piece of code can be equivalent to the complexity of an entire culture, in its myriad dependencies and the long tail of knowledge that’s needed to truly understand it. [...]

And what gets built rarely stays built. There was a while where everyone thought investing in software was the easiest way to make money, because they got seduced by the gross margin (selling an additional unit of software is extremely cheap). But the implicit depreciation in software is incredibly high! You have to run extremely fast all the time, or you get beaten.

Some wide-ranging comparisons

Empires, cultures, cities. They're similar. They’re path dependent, built on top of what came before, constantly evolving. And they require incredible effort to build and sustain.

If software is “softer” it’s only because that’s a function of iteratively building solutions and then dealing with the problems that resulted from building things, basically sounds a lot like running an organisation.

Looked at this way you might think the Roman empire might have seen over half a trillion man hours dedicated to its maintenance. Egypt, which lasted longer, might have been as much as 2-3x more than that. However, only a tiny fraction of that was in actively building up a superstructure atop which it could grow.

Software as process

So if software is essentially a form of congealed decisions, then it’s not a product, it’s a process. There is no idea of a perfect product, because there is no perfect thought, the point at which you just stop thinking or when you’re done. And we live in the middle of the Eternal Refactor. That’s what software eating the world actually means. A new information supralayer on top of everything else we do.

That’s the magic of software. It makes historic decisions part of the legible firmament, even as it remains the most fleeting ephemeral medium we’ve devised, because decisions change and morph all the time.

It creates entire in silico civilisations with its history writ within its fundament.

And so forth.

[BTW, you might want to read Rohit's article through the lens of the rank theory of the cognitive evolution of culture that David Hays and I have elaborated.]

From software to AI

I blitzed my way through it and made a comment:

As I was reading this I got to thinking about what I want from my personal AI assistant. First of all, I want it to keep my software in order. I hesitate to say that I want it to connect it all together in a seamless web because, well...Of course that's what I want. But I have my doubts about the “seamless” part. I think the nature of the beast is such that the AI might be able to connect it all, but “seamless,” that’s something else. So, it’s got everything connected and does updates and adjustments and every once in a while hits a snag and asks me what it should do. I may or may not have a useful answer, but I can poke around and perhaps the two of us will come up with something.

Generalizing, isn’t that more or less what we want of the whole bloody mess of software? Can the right collection of AIs do something about all that legacy COBOL code that’s still all over the place? [A Google search on “legacy COBOL” got almost 15K results.] What about “legacy FORTRAN” [14K hits]?

Once we've got all that under control, then maybe we can worry about recursive self-improving AIs that solve everyone's problems everywhere and then take over the world.

That is to say, LLMs seem to be fairly good at coding, and perhaps more generally useful in coding than in their language capabilities. Can that capacity be put to work in organizing, integrating, and maintaining the existing body of code? I don’t think that would be a matter of cranking out a bunch of apps. It’s more difficult than that. The legacy code problem, for example, is a BIG one. Why not go to work on that?

I realize that there are “alignment” issues. Given the nature of the machine-learning monoculture that seems to dominate AI these days, sorting those issues into two classes – REAL issues having to do with reliability and performance, and FAKE issues having to do with the projective SciFi fantasies of TechBros – will take some time. But it should shake out eventually.

More later.

Saturday, April 13, 2024

Trees & moss [North]

The intellectual monoculture that dominates current AI research

Imagine 2200: Climate Fiction for Future Ancestors [Contest]

One study shows that male bonobos engage in violence more often, though less intensely, than male chimpanzees

Phie Jacobs, Bonobos, the ‘hippie chimps,’ might not be so mellow after all, Science, April 12, 2024.

Chimps (Pan troglodytes) and bonobos (Pan paniscus) are the closest surviving relatives of modern humans. That makes them interesting subjects for scientists studying how aggression evolved in our own species. But the two apes are very different in their behavior. Chimps are patriarchal, forming all-male coalitions that patrol their territory; they react violently when they happen upon an outsider or neighboring clan. “Chimpanzee intergroup encounters are not possible,” Mouginot explains. “They will kill each other.”

Bonobos—which some have called “hippie chimps”—are far more peaceful, says Brian Hare, an evolutionary anthropologist at Duke who wasn’t involved in the new research. Their societies are dominated by females, and amicable interactions between communities are commonplace—and frequently marked by displays of enthusiastic sexual activity. And although bonobos are no strangers to conflict, scientists who study the apes have never witnessed a lethal encounter. As far as we know, Hare says, “no bonobo has ever murdered another bonobo.” [...]

Field researchers tracked 12 male bonobos from three communities—from the moment they woke until they settled down for the night—at the Kokolopori reserve, located deep within the Democratic Republic of the Congo. They did the same with 14 male chimps from two communities at Gombe Stream National Park in Tanzania. Every time an ape attacked or charged an adversary, or was on the receiving end of such aggression, a researcher was there to document it.

The results, Mouginot says, “really came as a surprise.” Overall, male bonobos turned out to be about three times as likely as chimps to engage in aggressive behavior. Although none of the encounters were lethal—and the team didn’t track the severity of injuries—bonobos weren’t afraid to push, hit, and bite their foes. Their aggression didn’t appear to be a turn-off for female bonobos, who actually preferred to mate with aggressive males. [...]

But whereas male chimpanzees frequently ganged up to or defeat rivals or bully females, bonobo males mainly participated in one-on-one fights and rarely attacked members of the opposite sex. In fact, males were far more likely to be on the receiving end of violence from groups of aggressive, dominant females.

There's much more at the link.

H/t Tyler Cowen

More –


Dario Amodei on "AGI" and the exponential curve [Beware the intellectual monoculture]

Ezra Klein, What if Dario Amodei Is Right About A.I.? NYTimes, April 12, 2024.


Let's skip over a lot of stuff to get to AGI:

EZRA KLEIN: You don’t love the framing of artificial general intelligence, what gets called A.G.I. Typically, this is all described as a race to A.G.I., a race to this system that can do kind of whatever a human can do, but better. What do you understand A.G.I. to mean, when people say it? And why don’t you like it? Why is it not your framework?

DARIO AMODEI: So it’s actually a term I used to use a lot 10 years ago. And that’s because the situation 10 years ago was very different. 10 years ago, everyone was building these very specialized systems, right? Here’s a cat detector. You run it on a picture, and it’ll tell you whether a cat is in it or not. And so I was a proponent all the way back then of like, no, we should be thinking generally. Humans are general. The human brain appears to be general. It appears to get a lot of mileage by generalizing. You should go in that direction.

And I think back then, I kind of even imagined that that was like a discrete thing that we would reach at one point. But it’s a little like, if you look at a city on the horizon and you’re like, we’re going to Chicago, once you get to Chicago, you stop talking in terms of Chicago. You’re like, well, what neighborhood am I going to? What street am I on?

And I feel that way about A.G.I. We have very general systems now. In some ways, they’re better than humans. In some ways, they’re worse. There’s a number of things they can’t do at all. And there’s much improvement still to be gotten. So what I believe in is this thing that I say like a broken record, which is the exponential curve. And so, that general tide is going to increase with every generation of models.

And there’s no one point that’s meaningful. I think there’s just a smooth curve. But there may be points which are societally meaningful, right? We’re already working with, say, drug discovery scientists, companies like Pfizer or Dana-Farber Cancer Institute, on helping with biomedical diagnosis, drug discovery. There’s going to be some point where the models are better at that than the median human drug discovery scientists. I think we’re just going to get to a part of the exponential where things are really interesting.

Just like the chat bots got interesting at a certain stage of the exponential, even though the improvement was smooth, I think at some point, biologists are going to sit up and take notice, much more than they already have, and say, oh, my God, now our field is moving three times as fast as it did before. And now it’s moving 10 times as fast as it did before. And again, when that moment happens, great things are going to happen.

And we’ve already seen little hints of that with things like AlphaFold, which I have great respect for. I was inspired by AlphaFold, right? A direct use of A.I. to advance biological science, which it’ll advance basic science. In the long run, that will advance curing all kinds of diseases. But I think what we need is like 100 different AlphaFolds. And I think the way we’ll ultimately get that is by making the models smarter and putting them in a position where they can design the next AlphaFold.

I like the city analogy. And while he doesn't say much about that exponential curve here, he does earlier in the dialog.


As far as I can tell he thinks scaling will take us "to infinity and beyond," to quote Buzz Lightyear. Color me skeptical. I think scaling will top out at some point in the next decade or two. Just what range of behaviors AI will represent at that point, I don't know. Scaling up machine learning has taken us to a new region of the space, but I don't see any reason to believe that it exhausts the space.

Here's what bothers me: the belief in scaling. From earlier in the dialog:

DARIO AMODEI: Yes, we’re going to have to make bigger models that use more compute per iteration. We’re going to have to run them for longer by feeding more data into them. And that number of chips times the amount of time that we run things on chips is essentially dollar value because these chips are — you rent them by the hour. That’s the most common model for it. And so, today’s models cost of order $100 million to train, plus or minus factor two or three.

The models that are in training now and that will come out at various times later this year or early next year are closer in cost to $1 billion. So that’s already happening. And then I think in 2025 and 2026, we’ll get more towards $5 or $10 billion.

EZRA KLEIN: So we’re moving very quickly towards a world where the only players who can afford to do this are either giant corporations, companies hooked up to giant corporations — you all are getting billions of dollars from Amazon. OpenAI is getting billions of dollars from Microsoft. Google obviously makes its own.

You can imagine governments — though I don’t know of too many governments doing it directly, though some, like the Saudis, are creating big funds to invest in the space. When we’re talking about the model’s going to cost near to $1 billion, then you imagine a year or two out from that, if you see the same increase, that would be $10-ish billion. Then is it going to be $100 billion? I mean, very quickly, the financial artillery you need to create one of these is going to wall out anyone but the biggest players.

DARIO AMODEI: I basically do agree with you. I think it’s the intellectually honest thing to say that building the big, large scale models, the core foundation model engineering, it is getting more and more expensive. And anyone who wants to build one is going to need to find some way to finance it. And you’ve named most of the ways, right? You can be a large company. You can have some kind of partnership of various kinds with a large company. Or governments would be the other source.

I think one way that it’s not correct is, we’re always going to have a thriving ecosystem of experimentation on small models. For example, the open source community working to make models that are as small and as efficient as possible that are optimized for a particular use case. And also downstream usage of the models. I mean, there’s a blooming ecosystem of startups there that don’t need to train these models from scratch. They just need to consume them and maybe modify them a bit.
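Amodei's point that the number of chips times the time you run them "is essentially dollar value" amounts to a one-line cost model, and Klein's extrapolation is that cost multiplied by ten per generation. The symbols are mine, not theirs, and the tenfold step is Klein's hypothetical, not a forecast:

$$
C = N_{\text{chips}} \times T_{\text{hours}} \times p_{\text{chip-hour}}
$$

$$
C_k \approx \$10^{8}\times 10^{k}:\quad \$100\text{M}\ (k=0),\ \$1\text{B}\ (k=1),\ \$10\text{B}\ (k=2),\ \$100\text{B}\ (k=3)
$$

Amodei's own figures for 2025 and 2026, $5 to $10 billion, correspond to a step of roughly 5 to 10 times rather than a flat 10.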

$100 billion (Klein's number, not Amodei's) to train one model? That's a lot of money, and at the moment those decisions are being made by a relatively small group of people whose thinking is shaped by the bigger-is-better foundation-model culture that dominates A.I. these days. That makes me very uncomfortable.

Too much power

Judging from some remarks Amodei makes later in the dialog, it makes him uncomfortable as well:

DARIO AMODEI: ...if these predictions on the exponential trend are right, and we should be humble — and I don’t know if they’re right or not. My only evidence is that they appear to have been correct for the last few years. And so, I’m just expecting by induction that they continue to be correct. I don’t know that they will, but let’s say they are. The power of these models is going to be really quite incredible.

And as a private actor in charge of one of the companies developing these models, I’m kind of uncomfortable with the amount of power that that entails. I think that it potentially exceeds the power of, say, the social media companies maybe by a lot.

You know, occasionally, in the more science fictiony world of A.I. and the people who think about A.I. risk, someone will ask me like, OK, let’s say you build the A.G.I. What are you going to do with it? Will you cure the diseases? Will you create this kind of society?

And I’m like, who do you think you’re talking to? Like a king? I just find that to be a really, really disturbing way of conceptualizing running an A.I. company. And I hope there are no companies whose C.E.O.s actually think about things that way.

I mean, the whole technology, not just the regulation, but the oversight of the technology, like the wielding of it, it feels a little bit wrong for it to ultimately be in the hands — maybe I think it’s fine at this stage, but to ultimately be in the hands of private actors. There’s something undemocratic about that much power concentration.

EZRA KLEIN: I have now, I think, heard some version of this from the head of most of, maybe all of, the A.I. companies, in one way or another. And it has a quality to me of, Lord, grant me chastity but not yet.

Which is to say that I don’t know what it means to say that we’re going to invent something so powerful that we don’t trust ourselves to wield it. I mean, Amazon just gave you guys $2.75 billion. They don’t want to see that investment nationalized.

No matter how good-hearted you think OpenAI is, Microsoft doesn’t want GPT-7, all of a sudden, the government is like, whoa, whoa, whoa, whoa, whoa. We’re taking this over for the public interest, or the U.N. is going to handle it in some weird world or whatever it might be. I mean, Google doesn’t want that.

And this is a thing that makes me a little skeptical of the responsible scaling laws or the other iterative versions of that I’ve seen in other companies or seen or heard talked about by them, which is that it’s imagining this moment that is going to come later, when the money around these models is even bigger than it is now, the power, the possibility, the economic uses, the social dependence, the celebrity of the founders. It’s all worked out. We’ve maintained our pace on the exponential curve. We’re 10 years in the future.


DARIO AMODEI: And one of the things we and others have found is that, sometimes, there are specific neurons, specific statistical indicators inside the model, not necessarily in its external responses, that can tell you when the model is lying or when it’s telling the truth.

And so at some level, sometimes, not in all circumstances, the models seem to know when they’re saying something false and when they’re saying something true. I wouldn’t say that the models are being intentionally deceptive, right? I wouldn’t ascribe agency or motivation to them, at least in this stage in where we are with A.I. systems. But there does seem to be something going on where the models do seem to need to have a picture of the world and make a distinction between things that are true and things that are not true.

If you think of how the models are trained, they read a bunch of stuff on the internet. A lot of it’s true. Some of it, more than we’d like, is false. And when you’re training the model, it has to model all of it. And so, I think it’s parsimonious, I think it’s useful to the models picture of the world for it to know when things are true and for it to know when things are false.

And then the hope is, can we amplify that signal? Can we either use our internal understanding of the model as an indicator for when the model is lying, or can we use that as a hook for further training? And there are at least hooks. There are at least beginnings of how to try to address this problem.

EZRA KLEIN: So I try as best I can, as somebody not well-versed in the technology here, to follow this work on what you’re describing, which I think, broadly speaking, is interpretability, right? Can we know what is happening inside the model? And over the past year, there have been some much hyped breakthroughs in interpretability.

And when I look at those breakthroughs, they are getting the vaguest possible idea of some relationships happening inside the statistical architecture of very toy models built at a fraction of a fraction of a fraction of a fraction of a fraction of the complexity of Claude 1 or GPT-1, to say nothing of Claude 2, to say nothing of Claude 3, to say nothing of Claude Opus, to say nothing of Claude 4, which will come whenever Claude 4 comes.

We have this quality of like maybe we can imagine a pathway to interpreting a model that has a cognitive complexity of an inchworm. And meanwhile, we’re trying to create a superintelligence. How do you feel about that? How should I feel about that? How do you think about that?

DARIO AMODEI: I think, first, on interpretability, we are seeing substantial progress on being able to characterize, I would say, maybe the generation of models from six months ago. I think it’s not hopeless, and we do see a path. That said, I share your concern that the field is progressing very quickly relative to that.

And we’re trying to put as many resources into interpretability as possible. We’ve had one of our co-founders basically founded the field of interpretability. But also, we have to keep up with the market. So all of it’s very much a dilemma, right? Even if we stopped, then there’s all these other companies in the U.S. And even if some law stopped all the companies in the U.S., there’s a whole world of this.
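Amodei doesn't say how those "statistical indicators" of truth and falsehood are found. One simple technique from the interpretability literature is a linear probe: record a model's hidden activations for true and false statements, then look for a direction that separates them. The sketch below is a difference-of-means probe on synthetic vectors that the code generates itself, so it demonstrates the idea only. It is an assumption about the kind of method involved, not Anthropic's procedure, and the "activations" come from no real model.

```python
import random

random.seed(0)
DIM = 16

# Synthetic stand-ins for hidden activations: "true" statements are shifted
# one way along a hidden direction, "false" ones the other way, plus noise.
hidden_direction = [random.gauss(0, 1) for _ in range(DIM)]


def fake_activation(is_true):
    sign = 1.0 if is_true else -1.0
    return [sign * h + random.gauss(0, 1.0) for h in hidden_direction]


train = [(fake_activation(t), t) for t in [True, False] * 200]
test = [(fake_activation(t), t) for t in [True, False] * 100]


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Difference-of-means probe: the direction from the average "false"
# activation to the average "true" one, thresholded at their midpoint.
mu_true = mean([x for x, t in train if t])
mu_false = mean([x for x, t in train if not t])
probe = [a - b for a, b in zip(mu_true, mu_false)]
threshold = dot(probe, [(a + b) / 2 for a, b in zip(mu_true, mu_false)])


def predict(x):
    """Classify an activation vector as 'true' if it projects past the midpoint."""
    return dot(probe, x) > threshold


accuracy = sum(predict(x) == t for x, t in test) / len(test)
print(f"probe accuracy on held-out synthetic data: {accuracy:.2f}")
```

On real models the hard part is everything this toy skips: whether such a direction exists at all, whether it tracks truth rather than something correlated with it, and whether it holds up at the scale Klein is worried about.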

There's much more in the discussion. Persuasion is one (scary) topic. Energy usage is another. Copyright and economic displacement too.