Monday, June 9, 2025

And the math came tumbling down. o4-mini stuns mathematicians with its prowess [but I’m not worried]

Lyndie Chiou, edited by Clara Moskowitz, At Secret Math Meeting, Researchers Struggle to Outsmart AI, Scientific American, June 6, 2025.

Here's the opening paragraph:

On a weekend in mid-May, a clandestine mathematical conclave convened. Thirty of the world’s most renowned mathematicians traveled to Berkeley, Calif., with some coming from as far away as the U.K. The group’s members faced off in a showdown with a “reasoning” chatbot that was tasked with solving problems they had devised to test its mathematical mettle. After throwing professor-level questions at the bot for two days, the researchers were stunned to discover it was capable of answering some of the world’s hardest solvable problems. “I have colleagues who literally said these models are approaching mathematical genius,” says Ken Ono, a mathematician at the University of Virginia and a leader and judge at the meeting.

And then the article goes on for several paragraphs about how the whole thing was set up. Then we're making progress:

By the end of that Saturday night, Ono was frustrated with the bot, whose unexpected mathematical prowess was foiling the group’s progress. “I came up with a problem which experts in my field would recognize as an open question in number theory—a good Ph.D.-level problem,” he says. He asked o4-mini to solve the question. Over the next 10 minutes, Ono watched in stunned silence as the bot unfurled a solution in real time, showing its reasoning process along the way. The bot spent the first two minutes finding and mastering the related literature in the field. Then it wrote on the screen that it wanted to try solving a simpler “toy” version of the question first in order to learn. A few minutes later, it wrote that it was finally prepared to solve the more difficult problem. Five minutes after that, o4-mini presented a correct but sassy solution. “It was starting to get really cheeky,” says Ono, who is also a freelance mathematical consultant for Epoch AI. “And at the end, it says, ‘No citation necessary because the mystery number was computed by me!’”

Defeated, Ono jumped onto Signal early that Sunday morning and alerted the rest of the participants. “I was not prepared to be contending with an LLM like this,” he says, “I’ve never seen that kind of reasoning before in models. That’s what a scientist does. That’s frightening.”

Although the group did eventually succeed in finding 10 questions that stymied the bot, the researchers were astonished by how far AI had progressed in the span of one year. Ono likened it to working with a “strong collaborator.” Yang-Hui He, a mathematician at the London Institute for Mathematical Sciences and an early pioneer of using AI in math, says, “This is what a very, very good graduate student would be doing—in fact, more.”

The bot was also much faster than a professional mathematician, taking mere minutes to do what it would take such a human expert weeks or months to complete.

The future:

By the end of the meeting, the group started to consider what the future might look like for mathematicians. Discussions turned to the inevitable “tier five”—questions that even the best mathematicians couldn't solve. If AI reaches that level, the role of mathematicians would undergo a sharp change. For instance, mathematicians may shift to simply posing questions and interacting with reasoning-bots to help them discover new mathematical truths, much the same as a professor does with graduate students. As such, Ono predicts that nurturing creativity in higher education will be a key in keeping mathematics going for future generations.

I remember back at Johns Hopkins, either in 68-69 or perhaps the fall of 69, talking with Prof. Earl Wasserman, a Romanticist. In 68-69 I took his course on the Romantics; that's where I discovered and became hooked on "Kubla Khan." But I was still working with him in the fall, when I was beginning my master's thesis on "Kubla Khan." Anyhow, we were conversing in his office; I might even have been leaning back with my feet up on his desk (really, though I forget how long I kept them up). At some point I made some remark about asking questions, and he replied that that's perhaps where the real work is, asking the right questions. I've kept that with me to this day.

Claude and ChatGPT can come up with reasonable answers to a wide range of questions. But they lack agency and so lack the capacity to form questions of their own. And once we manage to endow an AI with agency, how then will it develop its ability to ask questions? Where does that come from? How is it grounded in one's experience of the world?

Consider my most recent working paper: Claude on the Eightfold Way: Sonnet 129, Fantasia, Kubla Khan. I asked for both an esoteric (Hindu/Buddhist) and a Christian reading of Shakespeare's Sonnet 129. Under what circumstances would Claude come up with that juxtaposition on its own? What prompt would lead it to "think" that such a juxtaposition was a reasonable thing to produce in response? For example, imagine that I ask it: Pick a famous poem, any famous poem, and do both a Hindu and a Christian interpretation. What poem would it pick, and how would the interpretations go? What would be the point of such an odd open-ended prompt? When I asked for those two interpretations (this was the second time around; the first time was with ChatGPT), I had a specific reason for asking.

And so on through the rest of that working paper. In what world is it logical to go from a reading of Sonnet 129 to a reading of Disney's Fantasia? In my world, with my experiences, that's what world. I had a very good reason for making that particular connection, and it proved fruitful. And so with the final connection to "Kubla Khan."

I don't care what tests these chatbots can ace. The fact that they know way more than I do? Not a problem. It's up to me to figure out how I can put that breadth of knowledge to use solving problems which interest me, problems grounded in a half-century of serious intellectual work across a half-dozen disciplines.

But now things get really interesting. You see, way back in 1976 I'd published an article in which I presented a semantic network model for that Shakespeare sonnet: Cognitive Networks and Literary Semantics. So, I wondered, given that ChatGPT now has visual capacity (I was using 4o to produce a mandala for the title page of that working paper), what would it do if I asked it to create a semantic network model for Sonnet 129? So:

Let’s look at the Shakespeare sonnet in the uploaded document. How would you use a semantic networks model to analyze its semantic structure? That is to say, could you create a semantic networks model that would provide a plausible foundation for the semantics of the sonnet?

Yes, we can construct a semantic network model for Shakespeare’s Sonnet 129—as presented in your document—by identifying the core concepts, relations, and dynamic transformations within the poem. Semantic networks, which consist of nodes (concepts) and edges (relations), allow us to map how meanings emerge through interaction rather than merely cataloging lexical items.

OK, so it knows what a semantic network is. It then proceeded to reason its way toward a model. When it finally coughed up a diagram, that diagram was pathetic. It may have been able to give me a “dictionary definition” of a semantic network, but it didn’t have the foggiest idea of how to produce such a model for a particular stretch of language. That’s something worth thinking about while marveling at how wonderful o4-mini is at solving expert-level math problems. 
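For the record, the dictionary-definition part really is trivial. Here's a minimal sketch, in Python, of the kind of node-and-edge structure ChatGPT described. The concepts and relation labels are my own hypothetical illustrations, keyed loosely to the sonnet's opening lines; they are not the model from my 1976 paper.

```python
# A minimal sketch of a semantic network as labeled nodes and typed edges.
# The concepts and relations below are hypothetical illustrations keyed
# loosely to the sonnet's opening lines ("Th' expense of spirit in a waste
# of shame / Is lust in action"); they are NOT the model from the 1976 paper.

from collections import defaultdict

class SemanticNetwork:
    def __init__(self):
        # Map each source concept to a list of (relation, target) pairs.
        self.edges = defaultdict(list)

    def relate(self, source, relation, target):
        """Add a directed, labeled edge between two concept nodes."""
        self.edges[source].append((relation, target))

    def neighbors(self, node):
        """Return all (relation, target) pairs leaving a node."""
        return self.edges[node]

net = SemanticNetwork()
net.relate("lust", "manifests-as", "action")
net.relate("action", "causes", "expense-of-spirit")
net.relate("expense-of-spirit", "evaluated-as", "waste")
net.relate("waste", "colored-by", "shame")

for relation, target in net.neighbors("lust"):
    print(f"lust --{relation}--> {target}")
```

A few lines of bookkeeping, nothing more. The intellectual labor is entirely in deciding what the nodes and relations should be for a particular stretch of language, and that is exactly what the chatbot couldn't do.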

When I was a graduate student back in the mid-1970s, it took me the better part of a semester to learn how to create semantic network diagrams that modeled reasonable chunks of English. It's not a cookbook type of process. I did that learning while being tutored by David Hays in one session a week. I'd produce a diagram or three, show them to Hays, and he'd explain why they didn't really work. Then I'd set about producing more adequate diagrams. After three months I began to get the hang of it. I don't recall what was involved in working on Sonnet 129, but I certainly did that well after that initial period of learning. I probably covered 30, 40, or more sheets of paper with diagrams before I settled on the ones I used for that 1976 paper.

What kind of cognitive model was I building up during the period when I learned how to create semantic network structures? What are the characteristics of a model that can move fluently between the verbal and visual domains? Whatever that model is, it isn't the stuff of rocket science. In subsequent years I had no trouble teaching the basics to undergraduates at Rensselaer Polytechnic Institute. They were smart, but not likely world-class-mathematician smart. It's one thing to move around fluidly within one domain, but moving back and forth between domains, that's a whole different ballgame.

Finally, note that the process that led me to be able to create semantic network models was a learning process. I would make proposals, get critiques, revise my understanding, and then make new proposals. That process took place over months. Chatbots can't do that, not at the level of the underlying LLM. And that, I suspect, is where the deep learning must occur. Not during the RLHF (reinforcement learning from human feedback) fine-tuning phase, and certainly not during in-context learning in a single session.

H/t to Tyler Cowen for the link to the article.
