NEW SAVANNA: Wolfram

Thursday, April 30, 2026

A Quick Ramble: Computational Compressibility (order in the universe), Religion & Signaling, Silicon Valley vs. Pope Leo [+Latour]

I’m thinking there’s a relationship between computational compressibility (as an index of order in the universe) and my current interest in religion, which is what the other two items are about.

Computational Compressibility and order in the universe

I’ve been particularly pleased by my recent working paper: On Method: Computational Compressibility in Complex Natural and Cultural Phenomena. I’m not quite sure why I find it so pleasing. That it crosses disciplines, that’s nice: weather, microbiology, chess (AI), financial economics (asset pricing), film studies (Hollywood Economics), and cultural evolution (Macroanalysis, 19th century novel). But it’s the specific mode of the argument; it’s about description, about what kinds of things exist.

I began by framing the discussion in terms of Stephen Wolfram’s distinction between computational reducibility and computational irreducibility. I think we’ve got to look at that distinction in terms of Miriam Yevick’s 1975 distinction between holographic or Fourier logic and sequential logic. I think Wolfram’s notion of computational reducibility implies Yevick’s sequential logic. As far as I can tell, her notion of holographic logic doesn’t register with respect to Wolfram’s distinction. But it may be that what I’m calling computational compressibility (within the realm of irreducibility) resonates with her notion of holographic logic.

A random system would of course be irreducible, but that is an extreme case. The systems I looked at in that paper are not random, but the order they exhibit allows them to occupy only a relatively small region of the state space potentially open to them. Given appropriate data about the behavior of the system, that region can be identified through a computational process. Thus they are computationally compressible. The phenomenon of computational compressibility indicates order, but order of a kind that’s different from reducible order. Generative order?
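
One loose way to get a feel for the contrast between a random system and an ordered one is to hand both to a general-purpose compressor. The sketch below is only an analogy, not the method of the working paper: it assumes that `zlib`'s compressed size can stand in as a crude index of order, and the helper names (`compression_ratio`, `random_states`, `restricted_states`) are made up for illustration. The "restricted" system visits only 8 of the 256 states open to it.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower = more regularity found)."""
    return len(zlib.compress(data, 9)) / len(data)


def random_states(n: int, seed: int = 0) -> bytes:
    """A system free to wander over all 256 byte values (hypothetical toy)."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))


def restricted_states(n: int, seed: int = 0) -> bytes:
    """A system confined to a small region (8 values) of the same state space.

    Its moves are still chosen at random, so no formula gives the next state.
    """
    rng = random.Random(seed)
    allowed = (3, 17, 42, 99, 128, 200, 231, 250)  # arbitrary illustrative values
    return bytes(rng.choice(allowed) for _ in range(n))


if __name__ == "__main__":
    n = 20_000
    print("random    :", round(compression_ratio(random_states(n)), 3))
    print("restricted:", round(compression_ratio(restricted_states(n)), 3))
```

Under these assumptions the random sequence should barely shrink, while the restricted one should compress to well under its original size, even though nothing in it lets you calculate the next state ahead of time.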

Religion & Signaling

Glenn Loury has a recent video where he distinguishes between what we might call the propositional content of an utterance and its signal value. He explains the distinction at some length in a recent lecture he gave at Stanford, Self-Censorship, Social Information, and the Conditions of Public Reason. In the lecture he examines three cases: race in America, academic life, and Israel and Gaza. His point is that in public discourse on these topics (and others) the signal value of what one is saying often eclipses the propositional value of one’s assertions. This often results in self-censorship, where a person withholds their (propositional) views for fear of signaling the wrong values.

Thus, in racial discourse:

A key question in this discourse is why racial inequality persists. In this domain speech is saturated with moral meanings. Claims about inequality, education, crime, family structure, or historical responsibility are rarely heard simply as attempts to explain social outcomes. They are taken also as signals of moral orientation—of sympathy or indifference, solidarity or hostility, compassion or disdain. For many audiences, answers to that question relying on structural causes are regarded more favorably than accounts emphasizing cultural factors. Speakers adapt accordingly—many by holding their tongues.

I think lots of discourse is like that. It seems to me that much religious discourse is like this. The conservative Christian arguments against biological evolution never made much sense to me as propositional assertion, nor did the accompanying belief about the age of the earth, but as signaling, those beliefs make more sense.

I'm thinking this signaling business is about coordination, about establishing a rock-bottom level of interpersonal coordination, of common knowledge. I need to think about this some more.

Addendum, May 2, 2026: This post about Bruno Latour's conception of modes of existence, each with its own “felicity conditions,” as he calls them, is relevant here.

Silicon Valley vs. Pope Leo

I’ve only recently learned that Pope Leo XIV has decided to make AI a central concern of his papacy. Given that I think that the common Silicon Valley view of AI is idolatrous, or at best straddles the border, I’m wondering if we’re going to see a grand debate and battle between the Pope and Silicon Valley apologists. When I put the question to Claude, here’s how it responded:

Your instinct is well-founded, and Leo XIV has already moved aggressively on this front. Let me lay out what's emerged so far.

Who he is and what he's signaling

Leo XIV — born Robert Prevost in Chicago, the first American pope — identified AI as a central challenge from his very first formal address to the cardinals after his election in May 2025. His choice of the name Leo was itself a signal: Pope Leo XIII confronted the industrial revolution with Rerum Novarum (1891), defending workers' rights and the common good. By taking that name, the new pope announced that the Church would face today's technological upheaval with similar clarity.

The specific critique — and it maps closely onto your "idolatry" framing

This is where it gets interesting for your purposes. In what has been called a programmatic document on AI for World Communications Day, Leo warned that "behind this enormous invisible force that involves us all, there are only a handful of companies — those whose founders have recently been presented as the creators of the 'Person of the Year 2025.'" He expressed "serious concern regarding the oligopolistic control of algorithmic and artificial intelligence systems capable of subtly shaping behaviors and even rewriting human history."

That last phrase — rewriting human history, including the history of the Church — is remarkable. He's naming a power claim that goes beyond economic disruption.

His core argument is that "the challenge is not technological, but anthropological. Safeguarding faces and voices ultimately means safeguarding ourselves." He described face and voice as "sacred," given by God, "the constitutive element of every genuine encounter."

At a Vatican gathering that included representatives from Google, OpenAI, Anthropic, IBM, Meta, and Palantir, he said: "The person is not a system of algorithms: he or she is a creature, relationship, mystery."

The "data is not wisdom" theme

He has repeatedly insisted that "the ability to access vast amounts of data and information should not be confused with the ability to derive meaning and value from it," and that "authentic wisdom has more to do with recognizing the true meaning of life than with the availability of data." This is a direct counter to the implicit epistemology of Silicon Valley — the notion that intelligence is fundamentally information processing, that more data equals more understanding.

Where this connects to your book

Your framing of the dominant Silicon Valley attitude as idolatrous is theologically precise in a way that Leo XIV would likely recognize. The classic definition of idolatry isn't crude statue-worship — it's the attribution of ultimate value and salvific power to something that is not God. When figures like Altman, Musk, or Kurzweil speak of AI as the path to immortality, the transcendence of human limitation, the solution to death — that's structurally idolatrous in the theological sense. It substitutes a human artifact for the divine.

Leo has the resources of the entire tradition of Catholic social teaching to bring to bear on this — not just as ethics but as a rival anthropology. The Church's claim that the person is "creature, relationship, mystery" is not compatible with the claim that the person is, at bottom, a computational process that can be replicated, extended, or superseded by a sufficiently powerful machine.

Whether this becomes a genuine face-off rather than parallel talking depends on whether Silicon Valley feels the need to respond. So far, the tech executives seem to be treating Vatican engagement as a reputational opportunity — showing up in Rome, being photographed, nodding about "human dignity" — without conceding much. The more interesting question is whether Leo pushes beyond dialogue into something more like confrontation. The Rerum Novarum precedent suggests he might: that encyclical didn't merely express concern about the industrial revolution, it made specific demands.

Tuesday, August 12, 2025

Notes on the Metaphysical Structure of the Cosmos

I’ve been thinking about something I call “the metaphysical structure of the cosmos” now and then since August of 2020 when I introduced it in a post written in the wake of GPT-3. I wasn’t entirely serious about it. I’d only just then thought of the idea and hadn’t had time to think it through. It came back to me a few days ago when I was thinking about the “Xanadu meme” and other ideas. This time, in a conversation with Claude, I hazarded the idea that the metaphysical structure of the cosmos was recursive, though I didn’t use the word “cosmos.” Claude agreed.

It's about time I thought about the idea seriously. Is it one I want to use, in a technical sense, going forward? I don’t know. But I’ll offer some thoughts on the matter.

Just what does it mean, “metaphysical structure of the cosmos”?

Here’s what I said when I originally introduced the idea:

There is no a priori reason to believe that world has to be learnable. But if it were not, then we wouldn’t exist, nor would (most?) animals. The existing world, thus, is learnable. The human sensorium and motor system are necessarily adapted to that learnable structure, whatever it is.

I am, at least provisionally, calling that learnable structure the metaphysical structure of the world. Moreover, since humans did not arise de novo that metaphysical structure must necessarily extend through the animal kingdom and, who knows, plants as well.

“How”, you might ask, “does this metaphysical structure of the world differ from the world’s physical structure?” I will say, again provisionally, for I am just now making this up, that it is a matter of intension rather than extension. Extensionally the physical and the metaphysical are one and the same. But intensionally, they are different. We think about them in different terms. We ask different things of them. They have different conceptual affordances. The physical world is meaningless; it is simply there. It is in the metaphysical world that we seek meaning.

As I’ve already said, I introduced the idea in the wake of GPT-3, the first large language model (LLM) that had received much public exposure. Though only a small number of people had direct access, enough of those wrote about it in fairly public ways that many of us knew about it, knew enough to be impressed.

When I introduced the idea I used a diagram something like this:

We have the LLM running down the middle, either that or the text on which it is trained. At this level of analysis it could be either one. The structure of the individual texts is a function of the human mind, which created the text, and the world, which the text is about, albeit often only indirectly (as in works of fiction). From this it follows, almost by definition, that the LLM derived from those texts reflects those two things as well, the mind and the world.

The significance of GPT-3, that is, its underlying LLM, and of subsequent LLMs is that that is the first time we’ve got the “whole thing” gathered together in a single, a single what? Model, text, whatever? It’s all there.

Yeah, I know. Not all of it. All LLMs are biased in favor of the texts on which they’re built. Much of human thought, especially the thought of pre-literate peoples, is not represented in the training corpus of any LLM. So, we’re talking about an idealization. That’s OK. As long as we’re aware of what we’re doing, we can proceed.

Now, there’s lots of structure in any given text, and there’s lots of structure latent in any LLM. I’m not interested in all of that structure. I’m only interested in the ontological structure, by which I mean something close to the concept of ontology as it is ordinarily used in knowledge representation.

John Sowa’s use is typical. Here’s how he introduces the topic: “The subject of ontology is the study of the categories of things that exist or may exist in some domain. The product of such a study, called an ontology, is a catalog of the types of things that are assumed to exist in a domain of interest D from the perspective of a person who uses a language L for the purpose of talking about D.” I’m interested in the structure of that catalog. I hypothesize that that structure is something which, for convenience, I call the Great Chain (a term long in use). Here’s a diagram:

That diagram needs some explaining; but this is not the time or place to do that. I say more in this old unpublished paper: Ontology in Knowledge Representation. My point is simply that there is a specific structure there. It’s that structure that interests me.

As an example, that structure tells us the difference between salt and sodium chloride (NaCl). Physically they are the same substance, but conceptually they are quite different. We recognize salt by its texture and appearance and, above all, by its taste. We can taste the presence of salt even where we cannot see it existing as a discrete substance. That is to say, conceptually, salt is adequately characterized by its sensorimotor properties. Sodium chloride is not. Sodium chloride is characterized in terms of a chemical theory that did not exist until the 19th century. That theory talks of atoms and bonds between them. We can’t see atoms or their bonds; rather, we infer them on the basis of a wide body of experimentation. Conceptually, then, salt and sodium chloride are very different.

Similarly, in one account of the world, based on one ontology, the Morning Star and the Evening Star are two different objects. But in an account based on a heliocentric model of the solar system, they turn out to be the same object, the planet Venus. And so it is with the difference between animals and human beings. To the biologist they are the same kind of thing; human beings are just one kind, one species of animal. But in the common-sense construal of the world, they are very different; humans are not animals, though we have animal-like characteristics.

That, more or less, is what I’m talking about when I talk of the metaphysical structure of the cosmos (or world). That conceptual structure. It’s not explicit in any LLM, but it certainly exists implicitly, otherwise LLMs wouldn’t generate coherent texts. (Note that I have a working paper on ChatGPT and stories where it betrays ontological sensitivity: ChatGPT tells stories, and a note about reverse engineering.)

Yudkowsky + Wolfram on AI Risk [Machine Learning Street Talk]

This is a long, rambling conversation (4 hours), so I have a hard time recommending the whole thing. I’d say that Wolfram and Yudkowsky do manage to find one another by the 4th hour (sections 6 & 7) and say some interesting things about computation and AI risk (much of the earlier conversation was on tangential matters). I note that the whole thing has been transcribed and there’s a Dropbox link for the conversation.

I will note that the conversation they did have was much better than what I had anticipated, which was a lot of talking past one another. And, yes, there was some of that, but as soon as that got going they worked hard at understanding what each was getting at.

Wolfram has some interesting remarks on computational irreducibility scattered throughout – that’s certainly one of his key concepts, and an important one. He also asserts, here and there, that he’s long been used to the idea that he faces computers smarter than he is; he also notes that he regards the universe as smarter than he is.

My sense is that the computation and AI risk stuff could be written up in a tight 2K words or so, but I don’t have any plans to make the attempt. That might be an exercise for a good student. Perhaps one of the current LLMs (Claude?) could do it.

TOC:

1. Foundational AI Concepts and Risks
[00:00:00] 1.1 AI Optimization and System Capabilities Debate
[00:06:46] 1.2 Computational Irreducibility and Intelligence Limitations
[00:20:09] 1.3 Existential Risk and Species Succession
[00:23:28] 1.4 Consciousness and Value Preservation in AI Systems

2. Ethics and Philosophy in AI
[00:33:24] 2.1 Moral Value of Human Consciousness vs. Computation
[00:36:30] 2.2 Ethics and Moral Philosophy Debate
[00:39:58] 2.3 Existential Risks and Digital Immortality
[00:43:30] 2.4 Consciousness and Personal Identity in Brain Emulation

3. Truth and Logic in AI Systems
[00:54:39] 3.1 AI Persuasion Ethics and Truth
[01:01:48] 3.2 Mathematical Truth and Logic in AI Systems
[01:11:29] 3.3 Universal Truth vs Personal Interpretation in Ethics and Mathematics
[01:14:43] 3.4 Quantum Mechanics and Fundamental Reality Debate

4. AI Capabilities and Constraints
[01:21:21] 4.1 AI Perception and Physical Laws
[01:28:33] 4.2 AI Capabilities and Computational Constraints
[01:34:59] 4.3 AI Motivation and Anthropomorphization Debate
[01:38:09] 4.4 Prediction vs Agency in AI Systems

5. AI System Architecture and Behavior
[01:44:47] 5.1 Computational Irreducibility and Probabilistic Prediction
[01:48:10] 5.2 Teleological vs Mechanistic Explanations of AI Behavior
[02:09:41] 5.3 Machine Learning as Assembly of Computational Components
[02:29:52] 5.4 AI Safety and Predictability in Complex Systems

6. Goal Optimization and Alignment
[02:50:30] 6.1 Goal Specification and Optimization Challenges in AI Systems
[02:58:31] 6.2 Intelligence, Computation, and Goal-Directed Behavior
[03:02:18] 6.3 Optimization Goals and Human Existential Risk
[03:08:49] 6.4 Emergent Goals and AI Alignment Challenges

7. AI Evolution and Risk Assessment
[03:19:44] 7.1 Inner Optimization and Mesa-Optimization Theory
[03:34:00] 7.2 Dynamic AI Goals and Extinction Risk Debate
[03:56:05] 7.3 AI Risk and Biological System Analogies
[04:09:37] 7.4 Expert Risk Assessments and Optimism vs Reality

8. Future Implications and Economics
[04:13:01] 8.1 Economic and Proliferation Considerations

Saturday, October 26, 2024

Hossenfelder: Wolfram's research program seems healthy (after all). Perhaps it can work.

From the webpage:

Mathematician and Computer Scientist Stephen Wolfram wants to do no less than revolutionizing physics. He wants to do it with computer code that gives rise to all the fundamental laws of nature that we know and like -- and maybe more. Unfortunately, Einstein’s theories of general relativity inherently clash with how computers work. And yet, he and his team might have found a clever way around this problem.

Monday, September 30, 2024

Wolfram on Machine Learning

Wolfram has a post in which he reflects on the work he’s done in the last five years: Five Most Productive Years: What Happened and What’s Next. On ChatGPT:

So at the beginning of February 2023 I decided it’d be better for me just to write down once and for all what I knew. It took a little over a week [...]—and then I had an “explainer” (that ran altogether to 76 pages) of ChatGPT.

Partly it talked in general about how machine learning and neural nets work, and how ChatGPT in particular works. But what a lot of people wanted to know was not “how” but “why” ChatGPT works. Why was something like that possible? Well, in effect ChatGPT was showing us a new science discovery—about language. Everyone knows that there’s a certain syntactic grammar of language—like that, in English, sentences typically have the form noun-verb-noun. But what ChatGPT was showing us is that there’s also a semantic grammar—some pattern of rules for what words can be put together and make sense.

My version of “semantic grammar” is the so-called “great chain of being,” which is about conceptual ontology, roughly: “rules for what words can be put together and make sense.” Here’s a post where I discuss it in the context of Wolfram’s work: Stephen Wolfram is looking for “semantic grammar” and “semantic laws of motion” [Great Chain of Being].

A bit later Wolfram says a bit more about what he’s recently discovered about the “essence of machine learning”:

So just a few weeks ago, starting with ideas from the biological evolution project, and mixing in some things I tried back in 1985, I decided to embark on exploring minimal models of machine learning. I just posted the results last week. And, yes, one seems to be able to see the essence of machine learning in systems vastly simpler than neural nets. In these systems one can visualize what’s going on—and it’s basically a story of finding ways to put together lumps of irreducible computation to do the tasks we want. Like stones one might pick up off the ground to put together into a stone wall, one gets something that works, but there’s no reason for there to be any understandable structure to it.

And the future? Among other things: “symbolic discourse language”:

But finally there was blockchain, and with it, smart contracts. And around 2015 I started thinking about how one might represent contracts in general not in legalese but in some precise computational way. And the result was that I began to crispen my ideas about what I called “symbolic discourse language”. I thought about how this might relate to questions like a “constitution for AIs” and so on. But I never quite got around to actually starting to design the specifics of the symbolic discourse language.

But then along came LLMs, together with my theory that their success had to do with a “semantic grammar” of language. And finally now we’ve launched a serious project to build a symbolic discourse language. And, yes, it’s a difficult language design problem, deeply entangled with a whole range of foundational issues in philosophy. But as, by now at least, the world’s most experienced language designer (for better or worse), I feel a responsibility to try to do it.

In addition to language design, there’s also the question of making all the various “symbolic calculi” that describe in appropriately coarse terms the operation of the world. Calculi of motion. Calculi of life (eating, dying, etc.). Calculi of human desires. Etc. As well as calculi that are directly supported by the computation and knowledge in the Wolfram Language.

And just as LLMs can provide a kind of conversational linguistic interface to the Wolfram Language, one can expect them also to do this to our symbolic discourse language. So the pattern will be similar to what it is for Wolfram Language: the symbolic discourse language will provide a formal and (at least within its purview) correct underpinning for the LLM. It may lose the poetry of language that the LLM handles. But from the outset it’ll get its reasoning straight.

The symbolic discourse language is a broad project. But in some sense breadth is what I have specialized in. Because that’s what’s needed to build out the Wolfram Language, and that’s what’s needed in my efforts to pull together the foundations of so many fields.

Wednesday, March 6, 2024

Computational irreducibility and the limits to AI [Wolfram]

Stephen Wolfram, Can AI Solve Science?, Writings, March 5, 2024.

Reality is a mixture of computationally irreducible and reducible phenomena:

But given computational irreducibility, why is science actually possible at all? The key fact is that whenever there’s overall computational irreducibility, there are also an infinite number of pockets of computational reducibility. In other words, there are always certain aspects of a system about which things can be said using limited computational effort. And these are what we typically concentrate on in “doing science”.

But inevitably there are limits to this—and issues that run into computational irreducibility. Sometimes these manifest as questions we just can’t answer, and sometimes as “surprises” we couldn’t see coming. But the point is that if we want to “solve everything” we’ll inevitably be confronted with computational irreducibility, and there just won’t be any way—with AI or otherwise—to shortcut just simulating the system step by step.

There is, however, a subtlety here. What if all we ever want to know about are things that align with computational reducibility? A lot of science—and technology—has been constructed specifically around computationally reducible phenomena. And that’s for example why things like mathematical formulas have been able to be as successful in science as they have.

But we certainly know we haven’t yet solved everything we want in science. And in many cases it seems like we don’t really have a choice about what we need to study; nature, for example, forces it upon us. And the result is that we inevitably end up face-to-face with computational irreducibility.

As we’ll discuss, AI has the potential to give us streamlined ways to find certain kinds of pockets of computational reducibility. But there’ll always be computational irreducibility around, leading to unexpected “surprises” and things we just can’t quickly or “narratively” get to. Will this ever end? No. There’ll always be “more to discover”. Things that need more computation to reach. Pockets of computational reducibility that we didn’t know were there. And ultimately—AI or not—computational irreducibility is what will prevent us from ever being able to completely “solve science”.
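
The contrast between a pocket of reducibility and step-by-step simulation can be sketched with two elementary cellular automata, systems Wolfram himself uses as examples. This is an illustrative sketch, not code from Wolfram's article: the function names are hypothetical, and it assumes a single live cell at position 0 on an unbounded line. Rule 90 has a known shortcut (the cell at position x after t steps is a binomial coefficient mod 2, which Lucas' theorem reduces to a bit test). For rule 30 no comparable shortcut is known, so the sketch simply runs it step by step.

```python
def step(cells: set[int], rule: int) -> set[int]:
    """Advance an elementary cellular automaton by one step.

    `cells` holds the positions of live cells. Assumes the rule maps the
    all-dead neighborhood to dead (true for rules 30 and 90).
    """
    if not cells:
        return set()
    nxt = set()
    for x in range(min(cells) - 1, max(cells) + 2):
        # Encode (left, center, right) as a 3-bit number, left = high bit.
        neighborhood = ((x - 1 in cells) << 2) | ((x in cells) << 1) | (x + 1 in cells)
        if (rule >> neighborhood) & 1:
            nxt.add(x)
    return nxt


def simulate(rule: int, steps: int) -> set[int]:
    """Run the automaton from a single seed at 0; cost grows with `steps`."""
    cells = {0}
    for _ in range(steps):
        cells = step(cells, rule)
    return cells


def rule90_shortcut(t: int, x: int) -> bool:
    """State of cell x after t steps of rule 90, without simulating.

    The cell is live iff C(t, k) is odd for k = (t + x) / 2, and by Lucas'
    theorem that holds iff the bits of k are a subset of the bits of t.
    """
    if abs(x) > t or (t + x) % 2:
        return False
    k = (t + x) // 2
    return (t & k) == k


def rule30_center(t: int) -> bool:
    """Center cell of rule 30 after t steps, obtained only by running it."""
    return 0 in simulate(30, t)


if __name__ == "__main__":
    t = 16
    by_formula = {x for x in range(-t, t + 1) if rule90_shortcut(t, x)}
    print("rule 90 shortcut agrees with simulation:", by_formula == simulate(90, t))
    print("rule 30 center column:", [int(rule30_center(i)) for i in range(10)])
```

The design point is the asymmetry: `rule90_shortcut` answers a question about step t in a handful of operations, while `rule30_center` has no option, as far as this sketch knows, but to compute every intermediate row.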

Spotting the reducible [protein folding]:

It wasn’t ever really practical with “first-principles physics” to figure out how proteins fold. So the fact that neural nets can get even roughly correct answers is impressive. So how do they do it? A significant part of it is surely effectively just matching chunks of protein to what’s in the training set—and then finding “plausible” ways to “stitch” these chunks together. But there’s probably something else too. One’s familiar with certain “pieces of regularity” in proteins (things like alpha helices and beta sheets). But it seems likely that neural nets are effectively plugging into other kinds of regularity; they’ve somehow found pockets of reducibility that we didn’t know were there. And particularly if just a few pockets of reducibility show up over and over again, they’ll effectively represent new, general “results in science” (say, some new kind of commonly occurring “meta-motif” in protein structure).

But while it’s fundamentally inevitable that there must be an infinite number of pockets of computational reducibility in the end, it’s not clear at the outset either how significant these might be in things we care about, or how successful neural net methods might be in finding them. We might imagine that insofar as neural nets mirror the essential operation of our brains, they’d only be able to find pockets of reducibility in cases where we humans could also readily discover them, say by looking at some visualization or another.

But an important point is that our brains are normally “trained” only on data that we readily experience with our senses: we’ve seen the equivalent of billions of images, and we’ve heard zillions of sounds. But we don’t have direct experience of the microscopic motions of molecules, or of a multitude of kinds of data that scientific observations and measuring devices can deliver.

A neural net, however, can “grow up” with very different “sensory experiences”—say directly experiencing “chemical space”, or, for that matter “metamathematical space”, or the space of financial transactions, or interactions between biological organisms, or whatever. But what kinds of pockets of computational reducibility exist in such cases? Mostly we don’t know. We know the ones that correspond to “known science”. But even though we can expect others must exist, we don’t normally know what they are.

Will they be “accessible” to neural nets? Again, we don’t know. [...]

But let’s say we’ve got a neural net to successfully key into computational reducibility in a particular system. Does that mean it can predict everything? Typically no. Because almost always the computational reducibility is “just a pocket”, and there’s plenty of computational irreducibility—and “surprises”—“outside”.

And indeed this seems to happen even in the case of something like protein folding.

Then Wolfram sets a neural net to work on a traditional problem in physics, the three-body problem. We've been here before:

When the trajectories are fairly simple, the neural net does decently well. But when things get more complicated, it does decreasingly well. It’s as if the neural net has “successfully memorized” the simple cases, but doesn’t know what to do in more complicated cases. And in the end this is very similar to what we saw above in examples like predicting cellular automaton evolution (and presumably also protein folding).

And, yes, once again this is a story of computational irreducibility. To ask to just “get the solution” in one go is to effectively ask for complete computational reducibility. And insofar as one might imagine that—if only one knew how to do it—one could in principle always get a “closed-form formula” for the solution, one’s implicitly assuming computational reducibility. But for many decades I’ve thought that something like the three-body problem is actually quite full of computational irreducibility.

And then there's the weather:

As an example, consider predicting the weather. In the end, this is all about PDEs for fluid dynamics (and, yes, there are also other effects to do with clouds, etc.). And as one approach, one can imagine directly and computationally solving these PDEs. But another approach would be to have a neural net just “learn typical patterns of weather” (as old-time meteorologists had to), and then have the network (a bit like for protein folding) try to patch together these patterns to fit whatever situation arises.

How successful will this be? It’ll probably depend on what we’re looking at. It could be that some particular aspect of the weather shows considerable computational reducibility and is quite predictable, say by neural nets. And if this is the aspect of the weather that we care about, we might conclude that the neural net is doing well. But if something we care about (“will it rain tomorrow?”) doesn’t tap into a pocket of computational reducibility, then neural nets typically won’t be successful in predicting it—and instead there’d be no choice but to do explicit computation, and perhaps impractically much of it.

By this time I'm getting a bit antsy and I'm only halfway through the article. I decide to start leaping ahead. As far as I can tell it's always about finding a tractable patch in some very big space. Then I notice something:

In many ways one can view the essence of science—at least as it’s traditionally been practiced—as being about taking what’s out there in the world and somehow casting it in a form we humans can think about. In effect, we want science to provide a human-accessible narrative for what happens, say in the natural world.

That's pretty much the Godfrey-Smith characterization of philosophy as an integrating activity that I looked at in 3QD, Western Metaphysics is Imploding. Will We Raise a Phoenix from The Ashes? [Catalytic AI], and earlier here, LLMs 1: The role of philosophical thinking in understanding large language models: Calibrating and closing the gap between first-person experience and underlying mechanisms.

Stephen Wolfram on AI Alignment

Joe Walker has a general conversation with Wolfram about his work and things and stuff, but there are some remarks about AI alignment at the very end:

WALKER: Okay, interesting. So moving finally to AI, many people worry about unaligned artificial general intelligence, and I think it's a risk we should take seriously. But computational irreducibility must imply that a mathematical definition of alignment is impossible, right?

WOLFRAM: Yes. There isn't a mathematical definition of what we want AIs to be like. The minimal thing we might say about AIs, about their alignment, is: let's have them be like people are. And then people immediately say, "No, we don't want them to be like people. People have all kinds of problems. We want them to be like people aspire to be."

And at that point, you've fallen off the cliff. Because, what do people aspire to be? Well, different people aspire to be different and different cultures aspire in different ways. And I think the concept that there will be a perfect mathematical aspiration is just completely wrongheaded. It's just the wrong type of answer.

The question of how we should be is a question that is a reflection back on us. There is no "this is the way we should be" imposed by mathematics.

Humans have ethical beliefs that are a reflection of humanity. One of the things I realised recently is one of the things that's confusing about ethics is if you're used to doing science, you say, "Well, I'm going to separate a piece of the system," and I'm going to say, "I'm going to study this particular subsystem. I'm going to figure out exactly what happens in the subsystem. Everything else is irrelevant."

But in ethics, you can never do that. So you imagine you're doing one of these trolley problem things. You got to decide whether you're going to kill the three giraffes or the eighteen llamas. And which one is it going to be?

Well, then you realise to really answer that question to the best ability of humanity, you're looking at the tentacles of the religious beliefs of the tribe in Africa that deals with giraffes, and this kind of thing that was the consequence of the llama for its wool that went in this supply chain, and all this kind of thing.

In other words, one of the problems with ethics is it doesn't have the separability that we've been used to in science. In other words, it necessarily pulls in everything, and we don't get to say, "There's this micro ethics for this particular thing; we can solve ethics for this thing without the broader picture of ethics outside."

If you say, "I'm going to make this system of laws, and I'm going to make the system of constraints on AIs, and that means I know everything that's going to happen," well, no, you don't. There will always be an unexpected consequence. There will always be this thing that spurts out and isn't what you expected to have happen, because there's this irreducibility, this kind of inexorable computational process that you can't readily predict.

The idea that we're going to have a prescriptive collection of principles for AIs, and we're going to be able to say, "This is enough, that's everything we need to constrain the AIs in the way we want," it's just not going to happen that way. It just can't happen that way.

Something I've been thinking about recently is, so what the heck do we actually do? I was realising this. We have this connection to ChatGPT, for example, and I was thinking now it can write Wolfram Language code, I can actually run that code on my computer. And right there at the moment where I'm going to press the button that says, "Okay, LLM, whatever code you write, it's going to run on my computer," I'm like, "That's probably a bad idea," because, I don't know, it's going to log into all my accounts everywhere, and it's going to send you email, and it's going to tell you this or that thing, and the LLM is in control now.

And I realised that probably it needs some kind of constraints on this. But what constraints should they be? If I say, well, you can't do anything, you can't modify any file, then there's a lot of stuff that would be useful to me that you can't do.

So there is no set of golden principles that humanity agrees on that are what we aspire to. It's like, sorry, that just doesn't exist. That's not the nature of civilisation. It's not the nature of our society.

And so then the question is, so what do you do when you don't have that? And my best current thought is — in fact, I was just chatting with the person I was chatting with before you about this — is developing what are, let's say, a couple of hundred principles you might pick.

One principle might be, I don't know: "An AI must always have an owner." "An AI must always do what its owner tells it to do." "An AI must, whatever."

Now you might say, an AI must always have an owner? Is that a principle we want? Is that a principle we don't want? Some people will pick differently.

But can you at least provide scaffolding for what might be the set of principles that you want? And then it's like be careful what you wish for because you make up these 200 principles or something, and then you see a few years later, people with placards saying, "Don't do number 34" or something, and you realise, "Oh, my gosh, what did one set up?"

But I think one needs some kind of framework for thinking about these things, rather than just people saying, "Oh, we want AIs to be virtuous." Well, what the heck does that mean?

Or, "We have this one particular thing: we want AIs to not do this societally terrible thing right here, but we're blind to all this other stuff." None of that is going to work.

You have to have this formalisation of ethics that is such that you can actually pick; you can literally say, I'm going to be running with number 23, number 25, and not number 24, or something. But you've got to make that kind of framework.

Tuesday, February 28, 2023

My current thinking about ChatGPT @3QD [Gärdenfors, Wolfram, and the value of speculation]

Some of my thinking anyhow, but by no means all of it. My thinking about the Chatster goes off in many directions, too many to chase down and corral for a single article. Here it is:

ChatGPT is a miracle of rare device. Here’s how I’m thinking about it.

Levels of description and analysis

Much of the article revolves around the question: What’s ChatGPT doing? I presented an idea that David Marr advanced back in the late 1970s and early 1980s: We must describe and analyze the behavior of complex information systems – he was a neuroscientist interested in vision – on several levels. I think we need to do the same with large language models, of which ChatGPT is now the most widely known example.

The company line on LLMs is that they work by statistically guided next-token prediction. I don’t doubt that, but I don’t find it very helpful either. It’s like saying a laptop computer works by executing a fetch-execute cycle. Well, yes it does, and so does every other digital computer. More to the point, that’s how every program is run, whether it’s the operating system, a word processor, a browser, a printer driver, etc. That’s what’s going on at the bottom level.

In the case of a word processor, the top-level processes include such things as: create a new document, save a document, cut text, paste text, check the spelling, apply a style to a block of text, and so forth. Those are actions taken by the user. What happens between those actions and the bottom-level fetch-execute is defined by processes implemented in low-level and high-level languages. Each of those processes was programmed by a human programmer. So, in theory, we know everything about what’s going on in a word processor, or, for that matter, any other kind of program.

Things are quite different with LLMs. At the top-level users are issuing prompts and the LLM is responding to them. How does it do that? By issuing word after word after word based on the statistical model it built up during training. What happens in between the bottom level and the top level?

We don’t know. And, all too often, we don’t care. As long as it runs and does impressive things, we don’t care how it works.

That’s no way to create the future.

[Check out David Chapman, How to understand AI systems.]

Peter Gärdenfors’ geometry of meaning

Perhaps the fascinating work Peter Gärdenfors has been doing in semantics can help. He has been developing a geometric concept of meaning. His two books:

Conceptual Spaces: The Geometry of Thought, MIT 2000.
The Geometry of Meaning: Semantics Based on Conceptual Spaces, MIT 2014.

I’m not going to attempt even a quick sketch of his ideas – you can find a bit in this post, The brain, the mind, and GPT-3: Dimensions and conceptual spaces – but I’ll offer a brief passage from Conceptual Spaces, p. 253:

On the symbolic level, searching, matching, of symbol strings, and rule following are central. On the subconceptual level, pattern recognition, pattern transformation, and dynamic adaptation of values are some examples of typical computational processes. And on the intermediate conceptual level, vector calculations, coordinate transformations, as well as other geometrical operations are in focus. Of course, one type of calculation can be simulated by one of the others (for example, by symbolic methods on a Turing machine). A point that is often forgotten, however, is that the simulations will, in general be computationally more complex than the process that is simulated.

The top-level processes of LLMs, such as ChatGPT, are operating at the symbolic level. Those processes are to be described by grammars at the sentence level and by various kinds of discourse models above the sentence level. My 3QD article presents some evidence about how ChatGPT structures stories. That behavior is symbolic and so has to be understood in terms of actions on and with symbols. See, e.g.:

Christopher D. Manning, Kevin Clark, John Hewitt, Urvashi Khandelwal, and Omer Levy, Emergent linguistic structure in artificial neural networks trained by self-supervision, PNAS, Vol. 117, No. 48, June 3, 2020, pp. 30046-30054, https://doi.org/10.1073/pnas.1907367117.

Andrew M. Saxe, James L. McClelland, and Surya Ganguli, A mathematical theory of semantic development in deep neural networks. PNAS, June 4, 2019, Vol. 116, No. 23, 11537-11546, https://www.pnas.org/content/116/23/11537.

What’s going on at the subconceptual level, that is, the bottom level, and the intermediate level? For that I want to turn to Stephen Wolfram.

Wolfram on ChatGPT

Wolfram has written a long, quasi-technical, and quite useful article, What Is ChatGPT Doing … and Why Does It Work? He makes extensive use of concepts from complex dynamics in his account. For the sake of argument let’s say that’s what ChatGPT is doing at the bottom-level. Perhaps between these two we have Gärdenfors’ “intermediate conceptual level” with its “vector calculations” and “other geometrical operations.”

Let’s scroll down through Wolfram’s article to the section, “Meaning Space and Semantic Laws of Motion.” Wolfram observes:

We discussed above that inside ChatGPT any piece of text is effectively represented by an array of numbers that we can think of as coordinates of a point in some kind of “linguistic feature space”. So when ChatGPT continues a piece of text this corresponds to tracing out a trajectory in linguistic feature space.

Given that the idea of a linguistic feature space is very general, Gärdenfors’ geometric semantics is certainly an account of something that can be called a “linguistic feature space.”

Wolfram has been working on an example where he follows GPT-2 from the prompt: “The best thing about AI...” After having shown illustrations of a feature space, he asks: So what about trajectories? We can look at the trajectory that a prompt for ChatGPT follows in feature space—and then we can see how ChatGPT continues that (click on the diagrams to embiggen):

There’s certainly no “geometrically obvious” law of motion here. And that’s not at all surprising; we fully expect this to be a considerably more complicated story. And, for example, it’s far from obvious that even if there is a “semantic law of motion” to be found, what kind of embedding (or, in effect, what “variables”) it’ll most naturally be stated in.

In the picture above, we’re showing several steps in the “trajectory”—where at each step we’re picking the word that ChatGPT considers the most probable (the “zero temperature” case). But we can also ask what words can “come next” with what probabilities at a given point:

And what we see in this case is that there’s a “fan” of high-probability words that seems to go in a more or less definite direction in feature space. What happens if we go further? Here are the successive “fans” that appear as we “move along” the trajectory:

Keeping in mind that this is a space of very high dimensionality, are those "more or less definite directions in feature space" the sort of thing we'd find in Gärdenfors' conceptual spaces? Here’s what he says in The Geometry of Meaning (p. 21):

A central idea is that the meanings we use in communication can be described as organized in abstract spatial structures that are expressed in terms of dimensions, distances, regions, and other geometric notions. In addition, I also use some notions from vector algebra.

That surely sounds like it’s in the right ballpark. That does not mean, of course, that it is. But surely it is worth investigating.
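Wolfram's contrast between the “zero temperature” case and the “fan” of high-probability words can be made concrete with a toy sketch. This is only an illustration under stated assumptions: the candidate words and their scores are made up, and the function names (`next_word_distribution`, `pick`) are hypothetical, not anything from ChatGPT or from Wolfram's article.

```python
import math
import random

def next_word_distribution(scores, temperature):
    """Turn raw scores into probabilities over candidate next words.

    Temperature 0 is treated as the limiting case: all probability
    goes to the single highest-scoring word.
    """
    if temperature == 0:
        best = max(scores, key=scores.get)
        return {w: (1.0 if w == best else 0.0) for w in scores}
    scaled = {w: s / temperature for w, s in scores.items()}
    top = max(scaled.values())                       # subtract max for numerical stability
    exps = {w: math.exp(s - top) for w, s in scaled.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

def pick(dist, rng):
    """Sample one word according to the distribution."""
    words = list(dist)
    return rng.choices(words, weights=[dist[w] for w in words], k=1)[0]

# Made-up scores for continuations of "The best thing about AI is its ability to"
toy_scores = {"learn": 2.0, "predict": 1.5, "make": 1.2, "understand": 1.0, "do": 0.5}

greedy = next_word_distribution(toy_scores, temperature=0)    # one word gets everything
fan = next_word_distribution(toy_scores, temperature=0.8)     # a spread of plausible words

print(pick(greedy, random.Random(0)))
print(sorted(fan.items(), key=lambda kv: -kv[1]))
```

At temperature 0 the trajectory is a single path; at any higher temperature each step offers a fan of options. The sketch says nothing about where those options sit in feature space, which is the part that bears on Gärdenfors.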

The role of speculation in scientific investigation

The deep learning community puts on a great show of intellectual rigor. And in some ways, it is not merely a show. The rigor is there. The technology wouldn’t work as well as it does if it weren’t rigorous in some substantial way.

But there is little rigor that I can see in the way they think about language and texts. I see relatively little knowledge about linguistics, psycholinguistics, and related areas of cognitive science.

Nor is there much interest in figuring out what happens with those 175 billion parameters as they generate text. There is some work being done on reverse engineering (aka mechanistic interpretability) the operations of these engines. There needs to be more, much more – see this article by David Chapman for suggestions, Do AI as science and engineering instead.

Speculation is a necessary part of this process. In order to go boldly where none have gone before you are going to have to speculate. It can’t be helped. Sooner or later some speculation will turn out to be correct, that is, it will be supported by evidence. There is no way to determine that ahead of time. But make your speculations as rigorous and detailed as you can. Speculation must be clear and crisp, otherwise it is not a reliable guide for thought.

More later, much more.

Tuesday, February 21, 2023

What happens in next-token generation in an LLM during inference?

“That which gives rise to the next token” is a space far exceeding the scope of next token itself:

- What’s in output so far
- Embedding space of 1000s dimensions
- Dynamics of “semantic motion”(@stephen_wolfram) in subconceptual space of 100s billions parameters
— Charles Wang (@charleswangb) February 21, 2023

We need a theory for such “dynamics of subsemantics-subconceptual motion” such that it can afford us ideas similar to or better than transformers and higher-order programs running on them (eg RLRF, CAI, ICL, CoT).
— Charles Wang (@charleswangb) February 21, 2023

See my recent post: The idea that ChatGPT is simply “predicting” the next word is, at best, misleading.

Sunday, February 19, 2023

The idea that ChatGPT is simply “predicting” the next word is, at best, misleading

But it may also be flat-out wrong. We’ll see when we get a better idea of how inference works in the underlying language model.

* * * * *

Yes, I know that ChatGPT is trained by having it predict the next word, and the next, and the next, for billions and billions of words. The result of all that training is that ChatGPT builds up a complex structure of weights on the 175 billion parameters of its model. It is that structure that emits word after word during inference. Training and inference are two different processes, but that point is not well-made in accounts written for the general public.

Let's get back to the main thread.

I maintain, for example, that when ChatGPT begins a story with the words “Once upon a time,” which it does fairly often, it “knows” where it is going and that its choice of words is conditioned on that “knowledge” as well as upon the prior words in the stream. It has invoked a ‘story telling procedure’ and that procedure conditions its word choice. Just what that procedure is, and how it works, I don’t know, nor do I know how it is invoked. I do know that it is not invoked by the phrase “once upon a time,” since ChatGPT doesn’t always use that phrase when telling a story. Rather, that phrase is called up through the procedure.

Consider an analogy from jazz. When I set out to improvise a solo on, say, “A Night in Tunisia,” I don’t know what notes I’m going to play from moment to moment, much less do I know how I’m going to end, though I often know when I’m going to end. How do I know that? That’s fixed by the convention in place at the beginning of the tune; that convention says how many choruses you’re going to play. So, I’ve started my solo. My note choices are, of course, conditioned by what I’ve already played. But they’re also conditioned by my knowledge of when the solo ends.

Something like that must be going on when ChatGPT tells a story. It’s not working against time in the way a musician is, but it does have a sense of what is required to end the story. And it knows what it must do, what kinds of events must take place, in order to get from the beginning to the end. In particular, I’ve been working with stories where the trajectories have five segments: Donné, Disturb, Plan, Execute, Celebrate. The whole trajectory is ‘in place’ when ChatGPT begins telling the story. If you think of the LLM as a complex dynamical system, then the trajectory is a valley in the system’s attractor landscape.

Nor is it just stories. Surely it enacts a different trajectory when you ask it a factual question, or request it to give you a recipe (like I recently did, for Cornish pasty), or generate some computer code.

With that in mind, consider a passage from a recent video by Stephen Wolfram (note: Wolfram doesn’t start speaking until about 9:50):

Starting at roughly 12:16, Wolfram explains:

It is trying write reasonable, it is trying to take an initial piece of text that you might give and is trying to continue that piece of text in a reasonable human-like way, that is sort of characteristic of typical human writing. So, you give it a prompt, you say something, you ask something, and, it’s kind of thinking to itself, “I’ve read the whole web, I’ve read millions of books, how would those typically continue from this prompt that I’ve been given? What’s the reasonable expected continuation based on some kind of average of a few billion pages from the web, a few million books and so on.” So, that’s what it’s always trying to do, it’s always trying to continue from the initial prompt that it’s given. It’s trying to continue in a statistically sensible way.

Let’s say that you had given it, you had said initially, “The best thing about AI is its ability to...” Then ChatGPT has to ask, “What’s it going to say next.”

I don’t have any problem with that (which, BTW, is similar to a passage near the beginning of his recent article, What Is ChatGPT Doing … and Why Does It Work?). Of course ChatGPT is “trying to continue in a statistically sensible way.” We’re all more or less doing that when we speak or write, though there are times when we may set out to be deliberately surprising – but we can set such complications aside. My misgivings set in with this next statement:

Now one thing I should explain about ChatGPT, that’s kind of shocking when you first hear about this. Is, those essays that it’s writing, it’s writing at one word at a time. As it writes each word it doesn’t have a global plan about what’s going to happen. It’s simply saying “what’s the best word to put down next based on what I’ve already written?”

It's the highlighted passage that I find problematic. That story trajectory looks like a global plan to me. It is a loose plan; it doesn’t dictate specific sentences or words, but it does specify general conditions which are to be met.

Now, much later in his talk Wolfram will say something like this (I don’t have the time, I’m quoting from his paper):

If one looks at the longest path through ChatGPT, there are about 400 (core) layers involved—in some ways not a huge number. But there are millions of neurons—with a total of 175 billion connections and therefore 175 billion weights. And one thing to realize is that every time ChatGPT generates a new token, it has to do a calculation involving every single one of these weights.

If ChatGPT visits every parameter each time it generates a token, that sure looks “global” to me. What is the relationship between these global calculations and those story trajectories? I surely don’t know.
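A toy sketch may make the bookkeeping point concrete: in a dense network, producing one token means a pass over all of the weights, and the loop repeats that pass for every token. Everything here is an assumption for illustration: the tiny vocabulary, the random weights, and the names `W_IN`, `W_OUT`, `next_token`, and `generate` are hypothetical, and a bag-of-words model like this one is nothing like a transformer.

```python
import random

VOCAB = ["once", "upon", "a", "time", "dragon", "princess", "."]
DIM = 4

rng = random.Random(0)
# Hypothetical frozen weights, standing in for what training leaves behind.
W_IN = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]    # len(VOCAB) x DIM
W_OUT = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]   # len(VOCAB) x DIM
N_WEIGHTS = 2 * len(VOCAB) * DIM

def next_token(context):
    """One inference step. Returns the chosen word and how many weights were used."""
    counts = [context.count(w) for w in VOCAB]   # crude summary of the text so far
    used = 0
    hidden = [0.0] * DIM
    for i, row in enumerate(W_IN):
        for j, w in enumerate(row):
            hidden[j] += counts[i] * w
            used += 1
    scores = []
    for row in W_OUT:
        s = 0.0
        for j, w in enumerate(row):
            s += hidden[j] * w
            used += 1
        scores.append(s)
    best = max(range(len(VOCAB)), key=scores.__getitem__)   # "zero temperature" choice
    return VOCAB[best], used

def generate(prompt, n_tokens):
    """Inference loop: the weights never change, but every step reads all of them."""
    text = list(prompt)
    weights_used = []
    for _ in range(n_tokens):
        token, used = next_token(text)
        text.append(token)
        weights_used.append(used)
    return text, weights_used

text, weights_used = generate(["once", "upon"], 3)
print(text, weights_used, N_WEIGHTS)
```

The sketch shows only that each step is a whole-model computation conditioned on everything emitted so far. It does not show how anything like a story trajectory could be stored in the weights, which is the open question.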

Perhaps it’s something like this: A story trajectory is a valley in the LLM’s attractor landscape. When it tells a story it enters the valley at one end and continues through to the end, where it exits the valley. That long circuit that visits each of those 175 billion weights in the course of generating each token, that keeps it in the valley until it reaches the other end.

I am reminded, moreover, of the late Walter Freeman’s conception of consciousness as arising through discontinuous whole-hemisphere states of coherence succeeding one another at a “frame rate” of 6 Hz to 10 Hz – something I discuss in “Ayahuasca Variations” (2003). It’s the whole hemisphere aspect that’s striking (and somewhat mysterious) given the complex connectivity across many scales and the relatively slow speed of neural conduction.

* * * * *

I was alerted to this issue by a remark made at the blog, Marginal Revolution. On December 20, 2022, Tyler Cowen had linked to an article by Murray Shanahan, Talking About Large Language Models. A commenter named Nabeel Q remarked:

LLMs are *not* simply “predicting the next statistically likely word”, as the author says. Actually, nobody knows how LLMs work. We do know how to train them, but we don’t know how the resulting models do what they do.

Consider the analogy of humans: we know how humans arose (evolution via natural selection), but we don’t have perfect models of how humans worked; we have not solved psychology and neuroscience yet! A relatively simple and specifiable process (evolution) can produce beings of extreme complexity (humans).

Likewise, LLMs are produced by a relatively simple training process (minimizing loss on next-token prediction, using a large training set from the internet, Github, Wikipedia etc.) but the resulting 175 billion parameter model is extremely inscrutable.

So the author is confusing the training process with the model. It’s like saying “although it may appear that humans are telling jokes and writing plays, all they are actually doing is optimizing for survival and reproduction”. This fallacy occurs throughout the paper.

This is the why the field of “AI interpretability” exists at all: to probe large models such as LLMs, and understand how they are producing the incredible results they are producing.

I don’t have any reason to think Wolfram was subject to that confusion. But I think many people are. I suspect that the general public, including many journalists reporting on machine learning, aren’t even aware of the distinction between training the model and using it to make inferences. One simply reads that ChatGPT, or any other comparable LLM, generates text by predicting the next word.

This mis-communication is a MAJOR blunder.

* * * * *

There's an interesting conversation about this taking place over at LessWrong, where I've cross-posted it.

2.21.23: The conversation at LessWrong has been very helpful. Here's a reply I just left there:

Quick reply, after doing a bit of reading and recalling a thing or two: In a 'classical' machine we have a clean separation of process and memory. Memory is kept on the paper tape of our Turing Machine and processing is located in, well, the processor. In a connectionist machine process and memory are all smushed together. GPTs are connectionist virtual machines running on a classical machine. The "plan" I'm looking for is stored in the parameter weights, but it's smeared over a bunch of them. So this classical machine has to visit every one of them before it can output a token.

So, yes, purely next token prediction. But the prediction cycle, in effect, involves 'reassembling' the plan each time through.

To my mind, in order to say we "understand" how this puppy is telling a story, we need to say more than it's a next-token-prediction machine. We need to say something about how that "plan" is smeared over those weights. We need to come up with concepts we can use in formulating such explanations. Maybe the right concepts are just lying scattered about in dusty old file cabinets someplace. But, and I'm thinking this is likely, we'll have to invent some new ones as well.

Wolfram was trained as a physicist. The language of complex dynamics is natural to him, whereas it's a poorly learned third or fourth language for me. So he talks of basins of attractors and attractor landscapes. As far as I can tell, in his language, those 175B parameters can be said to have an attractor landscape. When ChatGPT tells a story it enters the Story Valley in that landscape and walks a path through that valley. When it's done with the story, it exits that valley. There are all kinds of valleys (and valleys within valleys (and valleys within them)) in the attractor landscape, for all kinds of tasks.

FWIW, the human brain has roughly 86B neurons. Each of those is connected with roughly 10K other neurons. Those connections are mediated by upward of 100 different chemicals. And those neurons are surrounded by glial cells. In the old days researchers thought those glial cells were like packing peanuts for the neural net. We now know better and are beginning to figure out what they're doing. Memory is definitely part of their story. So we've got to add them into the mix. How many glial cells per neuron? There might be a number in the literature, but I haven't checked. Anyhow, the number of parameters we need to characterize a human brain is vast.
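For a rough sense of scale, the round figures above can simply be multiplied out. This is back-of-the-envelope arithmetic, not neuroscience: it assumes, purely for illustration, that each connection counts as one parameter, and it ignores the chemicals and the glial cells altogether.

```python
NEURONS = 86e9                   # "roughly 86B neurons"
CONNECTIONS_PER_NEURON = 10e3    # "roughly 10K other neurons"
GPT_PARAMETERS = 175e9           # the 175B figure used throughout this post

# Assumption: one parameter per connection, nothing else counted.
connections = NEURONS * CONNECTIONS_PER_NEURON
ratio = connections / GPT_PARAMETERS

print(f"{connections:.1e} connections")
print(f"about {ratio:,.0f} times 175B")
```

That comes to roughly 8.6 × 10^14 connections, several thousand times the 175B parameters, before any of the complications are added.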

Thursday, February 16, 2023

Stephen Wolfram is looking for “semantic grammar” and “semantic laws of motion” [Great Chain of Being]

Wolfram has a very interesting account of how ChatGPT works, What Is ChatGPT Doing … and Why Does It Work? Toward the end he talks about “Meaning Space and Semantic Laws of Motion,” which is more or less something I’m thinking about in my current work on ChatGPT’s ability to tell stories. Here he talks of trajectories “in linguistic feature space”:

We discussed above that inside ChatGPT any piece of text is effectively represented by an array of numbers that we can think of as coordinates of a point in some kind of “linguistic feature space”. So when ChatGPT continues a piece of text this corresponds to tracing out a trajectory in linguistic feature space. But now we can ask what makes this trajectory correspond to text we consider meaningful. And might there perhaps be some kind of “semantic laws of motion” that define—or at least constrain—how points in linguistic feature space can move around while preserving “meaningfulness”?

As ChatGPT tells them, stories consist of a sequence of sentences. Those sentences are ordered by a story trajectory. The particular stories I’ve been working with follow a trajectory that seems to have five segments: Donné, Disturb, Plan, Enact, and Celebrate (Benzon 2023). But that’s an aside. Let’s return to Wolfram. Later, after presenting visual illustrations of words arrayed in “semantic space” Wolfram observes, “OK, so it’s at least plausible that we can think of this feature space as placing ‘words nearby in meaning’ close in this space.”

Yes, it is. In particular, he’s looking for a “fundamental ‘ontology’ suitable for a general symbolic discourse language?” That’s what this post is about.

That fundamental ontology has a name, the Great Chain of Being (cf. Lovejoy 1936), though it has rarely been discussed under that rubric in linguistics and related disciplines. This is what a sketch of it looks like:

I have used the term “assignment” for the relationship being specified on the arcs (Benzon 1985, 2018).

Read the diagram from the bottom. A physical object consists of an assignment between a substance and a form. For a rock the substance is mineral and the form can be almost anything. For a cube of sugar the substance is sugar granules, and the form is, obviously, that of a cube. A plant consists of an assignment between a physical object and a vegetative soul (to use Aristotle’s terminology from De Anima). That is to say, plants can have any attributes characteristic of physical objects, visible form, characteristic textures, taste, odor, and so forth, but they also have attributes only applicable to living things, they grow, change form, and they die. Animals can have attributes of the kind characteristic of plants, plus new ones; they can move, hunt, sleep, see, hear, smell, etc. With humans you add still more attributes and capabilities.

Think about it for a minute or two. What kinds of verbs require humans as agents? Verbs about a whole range of mental and linguistic processes, no? Animals are not appropriate agents for such verbs, at least not in the commonest usages: Animals don’t think, or dream, or tell stories, ask questions, etc. But animals can walk, sleep, smell, see, feel pain, etc. And so can humans. The diagram is about capabilities and affordances, which are inherited up the diagram. Objects have weight, but so do plants, animals, and humans. But only humans can speak. Once you tease out the implications, it becomes apparent that assignment structure has very wide-ranging semantic implications.

When someone makes a statement that violates assignment structure, philosophers talk of category mistakes (Sommers 1963). Noam Chomsky’s most famous sentence, “Colorless green ideas sleep furiously,” contains two category mistakes and a contradiction. Ideas are not the kind of thing that can sensibly be said to sleep, or to have color. That there can be such a thing as “colorless green” is a contradiction. Wolfram offered his own collection of category errors packaged as a sentence: “Inquisitive electrons eat blue theories for fish.” Electrons can neither be inquisitive nor can they eat anything and theories, like ideas (of which they are a type) cannot have color. Nonetheless, ChatGPT is able to tell a story about them, which I have appended to this post.
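The way capabilities are inherited up the chain, and the way a category mistake falls out of that, can be put in a minimal sketch. The particular predicates listed at each level are illustrative choices drawn loosely from the examples above, the names `CHAIN`, `affordances`, and `is_category_mistake` are hypothetical, and treating an electron as a plain physical object is a simplification. Abstract things such as ideas and theories lie outside this four-level chain and are not modeled.

```python
# Levels of the chain, bottom to top, each with the predicates it adds.
CHAIN = [
    ("physical object", {"have weight", "have texture", "have odor"}),
    ("plant", {"grow", "die"}),
    ("animal", {"walk", "sleep", "eat", "see", "feel pain"}),
    ("human", {"speak", "think", "tell stories", "be inquisitive"}),
]

def affordances(kind):
    """Everything applicable to `kind`: its own predicates plus all those below it."""
    allowed = set()
    for level, added in CHAIN:
        allowed |= added
        if level == kind:
            return allowed
    raise KeyError(f"{kind!r} is not a level in the chain")

def is_category_mistake(kind, predicate):
    """True when the predicate is not available at this level of the chain."""
    return predicate not in affordances(kind)

print(is_category_mistake("human", "have weight"))             # humans inherit weight
print(is_category_mistake("animal", "speak"))                  # only humans speak
print(is_category_mistake("physical object", "be inquisitive"))  # the electron case
```

A lookup table like this captures only the inheritance. It says nothing about how ChatGPT manages to tell a sensible story about a sentence that violates the structure.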

Now consider this diagram:

It has a similar form, but is something I sketched out for the purely mechanical world of manufacturing. As in the previous diagram, the relationship between the nodes is that of assignment.

Again reading from the bottom, an object consists of an assignment between a material (comparable to substance in the previous diagram), a shape (comparable to form), and a surface (which would probably be appropriate to the previous diagram as well). Shape is the most complex of these aspects. The shape will have components, such as edges and vertexes or even component shapes. The substance of the part is the stuff of which the object is made; it has properties of mass, density, ductility, heat conductivity, etc. Finally, the surface must be considered separately from substance and shape because different types of process apply to it. The surface may be painted or plated, and/or ground to set specifications. This doesn't affect the shape or the nature of the substance.

We can continue on up: An assembly has a different ontological structure than a primitive part, which is to say that an assembly is not merely a complex part. There is more to it. An assembly is an assignment between a part and a connectivity structure. To think of an assembly as a part, first imagine shrinking an envelope around the assembly. The resulting shape/surface/substance triple is the assembly as part. Its shape and surface might be quite complex (think of an automobile engine as an assembly which is part of the automobile), and its substance heterogeneous (e.g. rubber, plastic, three kinds of metal, etc.). This part is, of course, a complex part. As such it has components, the simpler (perhaps even primitive) parts which make it up. Simple objects do not have components, though their shapes do.

We can keep moving up. A mechanism is an assembly with articulated parts, thus allowing them to move. Add a source of power to the mechanism and you have an engine. Over there to the left we have a computer conceived of as an assembly with a program. That is no doubt way too simplified, but at this level of conceptual resolution it may be satisfactory.
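To make the layering concrete, here is a minimal sketch of the manufacturing assignment structure as Python data classes. It is only an illustration under stated assumptions: every class name, field name, and the `refinish` helper are hypothetical and are not taken from the 1985 report. It shows an object as an assignment of material, shape, and surface, an assembly as a part plus a connectivity structure, and a surface process that leaves shape and substance alone.

```python
from dataclasses import dataclass, replace

# All names below are hypothetical, chosen only to mirror the prose.

@dataclass(frozen=True)
class Material:
    name: str  # properties such as mass or density would also live here

@dataclass(frozen=True)
class Shape:
    name: str
    components: tuple = ()  # edges, vertexes, or component shapes

@dataclass(frozen=True)
class Surface:
    finish: str  # e.g. painted, plated, ground

@dataclass(frozen=True)
class Part:
    """An object: an assignment of material, shape, and surface."""
    material: Material
    shape: Shape
    surface: Surface

@dataclass(frozen=True)
class Assembly:
    """An assignment between a part (the shrunken envelope) and connectivity."""
    as_part: Part
    components: tuple       # the simpler parts that make it up
    connections: frozenset  # pairs of indexes into components that are joined

@dataclass(frozen=True)
class Mechanism:
    """An assembly whose articulated connections allow motion."""
    assembly: Assembly
    articulated: frozenset  # subset of the assembly's connections

@dataclass(frozen=True)
class Engine:
    """A mechanism plus a source of power."""
    mechanism: Mechanism
    power_source: str

def refinish(part: Part, finish: str) -> Part:
    """Apply a surface process; shape and material are left untouched."""
    return replace(part, surface=Surface(finish))

steel = Material("steel")
bracket = Part(steel, Shape("L-bracket", ("edge", "vertex")), Surface("ground"))
bolt = Part(steel, Shape("bolt"), Surface("plated"))
painted = refinish(bracket, "painted")

joined = Assembly(
    as_part=Part(steel, Shape("bracket-with-bolt", (bracket.shape, bolt.shape)),
                 Surface("mixed")),
    components=(bracket, bolt),
    connections=frozenset({(0, 1)}),
)
```

Note that `Assembly` is not a subclass of `Part`. It holds a part alongside its connectivity, which is one way to express the claim that an assembly is not merely a complex part.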

What’s important about these diagrams is the relationship they establish between the objects in them: assignment. The ontological structure Wolfram is looking for consists of assignment structure. It warrants further exploration.

References

Benzon, William L. (1985) Ontology in Knowledge Representation for CIM, Computer Integrated Manufacturing Program, Center for Manufacturing and Technology Transfer, Rensselaer Polytechnic Institute, Doc.# CIMMMW85TR034. https://www.academia.edu/28723042/Ontology_of_Common_Sense.

Benzon, William L. (2018) Ontology in Cognition: The Assignment Relation and the Great Chain of Being, Working Paper. 5 pp., https://www.academia.edu/37754574/Ontology_in_Cognition_The_Assignment_Relation_and_the_Great_Chain_of_Being.

Benzon, William L. (2023) ChatGPT intimates a tantalizing future; its core LLM is organized on multiple levels; and it has broken the idea of thinking. Version 3. Working Paper, Feb. 6, 2023, 25 pp., https://www.academia.edu/95608526/ChatGPT_intimates_a_tantalizing_future_its_core_LLM_is_organized_on_multiple_levels_and_it_has_broken_the_idea_of_thinking_Version_3

Lovejoy, Arthur O. (1936) The Great Chain of Being: A Study of the History of an Idea. Harvard University Press. Available online, https://archive.org/details/ArthurO.LovejoyTheGreatChainOfBeing.

Sommers, Fred (1963) “Types and Ontology.” Philosophical Review. 72, 327 - 363.

The Long Story of How Neural Nets Got to Where They Are

This is a fascinating discussion between Stephen Wolfram and Terry Sejnowski. These guys, Sejnowski especially, are pulling names from all over the place. There are lots of isolated and semi-isolated figures in this story. But it’s (the beginnings of) a map of where all this came from. Should probably check it against Grace Lindsay's wonderful little book, Models of the Mind. Note that the early stuff about computational linguistics is messed up. Chomsky had nothing to do with it.

And so forth and so on.

After about 3:00:31 Wolfram and Sejnowski make the point that, while the field of neural networks started with a small group of "true believers," as most intellectual enterprises do, in the case of neural nets (Wolfram) "there were all these separate little pockets, I think that's not so common." In many cases "the tree grows from one trunk, so to speak." To which Sejnowski responds, "Ah, OK, that's an interesting observation." Wolfram notes that, for example, in the case of "quantum mechanics there were not multiple trunks." He goes on to say that, in a sense, it all grew from the initial McCulloch-Pitts 1943 paper, "and yet there were all these separate branches, same seed but it wasn't a single trunk from that." Moreover, "the time scale from the initial seed to fruition is extremely long. Many generations." Sejnowski agrees "that it was very diverse. But one possible explanation is that you have a much bigger search space to explore."

Thursday, February 2, 2023

Stephen Wolfram explains ChatGPT

Wolfram now has a long article about ChatGPT: What Is ChatGPT Doing … and Why Does It Work? It's worth reading.

Thursday, July 15, 2021

Six interesting things Stephen Wolfram has said in an interview with Sean Carroll, so far

Sean Carroll has just interviewed physicist and mathematician Stephen Wolfram. It’s a long interview and I’ve not finished it. I’m only 45 minutes into an interview that ran to 2 hours and 39 minutes. For what it’s worth, I bought Wolfram’s A New Kind of Science when it came out, read around in it, found some interesting things. Wolfram, as I’m sure you know, is a controversial figure.

Anyhow, here are six things I highlighted in the interview. I’m not going to attempt to explain how we got to the first point, and then how from there to the second, and so forth. I’m just presenting these things.

As usual, there’s more at the link.

* * * * *

1. Beneath pre-existing notions of space and time

0:09:48.9 Stephen Wolfram: Right. Yeah, that’s right. I never thought that cellular automata would be relevant to fundamental physics. I mean, other people were sort of saying “Cellular automata are going to solve fundamental physics!” And I was like, “No, please don’t say that. That’s just not going to work.” And cellular automata are very minimal models, once you have a fixed notion of space and time. And they’ve been extremely fertile models for huge numbers of different kinds of things, from road traffic flow, to chemical catalysis, to leaf growth, to all kinds of different things.

0:10:21.0 SW: But they assume a pre-existing notion of space and time, and so in thinking about physics I had realized ages ago that we really need to go underneath the notions of space and time that we’re familiar with and see how to build those up from something more fundamental. And so sort of the real starting point of our project is to think about just these... They’re just... Space is made of something, that has not been sort of in the tradition of physics, and the tradition of mathematics as well, that’s not really been a thing that people think about. Space just is something in which things are placed at certain positions and so on. [...]

0:12:40.1 Sean Carroll: So what I have in mind is a bunch of dots, nodes that are connected by lines to form some kind of graph, and do I imagine that the dots, the nodes, are they labeled? Are there different kinds of dots or is there just an Ur-dotness that they all share?

0:12:58.9 SW: It’s an Ur-dot, that’s just... They’re all identical. The only thing about them is they are... Two dots are either identical or they’re not identical. That is, there’s a particular dot or there’s another dot, and so the only thing about them is kind of their identity, so to speak. They don’t have colors, they don’t have positions, they just know “I’m this dot and not that other dot.” [...]

0:13:47.3 SW: That’s the idea. So, there are many pieces to that setup. For example, one thing is, all there is is these kind of atoms of space and connections. There’s nothing in space, there’s nothing... It’s not like you then say, “We’ve got space, now let’s put an electron in space.” If you want an electron, you have to make it out of space, so to speak. You have to make it from features of that pattern of connections between the atoms of space. So that’s kind of... That’s sort of the base story of what is the data structure of the universe, what is the universe sort of made of, and that’s the idea, it’s made of these discrete elements and relations between those elements, which we can think of as being kind of lines joining them.
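As a rough illustration of that "data structure of the universe," here is a minimal sketch in which nodes are bare integer identities with no color or position, and everything else lives in the relations between them. This is an assumption-laden toy, not code from the Wolfram Physics Project: the names `neighbors` and `graph_distance` and the sample `space` are hypothetical. The only notion of "where" a node is comes from how many connections you have to follow to reach it.

```python
from collections import deque

def neighbors(hyperedges, node):
    """All nodes that share at least one hyperedge with `node`."""
    found = set()
    for edge in hyperedges:
        if node in edge:
            found.update(edge)
    found.discard(node)
    return found

def graph_distance(hyperedges, start, goal):
    """Fewest connections to follow from start to goal (breadth-first search).

    Returns None when the two nodes are not connected at all.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbors(hyperedges, node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Nodes are just identities (integers here); a hyperedge may relate
# more than two of them, as in the last tuple.
space = [(0, 1), (1, 2), (2, 3, 4)]

print(neighbors(space, 2))          # the nodes directly related to node 2
print(graph_distance(space, 0, 4))  # hops needed to get from node 0 to node 4
```

The integers carry no meaning beyond "this dot and not that other dot." Relabeling them consistently would leave every distance unchanged.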

2. Open to the public

0:15:47.9 SC: And I like the fact that you just used the word hypergraph, because I like to share the jargon with the audience so they can go look it up, because especially, I should note, you... I don’t want to talk too much about style and procedure here, ’cause there’s far too much physics to talk about, but you did announce this project with a website and a call to participate. If people want to dig into the details, what is the URL they should be going to?

0:16:15.0 SW: Wolframphysics.org.

0:16:17.0 SC: There you go. And people can find out what the details are and do their own calculations, so that’s...

0:16:22.2 SW: The other thing we’ve done kind of in terms of the public, which has been really a big success, is we’ve live-streamed a lot of our internal working meetings, and we’ve... All the notes from this project are all posted on the web, basically the day after they’re made, typically, and that’s... And it’s been really interesting, because a lot of people who are... A lot of professional physicists have gotten involved but also a lot of people who are, for example, involved in computer kinds of things and understand sometimes more of some of our jargon than the physicists would understand, have also gotten involved. And it’s really been an interesting process to sort of do science live and in public, so to speak.

3. Computational equivalence, computational reducibility – FUNDAMENTAL STUFF

0:24:57.0 SW: So in other words, what that’s saying is you might think you start off with an incredibly trivial program that only does trivial computations, as you make the program a little bit more complicated it would gradually do more and more sophisticated computations, and as you make a really, really, really complicated program, it would do really, really complicated computations. But the somewhat surprising claim of the principle of computational equivalence is that’s not true. Once you get above some very low threshold, you’re immediately at the max, you’re immediately doing computations that are as sophisticated as anything. And that principle has many implications, and probably the most important for, immediately, for physics is this phenomenon I call computational irreducibility.

0:25:40.2 SC: That was the next one. So go ahead and say that, yeah.

0:25:43.7 SW: Right, so the question there is, if you are running a program, can you tell what it will do. One way you can tell what it will do is you just run every step, just like the program would run itself. But another thing you can do is say, I’m much smarter than that program, I can jump ahead and it’s going to run for a million steps but I can jump to the end and say the answer is going to be 42, or something.

0:26:08.2 SW: And so one of the ideas of the exact sciences, the mathematical sciences, for a long time has been sort of a sign of doing wonderful things is that you can jump ahead like that, you can readily predict where will the planets be at some time in the future and so on. So that’s what I call computational reducibility, the ability to reduce the computational effort necessary to find the answer, to jump ahead. So the claim is actually, there are lots of systems that are computationally irreducible, in the sense the only way to find out what they’ll do is just to run every step, or in effect just to observe what the system does.

0:26:47.9 SW: And the reason that happens is because of the principle of computational equivalence. Because if you think you’re the observer, you’re the predictor, you are a computational system as well, and the question is, how do you as a computational system compete with the system you’re trying to predict. So if it was the case that you could really be smarter than the system you’re trying to predict, then yes, you could potentially jump ahead, but what the principle of computational equivalence says is, no, actually that won’t be that way. You will be exactly computationally equivalent to the system you’re trying to predict. And so the system you’re trying to predict, its behavior will seem to you sort of irreducibly complicated. So that’s one of its implications.
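The contrast between jumping ahead and running every step can be sketched in a few lines. The examples are my own choice, not ones from the interview. Summing the integers up to n is reducible because a closed form lets you skip the loop. For the second case I use the elementary cellular automaton rule 30, a system Wolfram has often offered as an example. No shortcut for its center column is known, though its irreducibility is a conjecture rather than a proven fact. Function names are hypothetical.

```python
def sum_by_stepping(n):
    """Run every step: add 1, 2, ..., n one at a time."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total

def sum_by_jumping(n):
    """Jump ahead: the closed form n(n+1)/2 gives the same answer at once."""
    return n * (n + 1) // 2

def rule30_step(cells):
    """One update of rule 30: new cell = left XOR (center OR right).

    Cells beyond either end of the list are treated as 0.
    """
    padded = [0] + cells + [0]
    return [padded[i - 1] ^ (padded[i] | padded[i + 1])
            for i in range(1, len(padded) - 1)]

def rule30_center_column(steps):
    """Center cell of rule 30 at rows 0..steps, starting from a single 1.

    No closed form is used here; each row is computed from the one before.
    """
    width = 2 * steps + 1  # wide enough that the pattern never hits the edge
    cells = [0] * width
    cells[steps] = 1
    column = [cells[steps]]
    for _ in range(steps):
        cells = rule30_step(cells)
        column.append(cells[steps])
    return column

print(sum_by_stepping(1000) == sum_by_jumping(1000))
print(rule30_center_column(4))
```

Both sum functions agree, but only one of them scales with n. For the rule 30 column, the code has nothing like `sum_by_jumping` to offer, which is the practical face of irreducibility.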

4. It takes a few steps to get from next to nothing to the floor of our universe

0:31:12.8 SC: I’ll take all the reasons for optimism I can get, I’m all on board with that. So good, so with those in mind, let’s go back to the physics that you’re actually constructing here. We have a hypergraph, we have some rules for updating it, now, should I read the specific... Is the idea that there is a specific correct rule for our universe and that rule basically is the fundamental laws of physics?

0:31:35.5 SW: Okay. So this is where things get a little bit more complicated. So, I think that it is probably the case that there will be a rule that we’ll be able to hold up and say, with this rule, we can reproduce what we observe in physics. Now, footnote, which is a really interesting footnote, I think. You might say, why did we get this rule and not another rule? It seems very... Particularly if the rule is simple, that’s like Copernicus was wrong, so to speak. There isn’t nothing special about us, we got the simple universe, so to speak, or we got the universe with the simple rule, not the universe with the, to us, incredibly complicated-seeming rule.

0:32:15.2 SW: So the thing that is then very surprising is that... And this is... We probably have to go a few more steps before we can really, really dig into this properly, is the idea that actually you can think of the universe as running all possible rules. And we as observers of the universe are essentially exist in a particular... In a sense- reference frame, in a particular position in rulial space, as we call it, that essentially gives us a particular sampling of this sort of universe of all possible universes. And that particular sampling is given our way of parsing what happens in the universe, that is something that we could attribute to a particular underlying rule, but actually, we can also think of it as just a slice of this kind of universe of all possible universes.

5. The 2nd law of thermodynamics and being computationally bounded

0:35:23.6 SW: Intelligent observers? No. Observers with... So the question is, what is the right idealization of a human observer? So I’ll give you a couple that seem to be enough, okay? So one important one is we’re computationally bounded. We don’t get to observe... So let’s take the gas molecule example again. If you... You have this gas, it’s got a bunch of molecules bouncing around, if you’re a sophisticated-enough observer you can see every single molecule, you can work out all the collisions, and in particular, that will allow you... So, a big principle in statistical mechanics is the second law of thermodynamics, which says typically the sort of... Typically things get more random as the molecules bounce around in a gas.

0:36:06.6 SW: But if we are not computationally-bounded observers and we can figure out what all these trajectories of all these molecules are, we don’t get the second law of thermodynamics. As non-computationally bounded observers, the second law of thermodynamics simply isn’t true. And that same idea of a computationally-bounded observer is necessary, I think, for us to believe that space has a continuous structure, and various other things about the universe. So that’s kind of step one.

0:36:34.9 SC: So we’re not Laplace’s demon.

0:36:37.5 SW: What’s that, sorry?

0:36:38.0 SC: We’re not Laplace’s demon.

[Side note on the anthropic principle]

0:40:56.7 SW: I think you raised the anthropic principle. And, to me, the anthropic principle is sort of a story of lack of imagination, so to speak. Because it’s saying the only way that we can have life, intelligence, consciousness, whatever, is the particular way we’ve seen it. And one of the consequences of this principle of computational equivalence is that actually something like intelligence is ubiquitous.

6. Spacetime

0:44:40.2 SW: Oh, yeah, yeah. Right. The sequence of updates, the hypergraph together with all its updates is supposed to be spacetime. And one of the things that is interesting and non-trivial here is most traditional views of physics have thought of space and time as being the same kind of thing. In this model they’re really not.

0:45:00.0 SC: Sure.

0:45:00.0 SW: Space is the extent of the spatial hypergraph. Time is the computational process of updating this hypergraph. So time is the progression of a computation. Space is just, oh, you follow these connections in the hypergraph. And so that makes it not at all obvious that you’re going to get things like relativity out of the model, because one is breaking apart the traditional connection between space and time.
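A toy sketch may help separate the two roles. In the code below, space is whatever set of relations exists at a given moment, and time is nothing more than the repeated application of an update rule. The rule used, which rewrites each relation (x, y) into (x, y) and (y, z) with a brand-new node z, is an assumption picked for illustration and is not claimed to be the rule of our universe or anything stated in the interview. The names `update` and `evolve` are hypothetical.

```python
from itertools import count

def update(hyperedges, fresh_ids):
    """One tick of 'time': rewrite every (x, y) into (x, y) and (y, z).

    z is a brand-new node each time, drawn from fresh_ids.
    """
    new_edges = []
    for x, y in hyperedges:
        z = next(fresh_ids)
        new_edges.append((x, y))
        new_edges.append((y, z))
    return new_edges

def evolve(initial, steps):
    """Return the list of successive hypergraphs, from the initial one onward."""
    used = {node for edge in initial for node in edge}
    fresh_ids = count(max(used) + 1)  # identities not yet in use
    history = [list(initial)]
    for _ in range(steps):
        history.append(update(history[-1], fresh_ids))
    return history

history = evolve([(0, 1)], 3)
for tick, edges in enumerate(history):
    print(tick, len(edges), edges)
```

Each entry of `history` is a snapshot of "space," while the index into `history` plays the part of "time." The two are different kinds of thing in the code, just as in the model described above.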