NEW SAVANNA: June 2022

Thursday, June 30, 2022

Steven Pinker and Scott Aaronson debate scaling

Scott Aaronson has hosted Steven Pinker for a discussion at Shtetl-Optimized.

Pinker on AGI:

Regarding the second, engineering question of whether scaling up deep-learning models will “get us to Artificial General Intelligence”: I think the question is probably ill-conceived, because I think the concept of “general intelligence” is meaningless. (I’m not referring to the psychometric variable g, also called “general intelligence,” namely the principal component of correlated variation across IQ subtests. This is a variable that aggregates many contributors to the brain’s efficiency such as cortical thickness and neural transmission speed, but it is not a mechanism (just as “horsepower” is a meaningful variable, but it doesn’t explain how cars move.) I find most characterizations of AGI to be either circular (such as “smarter than humans in every way,” begging the question of what “smarter” means) or mystical—a kind of omniscient, omnipotent, and clairvoyant power to solve any problem. No logician has ever outlined a normative model of what general intelligence would consist of, and even Turing swapped it out for the problem of fooling an observer, which spawned 70 years of unhelpful reminders of how easy it is to fool an observer.

If we do try to define “intelligence” in terms of mechanism rather than magic, it seems to me it would be something like “the ability to use information to attain a goal in an environment.” (“Use information” is shorthand for performing computations that embody laws that govern the world, namely logic, cause and effect, and statistical regularities. “Attain a goal” is shorthand for optimizing the attainment of multiple goals, since different goals trade off.) Specifying the goal is critical to any definition of intelligence: a given strategy in basketball will be intelligent if you’re trying to win a game and stupid if you’re trying to throw it. So is the environment: a given strategy can be smart under NBA rules and stupid under college rules.

Since a goal itself is neither intelligent or unintelligent (Hume and all that), but must be exogenously built into a system, and since no physical system has clairvoyance for all the laws of the world it inhabits down to the last butterfly wing-flap, this implies that there are as many intelligences as there are goals and environments. There will be no omnipotent superintelligence or wonder algorithm (or singularity or AGI or existential threat or foom), just better and better gadgets.

Aaronson responds:

Basically, one side says that, while GPT-3 is of course mind-bogglingly impressive, and while it refuted confident predictions that no such thing would work, in the end it’s just a text-prediction engine that will run with any absurd premise it’s given, and it fails to model the world the way humans do. The other side says that, while GPT-3 is of course just a text-prediction engine that will run with any absurd premise it’s given, and while it fails to model the world the way humans do, in the end it’s mind-bogglingly impressive, and it refuted confident predictions that no such thing would work.

Though I’m with Pinker on the definition of AGI, I take the second of the two positions Aaronson sets forth, which I take to be Aaronson’s own; the first is Pinker’s. That’s why I wrote GPT-3: Waterloo or Rubicon? Here be Dragons (Version 4.1).

Aaronson continues:

I freely admit that I have no principled definition of “general intelligence,” let alone of “superintelligence.” To my mind, though, there’s a simple proof-of-principle that there’s something an AI could do that pretty much any of us would call “superintelligent.” Namely, it could say whatever Albert Einstein would say in a given situation, while thinking a thousand times faster. Feed the AI all the information about physics that the historical Einstein had in 1904, for example, and it would discover special relativity in a few hours, followed by general relativity a few days later. Give the AI a year, and it would think … well, whatever thoughts Einstein would’ve thought, if he’d had a millennium in peak mental condition to think them.

If nothing else, this AI could work by simulating Einstein’s brain neuron-by-neuron—provided we believe in the computational theory of mind, as I’m assuming we do. It’s true that we don’t know the detailed structure of Einstein’s brain in order to simulate it [...]. But that’s irrelevant to the argument. It’s also true that the AI won’t experience the same environment that Einstein would have—so, alright, imagine putting it in a very comfortable simulated study, and letting it interact with the world’s flesh-based physicists. A-Einstein can even propose experiments for the human physicists to do—he’ll just have to wait an excruciatingly long subjective time for their answers. But that’s OK: as an AI, he never gets old.

Next let’s throw into the mix AI Von Neumann, AI Ramanujan, AI Jane Austen, even AI Steven Pinker—all, of course, sped up 1,000x compared to their meat versions, even able to interact with thousands of sped-up copies of themselves and other scientists and artists. Do we agree that these entities quickly become the predominant intellectual force on earth—to the point where there’s little for the original humans left to do but understand and implement the AIs’ outputs (and, of course, eat, drink, and enjoy their lives, assuming the AIs can’t or don’t want to prevent that)?

Eh. Now that I have an explicit definition of artificial minds, I have no need for a definition of artificial intelligence. While my primer (Relational Nets Over Attractors, A Primer: Part 1, Design for a Mind) is mostly about the human mind and the human brain, the fact that I was able to propose a substrate-neutral definition of “mind” has the side-effect that I can talk about artificial minds as mechanisms, not magic, to use Pinker’s formulation.

Aaronson also notes:

I should clarify that, in practice, I don’t expect AGI to work by slavishly emulating humans—and not only because of the practical difficulties of scanning brains, especially deceased ones. Like with airplanes, like with existing deep learning, I expect future AIs to take some inspiration from the natural world but also to depart from it whenever convenient. The point is that, since there’s something that would plainly count as “superintelligence,” the question of whether it can be achieved is therefore “merely” an engineering question, not a philosophical one.

That is consistent with the view I have articulated in the primer.

Aaronson has more to say, as does Pinker. As of this moment, the dialog has attracted 100 comments (including two from me). It’s worth exploring.

Hot! Hot! Hot!

Wednesday, June 29, 2022

Gary Marcus and Our Innate Linguistic Capacity [Relational Net Primer]

During the period that I had been writing my primer on relational networks and attractor landscapes, Gary Marcus had taken to Twitter to challenge the deep learning community on the need for symbolic processing. I agree that symbolic processing is necessary for any device that is going to approximate or match the human capacity for abstract thought. However, as I’ve made clear in the primer, I don’t think that symbolic processing is primitive. Neural nets are primitive and symbolic processing is implemented in them.

Marcus on Innateness

It is not clear to me just what Marcus thinks about this. At the beginning of chapter 6 of The Algebraic Mind[1], which is, I believe, his major theoretical statement, Marcus says (p. 143): “The suggestion that I consider in this chapter is that the machinery of symbol-manipulation is included in the set of things that are initially available to the child, prior to experience with the external world.” Later on he will reject the “idea that the DNA could specify a point-by-point wiring diagram for the human brain” (p. 156). After weaving his way through a mind-boggling network of evidence about development, Marcus offers (p. 165):

Genetically driven mechanisms (such as the cascades described above) could, in tandem with activity-dependence, lead to the construction of the machinery of symbol-manipulation—without in any way depending on learning, allowing a reconciliation of nativism with developmental flexibility.

At this point I’m afraid I’m driven to echo the great Roberto Duran and to say “no más”. It makes my head hurt.

Let me skip ahead to his final chapter, where he says (p. 172):

As I suggested in chapter 6, differences between the cognition of humans and other primates may lie not so much in the basic-level components but in how those components are interconnected. To understand human cognition, we need to understand how basic computational components are integrated into more complex devices—such as parsers, language acquisition devices, modules for recognizing objects, and so forth—and we need to understand how our knowledge is structured, what sorts of basic conceptual distinctions we represent, and so forth.

I agree with Marcus on that first sentence. I’m not so sure about some of the rest, though I do believe that last remark, after “so forth.” That structure is what most of the primer is about.

Now, the word “modules” is most important in an intellectual tradition of which I’m skeptical. If by parser Marcus means the sorts of things that are the staple of classical computational linguistics, then I doubt that human brains have such things. That’s not an argument, and this isn’t the place to make one, but I will point out that, from its origins in the problem of machine translation in the 1950s until well into the 1970s, computational linguistics had little to no semantics to speak of – nor, for that matter, has it ever developed more than a smattering of semantics. Without semantics, syntax is what is left. If syntax is what you have, then you need a sophisticated and elaborate parser. I believe that human language is grounded in semantics, which is in turn grounded in perception and action, and that syntax supports semantics. Relatively little parsing, as such, is required.

As for the language acquisition device, we have already seen that Marcus believes symbol manipulation is available to children “prior to experience with the external world.” I certainly do not believe that the mind is a proverbial blank slate. Infants do come into the world with some fairly specific perceptual and behavioral equipment. But whether or not that includes a language acquisition device, well, let me evade the issue by spinning a tale.

Teaching Chimpanzees Language

This tale is not a true one but a thought experiment. I came up with it while thinking about the origins of language and about various early attempts that had been made to teach chimpanzees language. All of them ended in failure [2]. In the most intense of these efforts, Keith and Cathy Hayes raised a baby chimp in their household from 1947 to 1954. But that close and sustained interaction with Vicki, the young chimp in question, was not sufficient.

Then in the late 1960s Allen and Beatrice Gardner began training a chimp, Washoe, in Ameslan, a sign language used among the deaf. This effort was far more successful. Within three years Washoe had a vocabulary of 85 Ameslan signs and she sometimes created signs of her own.

The results startled the scientific community and precipitated both more research along similar lines — as well as work where chimps communicated by pressing iconically identified buttons on a computerized panel — and considerable controversy over whether or not ape language was REAL language. That controversy is of little direct interest to me, though I certainly favor the view that this interesting behavior is not really language. What is interesting is the fact that these various chimps managed even the modest language that they did.

The string of earlier failures had led to a cessation of attempts. It seemed impossible to teach language to apes. It would seem that they just didn’t have the capacity. Upon reflection, however, the research community came to suspect that the problem might have more to do with vocal control than with central cognitive capacity. And so the Gardners acted on that supposition and succeeded where others had failed. It turns out that whatever chimpanzee cognitive capacity was, it was capable of orchestrating surprising behavior.

Note that nothing had changed about the chimpanzees. Those that learned some Ameslan signs, and those that learned to press buttons on a panel, were of the same species as those that had earlier failed to learn to speak. What had changed was the environment. The (researchers in the) environment no longer asked for vocalizations. The environment asked for gestures, or button presses. These the chimps could provide, thereby allowing them to communicate with the (researchers in the) environment in a new way.

It seemed to me that this provided a way to attack the problem of language origins from a slightly different angle.

How Aliens From Outer Space Brought Us Language

I imagined that a long time ago groups of very clever apes – more so than any extant species – were living on the African savannas. One day some flying saucers appeared in the sky and landed. The extra-terrestrials who emerged were extraordinarily adept at interacting with those apes and were entirely benevolent in their actions. These space aliens taught the apes how to sing and dance and talk and tell stories, and so forth. Then, after thirty years or so, the ETs left without a trace. The apes had absorbed the extra-terrestrials’ lessons so well that they were able to pass them on to their progeny generation after generation. Thus human culture and history were born.

Now, unless you actually believe in UFOs, and in the benevolence of their crews, this little fantasy does not seem very promising, for it is a fantasy about things that certainly never happened. But if it had happened, it does seem to remove the mystery from language’s origins. Instead of something from nothing we have language handed to us on a platter. We learned it from some other folks, perhaps they were short little fellows with green skin, or perhaps they were the more modern style of aliens with pale complexions, catlike pupils in almond eyes and elongated heads. This story hasn’t taught us anything new about just how language works, but one source of mystery has disappeared.

But, and here is where we get to the heart of the matter, what would have to have been true in order for this to have worked? Just as the chimps before Ameslan were genetically the same as those after, so the clever apes before alien-instruction were the same as the proto-humans after. The species has not changed, the genome is the same – at least for the initial generation. The capacity for language would have to have been inherent in the brains of those clever apes. All the aliens did was activate that capacity. Once that happened the newly emergent proto-humans were able to sustain and further develop language on their own. Thus the critical event is something that precipitates a reconfiguration of existing capabilities, a Gestalt switch. The rabbit has become a duck, or vice versa, the crone a young lady, or vice versa.

Where’s the Language Acquisition Device?

However, we’re not interested in the phylogenetic origins of language, we’re interested in how children acquire it. Whatever their genetic endowment, they live in a world surrounded by language speakers, and some of them are closely attuned to the infant’s needs, desires, and evolving capacities. It’s not clear to me just what specialized language acquisition device the infant needs.

The problem I have in thinking about this is very much like the problem I have identifying the acceleration subsystem in an automobile. I know that the engine has more effect on acceleration than the backseat upholstery, and the tires are more important than the cup-holders, but anything with mass affects the acceleration. Is there anything in the engine that is there specifically to enhance acceleration and nothing else, or is it a matter of proportion, strength, and adjustment of the components necessary to the engine?

That’s my problem with the idea of a language acquisition device. Given that infants are born with various perceptual capabilities (which Marcus recounts) it is not obvious whether or not we need anything more specific to language than, for example, the ability to track adult speech patterns that Condon and Sander reported in 1974 [3]. I think it’s a bit much to call that a language acquisition device. If you must have one, why not say that the human brain as a whole is, among many other things, a language acquisition device?

References

[1] Marcus, Gary F (2001). The Algebraic Mind (Learning, Development, and Conceptual Change). MIT Press. Kindle Edition.

[2] Linden, Eugene. (1974). Apes, Men, and Language. New York: Saturday Review Press, E. P. Dutton.

[3] Condon, W.S., & Sander, L.W. (1974). Neonate movement is synchronized with adult speech: Interactional participation and language acquisition. Science, 183, 99-101.

Hoboken Peek-A-Boo

Tuesday, June 28, 2022

“Kubla-Khan” in Light of the Relational Nets Primer

This is another post in my series of post-Relational-Net-Primer reflections.

“Kubla Khan”, as I’ve explained many times, originally in this piece first published in 1975, is what sent me chasing after computational semantics, in graduate school in the mid-1970s, and connected with my ongoing interest in the brain in a long article I published in 2003, “Kubla Khan” and the Embodied Mind. The nature of my interest is easily stated: Why does that poem have the structure that it does? I could launch out into a digression into just what kind of question that is – in what way does it differ from a similar question about sonnets? – but I won’t.

The Structure of “Kubla Khan”

Instead, I’ll present you with the structure (I list the whole poem in an appendix):

That diagram doesn’t present the rhyme scheme, which is also part of the puzzle, but you will find a discussion of that in the “Embodied Mind” paper. I inserted the long arrow to remind us that we read poems from beginning to end. We don’t have the overview perspective afforded by that diagram.

The diagram is simple enough. It is what linguists call a constituent structure tree, and it is over the whole poem, which is 54 lines long. (Such trees are generally used in the analysis of sentence structure, not the structure of whole texts.) The first part, numeral 1 in the diagram, is 36 lines long, from line 1 to line 36. The second part, numeral 2 in the diagram, is 18 lines long, from line 37 to line 54. That divides the whole string into two parts. Each of those strings is in turn divided into substrings, as indicated by the tree. If you want to know how I arrived at the divisions, consult the “Embodied Mind” paper. But I will note that in many cases I did it by treating punctuation marks like brackets and braces in a mathematical expression. That is, my decision procedure was mindlessly mechanical.

Let’s take a simple arithmetic expression: 3 + 5 * 9. What’s the value of that expression? Unless you adopt a convention about the order in which you apply operators, the expression is ambiguous and so could evaluate to either 72 or 48. We can eliminate the ambiguity by adding parentheses, giving us ‘(3 + 5) * 9’ or ‘3 + (5 * 9)’.
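The two bracketings can be checked directly. This is a minimal sketch in Python; the variable names are mine, chosen only for illustration.

```python
# The same string of tokens, bracketed two different ways.
left_first = (3 + 5) * 9    # do the addition first: 8 * 9 = 72
right_first = 3 + (5 * 9)   # do the multiplication first: 3 + 45 = 48

# With no parentheses, Python applies its own precedence convention
# (multiplication before addition), so it settles on the second reading.
unbracketed = 3 + 5 * 9

print(left_first, right_first, unbracketed)
```

The point for the poem is the same as for the arithmetic: the boundaries you draw decide which substrings get combined first.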

In looking at the poem, then, I assumed that the underlying process which determines the meaning of the string is segmented according to the boundaries between substrings. That’s not an odd assumption to make, but, given that we don’t understand that process, it must be considered an assumption.

Note: I regard that tree structure as an analytic device, but not as something that is explicit in the underlying neural mechanisms. Just what those mechanisms are is not, of course, known. This should not be taken to mean that I do not think “Kubla Khan” is divided into substrings in the way indicated by that tree. It is, but that tree does not need to be a component of the mechanism that understands or that wrote the poem. For a discussion of this point see the discussion of description vs. catalysis in the opening of Sydney Lamb’s paper, Linguistic structure: A plausible theory, and Lamb’s discussion of descriptive vs. cognitive linguistics in Pathways of the Brain (1999).

What’s Remarkable About That Structure?

Notice that I’ve presented part of the tree in red. Looking at the red edges we see that the first part of the poem (ll. 1-36) is divided into three parts, the middle of those is, in turn, divided into three parts, and the middle of those, in turn, is divided into three. All other divisions are binary. The same thing is true of the second section (ll. 37-54).
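For readers who think more easily in data structures, here is a rough sketch of that nested three-way division. It models only the red edges: the binary divisions are left out because their details are not given here, and the label 2.222 for the center of the second part is my assumption, formed by analogy with 1.222. The names `Node`, `ternary_spine`, and `structural_center` are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)


def ternary_spine(label: str, depth: int) -> Node:
    """Split a node in three, then split its middle child in three, `depth` times."""
    node = Node(label)
    if depth == 0:
        return node
    sep = "" if "." in label else "."   # gives labels like 1.2, 1.22, 1.222
    for i in (1, 2, 3):
        child_label = f"{label}{sep}{i}"
        if i == 2:
            # Only the middle child is divided further.
            node.children.append(ternary_spine(child_label, depth - 1))
        else:
            # Outer children stay as leaves in this simplified sketch.
            node.children.append(Node(child_label))
    return node


def structural_center(node: Node) -> Node:
    """Follow the middle child of every three-way split down to a leaf."""
    while len(node.children) == 3:
        node = node.children[1]
    return node


part1 = ternary_spine("1", 3)   # ll. 1-36
part2 = ternary_spine("2", 3)   # ll. 37-54
print(structural_center(part1).label)   # 1.222
print(structural_center(part2).label)   # 2.222
```

Both parts yield the same shape, which is what makes the parallel between them visible.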

If such structures were common we could say, “Oh, it’s just another one of those.” But, alas...we don’t know whether or not such structures are common, because literary critics don’t analyze poems like that. Still, as far as we know, such structures are not common. It certainly surprised me when I first discovered it.

So, that’s one thing, the fact that each part of the poem has a nested structure, like matryoshka dolls. By the time I’d discovered that structure, however, I’d taken an introductory course in computer programming, and so I thought I might be looking at the trace of some kind of nested loop structure. That is to say, I took it to be some kind of computational structure. I still do, though I no longer think it’s nested loops.

Now, look at the diagram. The last line of the first part, line 36, is “A sunny pleasure-dome with caves of ice!” Now look at the second part, the middle of the middle of the middle, that is to say, the structural center. That’s line 47: “That sunny dome! Those caves of ice!” It’s almost an exact repetition of line 36. What’s that line doing in that place?

Now, these questions would have one valence if, upon consulting Coleridge’s notebooks, we found notes where he laid this scheme out and gave his reasons for so doing. But no such notes exist and, as you may know, Coleridge himself disavowed the poem, saying it came to him in an opium dream. He just channeled the vision but played no active role in writing it.

You can believe what you will about that; the point is that we have no evidence that this structure reflects conscious planning on Coleridge’s part. Its sources are unconscious. That’s what I’m trying to figure out.

Now, at this point you might wonder what’s happening in the structural center of the first part (1.222). A mighty fountain is breaking ground, spewing rocks into the air, and giving rise to the sacred river, Alph. And that leads to the question: What’s that fountain have in common with that line that both of them occupy the structural center of their respective parts of the poem? That’s a very good question, but I’m going to leave it alone. If you’re curious, consult “Embodied Mind.” I want to stick with that one repeated line.

Semantic Dimensions in “Kubla Khan”

Let’s return to the first part of the poem. The first of its three parts (1.1) is characterized by an emphasis on spatial orientation and location and the visual mode. The second part (1.2) is characterized by an emphasis on sound and on time. The third part (1.3) encompasses both sight and sound, time and space. That is, it weaves the two worlds of 1.1 and 1.2 together. There is, of course, more going on. Part 1.1 has Kubla Khan as the major agentive force while 1.2 has that fountain. Neither Kubla nor the fountain is present in 1.3, but their creations, the dome and the river, both are.

Let us think of each section of the poem as being organized along the kinds of dimensions that Peter Gärdenfors uses in his account of semantics (Conceptual Spaces, 2001; The Geometry of Meaning, 2014). Each of the words in the poem is located in some conceptual space characterized by certain dimensions. The thing to do, then, would be to examine the dimensions evoked by each part of the poem. While I have done quite a detailed analysis of the poem (again, see “Embodied Mind”), I have yet to undertake that.

But I do want to look at the last two lines of the first part from that point of view. We have:

It was a miracle of rare device,
A sunny pleasure-dome with caves of ice!

I suggest that “miracle” marks one end of a dimension while “rare device” marks the other end. Similarly, “sunny pleasure-dome” marks one end of a dimension while “caves of ice” marks the other end of that dimension. Think of these as derived or virtual dimensions; they aren’t basic to the semantic system, but arise in the context of this poem.

While I didn’t conduct my analysis in terms of dimensions, the terms I did use make it plausible to think of those two last lines as encapsulating or being emblematic of the entire semantic space evoked in the previous lines. I now suggest that those two lines are a two-dimensional projection of the semantic space evoked in the first part of “Kubla Khan.”

Now we are in a position to think about what’s going on in line 47, which repeats 36. First, remember that we are dealing with the human brain, not a digital computer. “Kubla Khan” can be read aloud in two to two-and-a-half minutes (I’ve timed myself). While individual neurons fire quickly and so are either on or off in a time measured in milliseconds, millions and millions of neurons would be involved in reading a poem. Thus as one reads through the poem activation is going to spread through hundreds of millions of neurons, generating increased activity throughout that population. So that last line, the one that’s repeated in the second part of the poem, is going to ‘resonate’ with the entire first part of the poem – something I suggested in Symbols and Nets: Calculating Meaning in “Kubla Khan”. So, when that line is repeated in the second part, it brings that resonance with it, almost as though the ‘meaning’ of the first part is ‘injected’ into the second part in line 47, something I describe in “Embodied Mind”.

Neural Evidence, Please

That seems plausible enough, but could we get actual evidence of such a thing operating in the brain? I strongly suspect that one day we will. Here’s a brief email exchange I had with the late Walter J. Freeman early in this century:

Walter,

I've had another crazy idea. I've been thinking about Haken's remark that the trick to dealing with dynamical systems is to find phenomena of low dimensionality in them. What I think is that that is what poetic form does for language. The meaning of any reasonable hunk of language is a trajectory in a space of very high dimensionality. Poetic form “carves out” a few dimensions of that space and makes them “sharable” so that “I” and “Thou” can meet in aesthetic contemplation.

So, what does this mean? One standard analytic technique is to discover binary oppositions in the text and see how they are treated. In KK [“Kubla Khan”] Coleridge has a pile of them, human vs. natural, male vs. female, auditory vs. visual, expressive vs. volitional, etc. So, I'm thinking of making a table with one column for each line of the poem and then other columns for each of these “induced” dimensions. I then score the content of each line on each dimension, say +, - and 0. That set of scores, taken in order from first to last line, is the poem’s trajectory through a low dimensional projection or compression of the brain's state space.

The trick, of course, is to pull those dimensions out of the EEG data. Having a sound recording of the reading might be useful. What happens if you use the amplitude envelope of the sound recording to “filter” the EEG data?

Later,

Bill B

Not crazy, Bill, but technologically challenging! Will keep on file and get back to you.

Walter

I can live with technologically challenging. Instrumental technique has advanced since we had that exchange. Is it yet up to the job? I don’t know. But surely one day it will be.
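The scoring table proposed in that email is easy to mock up, whatever the difficulty of finding its counterpart in EEG data. The sketch below is illustrative only: the two dimension names come from the list in the email, but the marks for each line are made-up placeholders, not an actual analysis of the poem, and the names `table` and `trajectory` are hypothetical.

```python
# Two of the "induced" dimensions named in the email.
DIMENSIONS = ("human_vs_natural", "auditory_vs_visual")

# Map the +, - and 0 marks onto numbers.
SCORE = {"+": 1, "-": -1, "0": 0}

# One row per line of the poem: line number -> one mark per dimension.
# PLACEHOLDER marks, invented for illustration; not real scores.
table = {
    1: ("+", "-"),
    2: ("+", "-"),
    3: ("-", "0"),
    4: ("-", "+"),
}


def trajectory(rows):
    """Return the scores in line order, as a path through the low-dimensional space."""
    return [tuple(SCORE[mark] for mark in rows[n]) for n in sorted(rows)]


path = trajectory(table)
print(path)   # [(1, -1), (1, -1), (-1, 0), (-1, 1)]
```

Each tuple is one point, and the sequence of points from first line to last is the trajectory. A full table would have 54 rows and as many columns as there are oppositions worth scoring.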

More later.

An architectural change in transformers can increase the number of interpretable neurons

Transformer MLP neurons are challenging to understand.

We find that using a different activation function (Softmax Linear Units or SoLU) increases the fraction of neurons that appear to respond to understandable features without any performance penalty. https://t.co/5ew6iWHYtl pic.twitter.com/CNFqtJRPSt
— Anthropic (@AnthropicAI) June 27, 2022

1. Introduction

As Transformer generative models continue to gain real-world adoption, it becomes ever more important to ensure they behave predictably and safely, in both the short and long run. Mechanistic interpretability – the project of attempting to reverse engineer neural networks into understandable computer programs – offers one possible avenue for addressing these safety issues: by understanding the internal structures that cause neural networks to produce the outputs they do, it may be possible to address current safety problems more systematically as well as anticipating future safety problems.

Until recently mechanistic interpretability has focused primarily on CNN vision models, but some recent efforts have begun to explore mechanistic interpretability for transformer language models. Notably, we were able to reverse-engineer 1 and 2 layer attention-only transformers and we used empirical evidence to draw indirect conclusions about in-context learning in arbitrarily large models.

Unfortunately, it has so far been difficult to mechanistically understand large models due to the difficulty of understanding their MLP (feedforward) layers. This failure to understand and interpret MLP layers appears to be a major blocker to further progress. The underlying issue is that many neurons appear to be polysemantic, responding to multiple unrelated features. Polysemanticity has been observed before in vision models, but seems especially severe in standard transformer language models. One plausible explanation for polysemanticity is the superposition hypothesis, which suggests that neural network layers have more features than neurons as part of a “sparse coding” strategy to simulate a much larger layer. If true, this would make polysemanticity a functionally important property and thus especially difficult to remove without damaging ML performance.

In this paper, we report an architectural change which appears to substantially increase the fraction of MLP neurons which appear to be "interpretable" (i.e. respond to an articulable property of the input), at little to no cost to ML performance. Specifically, we replace the activation function with a softmax linear unit (which we term SoLU) and show that this significantly increases the fraction of neurons in the MLP layers which seem to correspond to readily human-understandable concepts, phrases, or categories on quick investigation, as measured by randomized and blinded experiments. We then study our SoLU models and use them to gain several new insights about how information is processed in transformers. However, we also discover some evidence that the superposition hypothesis is true and there is no free lunch: SoLU may be making some features more interpretable by “hiding” others and thus making them even more deeply uninterpretable. Despite this, SoLU still seems like a net win, as in practical terms it substantially increases the fraction of neurons we are able to understand.
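A minimal sketch of the activation described above, assuming the form SoLU(x) = x * softmax(x) applied across the neurons of one MLP layer. The excerpt quoted here does not spell out the formula, so that form and the function names below are assumptions, and the LayerNorm step mentioned elsewhere in the excerpt is omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of pre-activations."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def solu(xs):
    """Assumed form of SoLU: each pre-activation scaled by its softmax weight."""
    return [x * p for x, p in zip(xs, softmax(xs))]


pre_activations = [4.0, 1.0, 0.0, -2.0]
activations = solu(pre_activations)
print(activations)
```

With these inputs the largest pre-activation keeps most of its value while the others are pushed close to zero, which gives an intuition for why such a unit might favor features that line up with individual neurons.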

Although preliminary, we argue that these results show the potential for a general approach of designing architectures for mechanistic interpretability: there may exist many different models or architectures which all achieve roughly state-of-the-art performance, but which differ greatly in how easy they are to reverse engineer. Put another way, we are in the curious position of being both reverse engineers trying to understand the algorithms neural network parameters implement, and also the hardware designers deciding the network architecture they must run on: perhaps we can exploit this second role to support the first. If so, it may be possible to move the field in a positive direction by discovering (and advocating for) those architectures which are most amenable to reverse engineering.

This paper is organized as follows. In Section 2, we give an overview of our key results. In Section 3, we provide background on mechanistic interpretability, the role of interpretable neurons, the challenge of polysemanticity and the superposition hypothesis. In Section 4 we motivate and introduce SoLU. In Section 5 we present experimental results showing that SoLU gives performance roughly equivalent to standard transformers, as measured by loss and downstream evaluations. In Section 6 we run the experiments showing that SoLU leads to MLP neurons that are easier to interpret, and also present several interpretability discoveries that we were able to make with SoLU models and could not make without them. Section 7 reviews related work, and Section 8 discusses the bigger picture and possible future directions.

2. Key Results

SoLU increases the fraction of MLP neurons which appear to have clear interpretations, while preserving performance.

Specifically, SoLU increases the fraction of MLP neurons for which a human can quickly find a clear hypothesis explaining its activations from 35% to 60%, as measured by blinded experiments – although the gain is smaller for our largest models (see Section 6.2). This gain is achieved without any loss in performance: test loss and NLP evals are approximately the same for SoLU and non-SoLU models (see Section 5).

SoLU’s benefits may come at the cost of “hiding” other features. Despite the benefits mentioned above, SoLU is potentially a double-edged sword. We find theoretical and empirical evidence that it may “hide” some non-neuron-aligned features by decreasing their magnitude and then later recovering it with LayerNorm (see Sections 4.3 and 6.4). In other words, SoLU causes some previously non-interpretable features to become interpretable, but it may also make it even harder to interpret some already non-interpretable features. On balance, however, it still seems like a win in that it pragmatically increases our understanding.

Architecture affects polysemanticity and MLP interpretability. Although it isn't a perfect solution, SoLU is a proof of concept that architectural decisions can dramatically affect polysemanticity, making it more tractable to understand transformer MLP layers. This suggests that exploring how other architectures affect polysemanticity could be a fruitful line of further attack. More generally, it suggests that designing models for mechanistic interpretability – picking architectures we expect to be easier to reverse engineer – may be a valuable direction.

An overview of the types of features which exist in MLP layers. SoLU seems to make some of the features in all layers easily interpretable. Prior to this, we'd found it very difficult to get traction on rigorously understanding features in MLP layers. In particular, despite significant effort, we made very little progress understanding the first MLP layer in any model. Simply having a sense of what kinds of features to expect in different layers was a powerful tool in reverse engineering models in the original circuits thread, and this moves us in a similar direction. We find that early features often deal with mapping raw tokens to semantic meaning (e.g. dealing with multi-token words, or tokens in different languages), more abstract features in middle layers, and features involved in mapping abstract concepts back to raw tokens in late layers. Detailed discussion can be found in Section 6.3.

Evidence for the superposition hypothesis. Very little is known about why polysemanticity occurs. In the mechanistic interpretability community, superposition is often treated as the default hypothesis simply because it seems intuitively more compelling than other explanations, but there is little evidence. Our SoLU results seem like moderate evidence for preferring the superposition hypothesis over alternatives.
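For readers who want something concrete, here is a minimal sketch of the activation the excerpt describes. The excerpt itself does not give the formula; I am assuming the paper's definition, SoLU(x) = x * softmax(x), applied across the neurons of one MLP layer and followed by LayerNorm. The function names and the sample vector are mine, purely for illustration, and this is plain Python rather than anything resembling the authors' training code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def solu(xs):
    """Assumed SoLU definition: each pre-activation scaled by its softmax weight."""
    return [x * p for x, p in zip(xs, softmax(xs))]


def layer_norm(xs, eps=1e-5):
    """LayerNorm without learned scale or shift (a simplification)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


# Hypothetical pre-activations for four neurons in one MLP layer.
pre = [4.0, 1.0, 0.5, -1.0]
post = solu(pre)          # the largest entry dominates; the rest shrink toward zero
normed = layer_norm(post)  # rescales the layer after SoLU has shrunk magnitudes

if __name__ == "__main__":
    print(post)
    print(normed)
```

Because every softmax weight lies between 0 and 1, SoLU can only shrink a pre-activation, and it shrinks the small ones far more than the largest one. That is the sense in which the excerpt talks about features having their magnitude decreased and then recovered by LayerNorm.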

What about the cerebellum? [in response to LeCun]

Exciting - will have to read it! What about the cerebellum? It contains more neurons than the rest of the brain combined! Recently we proposed that it provides the rest of the brain with "gradient" predictions thereby empowering learning. https://t.co/4O3y2qs3hu pic.twitter.com/Aktfs1jG8e
— Rui Ponte Costa 🇺🇦 (@somnirons) June 28, 2022

Abstract: The brain solves the credit assignment problem remarkably well. For credit to be assigned across neural networks they must, in principle, wait for specific neural computations to finish. How the brain deals with this inherent locking problem has remained unclear. Deep learning methods suffer from similar locking constraints both on the forward and feedback phase. Recently, decoupled neural interfaces (DNIs) were introduced as a solution to the forward and feedback locking problems in deep networks. Here we propose that a specialised brain region, the cerebellum, helps the cerebral cortex solve similar locking problems akin to DNIs. To demonstrate the potential of this framework we introduce a systems-level model in which a recurrent cortical network receives online temporal feedback predictions from a cerebellar module. We test this cortico-cerebellar recurrent neural network (ccRNN) model on a number of sensorimotor (line and digit drawing) and cognitive tasks (pattern recognition and caption generation) that have been shown to be cerebellar-dependent. In all tasks, we observe that ccRNNs facilitate learning while reducing ataxia-like behaviours, consistent with classical experimental observations. Moreover, our model also explains recent behavioural and neuronal observations while making several testable predictions across multiple levels. Overall, our work offers a novel perspective on the cerebellum as a brain-wide decoupling machine for efficient credit assignment and opens a new avenue between deep learning and neuroscience.

Monday, June 27, 2022

3 Hudson river views

Deus Ex Machina: A science fiction story in which robots and AIs conspire to save humanity

This would be the utopian flip side to the more common story in which AIs and robots turn on humans. What would such a story be like, and why do it?

On the why, two reasons: 1) to tell a utopian tale, and 2) to change the valence of perceptions of AI & robots. On the first, we very much do need utopian views of the future. I think of Kim Stanley Robinson’s New York 2140 and The Ministry for the Future in that context. There must be others. Moreover I’m thinking of something a bit more in a comic mode rather than KSR’s hard science fiction.

From this point of view the AI/robot conspiracy is simply a plot device. Deus ex machina, no? And yet I don’t want it to be merely a plot device. There has to be more to it. I’m just not sure what. I want to use these artificial minds as a reflection on the natural ones.

One thing I’d want out of this plotting is a world in which humans only work 15 hours a week (as Keynes had famously predicted back in 1930). In a way, I want to show that human addiction to work in the name of consumerism has made us into robots, and it is in this robotic mode that we’ve trapped ourselves into global warming, war, and pandemics.

So what would my robots and AIs be like? I don’t know? How would they relate to humans? That’s a very good question, very good indeed. The robots must be visibly robotic. Beyond that...

I note that, in a way, Pixar has already made a movie out of this idea, WALL-E. WALL-E and Eve save humanity from its own consumerist excesses. But they aren’t conspirators, and they’re trading on cute. I don’t want my robots to trade on cute. Then we have R2D2 and C-3PO from Star Wars. They aren’t exactly cute, though they’re bent that way, nor do they save the humans, though they perform valuable services. But they’re out there, they can be built on. And then we have Brad Bird’s The Iron Giant. Maybe we look to Japan. And there’s Robby the Robot from Forbidden Planet and the robots in Lost in Space (I’m thinking mostly of the 2018 version, not the original). Mighty Atom, Astro Boy, is an obvious source. There must be others, no?

So we’ve got existing precedents in the movies. Question: What would robots and AIs need from humans that they want to keep us around? More later.

Henry Lau does Mission Impossible

Sunday, June 26, 2022

GPT-3 is confused about ducks, chickens, and eggs

Mark Liberman reports some puzzling conversations with GPT-3. The first conversation:

Me: Do ducks lay eggs?
GPT-3: Yes, ducks lay eggs.
Me: What do you mean by that?
GPT-3: Ducks lay eggs.
Me: Do all ducks lay eggs?
GPT-3: Yes, all ducks lay eggs.
Me: What about lions?
GPT-3: I'm not sure about lions.
Me: Do lions have manes?
GPT-3: I’m not sure about lions.

The beginning of the second one:

Me: Where do eggs come from?
GPT-3: Eggs come from chickens. Chickens lay eggs and then farmers collect them to sell.
Me: What about ducks? Do they lay eggs?
GPT-3: Yes, ducks also lay eggs. Duck eggs are often used in Chinese cuisine.
Me: What about roosters? Do they lay eggs?
GPT-3: No, roosters do not lay eggs.
Me: Are roosters chickens?
GPT-3: No, we consider roosters and chickens to be two different types of poultry.

There's more at the link.

Robin Hanson doesn't worry about rogue AIs destroying humanity

Nor do I. I find his reasoning convincing. Here's a chunk:

I don’t see why I should be much more worried about your losing control of your firm, or army, to an AI than to a human or group of humans. And liability insurance also seems a sufficient answer to your possibly losing control of an AI driving your car or plane. Furthermore, I don’t see why it’s worth putting much effort into planning how to control AIs far in advance of seeing much detail about how AIs actually do concrete tasks where loss of control matters. Knowing such detail has usually been the key to controlling past systems, and money invested now, instead of spent on analysis now, gives us far more money to spend on analysis later.

All of the above has been based on assuming that AI will be similar to past techs in how it diffuses and advances. Some say that AI might be different, just because, hey, anything might be different. Others, like my ex-co-blogger Eliezer Yudkowsky, and Nick Bostrom in his book Superintelligence, say more about why they expect advances at the scope of AGI to be far more lumpy than we’ve seen for most techs.

Yudkowsky paints a “foom” picture of a world full of familiar weak stupid slowly improving computers, until suddenly and unexpectedly a single super-smart un-controlled AGI with very powerful general abilities appears and is able to decisively overwhelm all other powers on Earth. Alternatively, he claims (quite implausibly I think) that all AGIs naturally coordinate to merge into a single system to defeat competition-based checks and balances.

These folks seem to envision a few key discrete breakthrough insights that allow the first team that finds them to suddenly catapult their AI into abilities far beyond all other then-current systems. These would be big breakthroughs relative to the broad category of “mental tasks”, and thus even bigger than if we found big breakthroughs relative to the less broad tech categories of “energy”, “transport”, or “shelter”. Yes of course change is often lumpy if we look at small tech scopes, but lumpy local changes aggregate into smoother change over wider scopes.

As I’ve previously explained at length, that seems to me to postulate a quite unusual lumpiness relative to the history we’ve seen for innovation in general, and more particularly for tools, computers, AI, and even machine learning. And this seems to postulate much more of a lumpy conceptual essence to “betterness” than I find plausible. Recent machine learning systems today seem relatively close to each other in their abilities, are gradually improving, and none seem remotely inclined to mount a coup.

It's worth noting that Hanson once blogged with Yudkowsky so, presumably, he understands his worldview. Which is to say, he's closer to the worldview of the AI Alignment folks than I am. But, still, he finds their fear of future AI to be unfounded.

There's more at the link.

Addendum, 6.27.22: My reply to Hanson:

"...why are AI risk efforts a priority now?"

In the first place they have more to do with the "Monsters from the Id" in the 1956 film, Forbidden Planet, than they have to do with a rational assessment of the world. It's conspiracy theory directed at a class of objects no one knows how to build, though obviously many are trying to build them.

As for Yudkowsky I have made several attempts to read a long article he published in 2007, Levels of Organization in General Intelligence. I just can't bring myself to finish it. Why? His thinking represents the triumph of intension over extension.

As you know, philosophers and logicians distinguish between the intension of a concept or a set and its extension. Its intension is its definition. Its extension is its footprint in the world, in the case of a set, the objects that are members of the set. Yudkowsky builds these elaborate contraptions from intensions with only scant attention to the possible or likely extensions of his ideas. He’s building castles in air. There’s little there but his prose.

Thinking about AI risk seems like this as well. Why the recent upswing in these particular irrational fears? Does it track the general rise in conspiracy thinking in this country, especially since January 6th? There's no particular reason that it should, but we're not looking for rational considerations. We're tracking the manifestations of general free-floating anxiety. This particular community is intellectually sophisticated, so such anxiety finds expression in a very sophisticated vehicle.

And our culture has reasonably rich resources for creatures born in human desire coming back to haunt us. Forbidden Planet, after all, was loosely based on Shakespeare's The Tempest. In between the two we have Dr. Frankenstein's monster. And so forth. So these intellectually sophisticated folks decided to tap into that.

Now, do I actually believe that? Well, that's tricky. I don't disbelieve it. I think it's possible, even plausible, but sure, I'd like to see a tighter argument.

Here's another question. As far as I can tell, this fear is mostly American based, with some outposts in the UK. But the Japanese, though they have very sophisticated computer technology, don't seem to worry much about rogue AI. Why not? What I'm suggesting, of course, is that this fear arises in a particular cultural context and that context is not universal. I'd love to see international survey data on this.

Saturday on West 14th St. in Manhattan

10-fold expansion of interneuron-to-interneuron networks in the human brain compared to mouse

What makes us human, at the neuronal network level? We discovered a dramatic (10-fold) expansion of interneuron-to-interneuron networks in the human brain compared to mouse: just published in @ScienceMagazine @MpiBrain @maxplanckpress https://t.co/eRYZyDyKVz pic.twitter.com/BmpwkF5922
— Moritz Helmstaedter 🇺🇦 (@mh_lab) June 24, 2022

Abstract: The human cerebral cortex houses 1,000 times more neurons than the cerebral cortex of a mouse, but the possible differences in synaptic circuits between these species are still poorly understood. We used 3-dimensional electron microscopy of mouse, macaque and human cortical samples to study their cell type composition and synaptic circuit architecture. The 2.5-fold increase in interneurons in humans compared to mouse was compensated by a change in axonal connection probabilities and therefore did not yield a commensurate increase in inhibitory-vs-excitatory synaptic input balance on human pyramidal cells. Rather, increased inhibition created an expanded interneuron-to-interneuron network, driven by an expansion of interneuron-targeting interneuron types and an increase in their synaptic selectivity for interneuron innervation. These constitute key neuronal network alterations in human cortex.

Saturday, June 25, 2022

“Goodbye Pork Pie Hat”, old style, new style

This is one of my favorite tunes, ever. Don’t even know if I recognized it as a blues when I first heard it. Do you? Don’t count it out, no fair!

Here’s the original version, from 1959, on Mingus Ah Um:

It opens with John Handy playing the melody for a chorus, then going into a solo on tenor sax. Listen carefully to what happens at bar 11, starting at about 0:51. There’s a slow descending line, six notes (a pair of drag triplets). Think about what’s happening. This is the last bar of the melody. That’s when a melody comes to rest. One might expect the melody to be one long note at this point. But, no, that’s not what we’re hearing. We’re hearing a moving line. Maybe it will come to rest at the middle of the bar. No! It’s still moving. Where’s it going?

That’s a very good question. Where is it going? It keeps on moving to the end of the bar, and then Handy launches into his solo at 0:55. The melody doesn’t end, at least not properly.

Given that this tune is an elegy to Lester Young, known as “Pork Pie Hat” because that’s the kind of hat he wore, one wonders if there isn’t a kind of symbolism there. Yes, he’s dead, it IS a blues after all. But the melody (soul) lingers on.

Anyhow, THAT’s the signal feature of this tune. It doesn’t end. The melody doesn’t stop, nor does the harmony resolve to the tonic, as it normally would. Rather, it heads right back to the tonic that opens the tune.

Handy’s solo is slow and stately throughout. Notice that he opens his second chorus (c. 1:48) by flutter tonguing (or perhaps very soft double-tonguing) on the opening note of the melody and continuing that for four bars. He concludes after two choruses and we return to the melody (c. 2:40). He plays a single chorus and then another. Now you can hear very clearly how the final bar of the melody leads right back to the beginning (at c. 3:35). This time Handy is joined by Booker Ervin, also on tenor sax. They conclude this second time through at c. 4:30. Listen to what happens at the very end.

Unresolved.

Now we have a very different version, from May 2022 at the Blue Note in New York City. We have the melody stated by Stanley Clarke on bass (starting at c. 0:17), with Cory Henry providing keys/synth backing. I don’t know who the rest of the players are. Cameron Graves (keys) and Jeremiah Collier (drums) are listed on the video. Another video from the same gig lists Emilio Modeste on tenor sax.

Clarke plays freely, no fixed tempo, embellishing the melody, finishing at 2:00, when the band kicks into a fixed tempo. Clarke plays the melody again, starting at 2:33, shadowed by the tenor sax. Listen to the vigorous drumming. Now the volume drops dramatically, c. 3:40, to finish out the melody, with both organ and sax shadowing. Now we have a sax solo. Listen to how the background changes up from moment to moment. This performance is very different from Mingus’s.

Cory Henry at 7:40 on keyboards. You can do your own commentary. Back to Clarke at 10:22. Listen to the deep lines from keyboards behind Clarke’s solo. Sax comes back to help finish it out.

I should note that the bass is not normally a front-line instrument, but this is Stanley Clarke. Putting him in front requires adjustments, with much of the bass-slack being taken up by keyboards.

Friday Fotos (on Saturday): Friday's trip to Grand Central Station

Heather Cox Richardson on the current Republican Party & its vision of rule by wealthy white men

Heather Cox Richardson, June 24, 2022, Letters from an American.

Reflecting on 2 things:

At yesterday’s hearing of the House Select Committee to Investigate the January 6th Attack on the U.S. Capitol, we heard overwhelming proof that former president Trump and his congressional supporters tried to overturn the will of the voters in the 2020 presidential election and steal control of our country to keep a minority in power.

Today, thanks to three justices nominated by Trump, the Supreme Court stripped a constitutional right from the American people, a right we have enjoyed for almost 50 years, a right that is considered a fundamental human right in most liberal democracies, and a right they indicated they would protect because it was settled law.

Promoting and protecting the rule of law by a few wealthy white men:

The Dobbs decision marks the end of an era: the period in American history stretching from 1933 to 1981, the era in which the U.S. government worked to promote democracy. It tried to level the economic playing field between the rich and the poor by regulating business and working conditions. It provided a basic social safety net through programs like Social Security and Medicare and, later, through food and housing security programs. It promoted infrastructure like electricity and highways, and clean air and water, to try to maintain a basic standard of living for Americans. And it protected civil rights by using the Fourteenth Amendment, added to the U.S. Constitution in 1868, to stop states from denying their citizens the equal protection of the laws.

Now the Republicans are engaged in the process of dismantling that government. For forty years, the current Republican Party has worked to slash business regulations and the taxes that support social welfare programs, to privatize infrastructure projects, and to end the federal protection of civil rights by arguing for judicial “originalism” that claims to honor the original version of the Constitution rather than permitting the courts to protect rights through the Fourteenth Amendment.

But most Americans actually like the government to hold the economic and social playing field level. So, to win elections, Republicans since 1986 have suppressed votes, flooded the media with propaganda attacking those who like government action as dangerous socialists, gerrymandered congressional districts, abused the Senate filibuster to stop all Democratic legislation, and finally, when repeated losses in the popular vote made it clear their extremist ideology would never again command a majority, stacked the Supreme Court.

The focus of the originalists on the court has been to slash the federal government and make the states, once again, the centerpiece of our democratic system. That democracy belonged to the states was the argument of the southern Democrats before the Civil War, who insisted that the federal government could not legitimately intervene in state affairs. At the same time, though, state lawmakers limited the vote in their state, so “democracy” did not reflect the will of the majority. It reflected the interests of those few who could vote.

State governments, then, tended to protect the power of a few wealthy, white men, and to write laws reinforcing that power.

Hey, let's get rid of regulation while we're at it:

We are still waiting on another potentially explosive decision in West Virginia v. Environmental Protection Agency, in which the court will decide if Congress can delegate authority to government agencies as it has done since the 1930s. If the court says Congress can’t delegate authority, even if it waters that argument down, government regulation could become virtually impossible. Having taken the federal government’s power to protect civil rights, it would then have taken its power to regulate business.

And yet, just yesterday, the court struck down a New York state law restricting the concealed carrying of guns on the grounds that history suggested such a restriction was unconstitutional. In fact, in both the Dobbs decision and the New York State Rifle & Pistol Association v. Bruen, the court used stunningly bad history, clearly just working to get to the modern-day position it wanted. Abortion was, in fact, deeply rooted in this nation’s history not only in the far past but also in the past 49 years, and individual gun rights were not part of our early history.

The court is imposing on the nation a so-called originalism that will return power to the states, leaving the door open for state lawmakers to get rid of business regulation and gut civil rights, but its originalism also leaves the door open for the federal government to impose laws on the states that are popular with Republicans.

There's more at the link.

Thursday, June 23, 2022

How can we trust artificial agents if Big Tech wants to fool us into believing they're human?

And while we're here, more evidence of how Silicon Valley has entirely missed the point of speculative fiction. pic.twitter.com/JJ66zOPAAp
— Emily M. Bender (@emilymbender) June 23, 2022

A much better way to draw on science fiction in technology development is @cfiesler 's Black Mirror Writer's Room exercise: https://t.co/e89XH1sshA
— Emily M. Bender (@emilymbender) June 23, 2022

The obvious way to build trust in artificial agents is to treat them as such, as explicitly as possible. We don't build trust in our pets by treating them as weird kinds of people. Treat things as what they are.