Thursday, October 19, 2023

Ramble on ChatGPT: Coming up on a one-year anniversary, time to reflect on ChatGPT & LLMs

ChatGPT was released on November 30, 2022, and I started playing around with it on December 1, 2022. Perhaps early December 2023 would be a good time for me to reflect on the work I’ve been doing on ChatGPT and related matters, no?

Yes.

The purpose of this post is to think that through in an informal way. What do I need to do between now and then? What will that document look like?

I figure that document will itself be modest, no more than, say, 30 or so pages of exposition. But it will link to all my working papers on ChatGPT and related matters. The rest of this post consists of thoughts about the work I need to do and concludes with a list of those working papers.

Ontology and meaning

I’ve recently done some posts on conceptual ontology in ChatGPT, 20 questions and ontological landscape. I’m in the process of combining those posts with some older material and producing a working paper, ChatGPT’s Ontological Landscape. That’s an important piece of work because ontology (“natural kinds”) is one of the major structuring principles underlying language and cognition.

This will include the particular idea that each conceptual system has its own underlying ontology and that specialized ontologies supersede common-sense ontology in specialized domains. My prototypical example is salt, a common-sense concept, and NaCl, a specialized one. The common-sense concept is defined in terms of sensory perception, while the specialized concept is defined in the language of chemistry, in terms of atoms and chemical bonds.
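To make the contrast concrete, here is a toy sketch (my illustration for this post, not part of the working paper) that represents the two ontologies as simple data structures: the common-sense concept is keyed to sensory features, the specialized concept to chemical structure.

```python
# Toy illustration only: two ontologies for "the same stuff," one common-sense,
# one specialized. The feature lists are illustrative, not exhaustive.
common_sense = {
    "salt": {
        "defined_by": "sensory perception",
        "features": ["white", "granular", "tastes salty", "dissolves in water"],
    }
}

specialized = {
    "NaCl": {
        "defined_by": "chemistry",
        "composition": {"Na": 1, "Cl": 1},
        "bond": "ionic",
        "lattice": "face-centered cubic",
    }
}

# The two entries share a referent, but neither set of defining features
# can be read off from the other.
print(common_sense["salt"]["features"])
print(specialized["NaCl"]["composition"])
```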

The concept of “meaning” is like this as well. It is a common-sense concept. But it is also used in various specialized domains in philosophy, linguistics, cognitive science, and computer science and AI. Much of the confusion about whether or not AI systems can deal with linguistic meaning results from the fact that these various concepts all fly under the same linguistic flag, meaning, but they are no more the same concept than salt and NaCl are.

Miriam Yevick and neural holography

I recently published a long article in 3 Quarks Daily about the life and ideas of a mathematician named Miriam Yevick. I need to write a longer, more detailed post about her ideas that will also: 1) reflect on why they weren’t taken up, and 2) get a bit more explicit about how they relate to current issues in machine learning and large language models. One thing I need to do is argue that her characterization of “one-shot” processing in holography applies to transformers in the following way: each sweep through the collection of weights in the process of selecting the next token is equivalent to a single “shot” in an optical holography apparatus. I also need to discuss recent work in cognitive psychology which uses a holographic model for language and verbal memory.
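To pin down the analogy as I intend it, here is a minimal toy sketch (my own illustration, not Yevick’s formalism and not a real transformer): autoregressive decoding consults the entire fixed set of weights once per emitted token, and that single sweep is what I’m treating as the counterpart of one “shot” in an optical holography apparatus.

```python
# Minimal sketch: one full sweep through frozen weights per emitted token.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, D = 50, 16
W_embed = rng.normal(size=(VOCAB, D))    # frozen weights: the "hologram"
W_out = rng.normal(size=(D, VOCAB))

def next_token(context):
    """One forward sweep: mix the whole context, then score every vocab item."""
    h = W_embed[context].mean(axis=0)     # crude stand-in for attention
    logits = h @ W_out                    # all the weights consulted at once
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(rng.choice(VOCAB, p=p))

tokens = [1, 7, 3]                        # an arbitrary prompt
for _ in range(5):                        # one sweep per emitted token
    tokens.append(next_token(tokens))
print(tokens)
```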

Discursive supplement

I need to finish this working paper, Discursive Competence in ChatGPT, Supplemental Examples to Part 1. This consists of a variety of transcripts of ChatGPT interactions. Some topics: grammatical knowledge and self-reference, the abstract concept of charity, haiku and Margaret Masterman, legal concepts, word associations and word clusters, and the Chinese Room thought experiment.

The Limitations of Large Language Models

I see that LLMs have three limitations that are inherent in the architecture. You can’t eliminate them by scaling up, nor by fine-tuning, prompt engineering, or RLHF. This is, of course, a matter of controversy. I don’t intend to argue the issue (at least not much), but rather I just want to state it and think about the consequences:

The limitations:

  1. Once it has been trained, the model is fixed and cannot (readily) be altered.
  2. It confabulates.
  3. It is confined to single-stream processing, which is the source of its weakness in arithmetic, ‘tight’ logical reasoning, and planning (among others).

As it is, an LLM can function as a processor in various configurations where it is linked to external applications. A great deal of work is being done in this area. Some of it may result in useful products, but this will never lead to the mythical AGI.

Providing LLMs with symbolic capabilities should deal with the third problem, and those capabilities could also be a means of linking them to a world model, which would deal with the second problem. The world model itself must be maintained and ultimately be subject to human oversight. The first problem requires a different architecture, a ‘looser’ one, but one that doesn’t squander the power of LLMs to embrace a wide range of material – see, in particular, GPT-3: Waterloo or Rubicon? pp. 23-26 (link below).
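Here is a minimal sketch of what I have in mind by linking an LLM to external symbolic applications (the llm() function below is a placeholder stub, not any real API): the model drafts a response and hands the arithmetic off to a symbolic evaluator rather than guessing at it token by token.

```python
# Toy control flow: LLM drafts, a symbolic tool does the arithmetic.
import re

def llm(prompt: str) -> str:
    # Stand-in for a language model call; here it just echoes a template.
    return f"The answer to {prompt} is [CALC: {prompt}]"

def run_with_calculator(question: str) -> str:
    draft = llm(question)
    # Hand any [CALC: ...] expression to a symbolic evaluator.
    def evaluate(match):
        expr = match.group(1)
        if re.fullmatch(r"[\d\s+\-*/().]+", expr):   # plain arithmetic only
            return str(eval(expr))
        return match.group(0)
    return re.sub(r"\[CALC:\s*([^\]]+)\]", evaluate, draft)

print(run_with_calculator("37 * 49"))   # -> "The answer to 37 * 49 is 1813"
```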

I tend to think of LLMs as digital wilderness, large webs and tangles of conceptual structure that need to be explored and ‘domesticated.’ Just what that entails.... I note as well that this involves matters of social, political, and economic organization that are well beyond the scope of this review. I do see opportunities here for citizen science.

Beyond mechanistic understanding

Mechanistic understanding is necessary, but not sufficient, for understanding what’s going on inside LLMs. I see them as being organized on three levels, which I’m currently calling phenomenon, matrix, and engine. The phenomenal level is language and texts. That’s what we see, and how we interact with the LLM. The engine is the computer code that runs the device, both in training and inference mode. The matrix is the model itself, which is generally said to be opaque.

Mechanistic understanding, as I understand it, is focused on the interface between the engine and the model. Much of my work is directed at providing clues about the interface between the phenomenon and the model. I don’t believe that you can get at that interface through mechanistic understanding alone. It’s not the right conceptual tool, the right language.

Ultimately, I believe some kind of specialized conceptual tools will be needed to understand the interface between language and the model. I think that the geometric semantics of Peter Gärdenfors will be useful here (Conceptual Spaces: The Geometry of Thought, 2000; The Geometry of Meaning: Semantics Based on Conceptual Spaces, 2014). We’ll also need to develop some kind of graphic notation. I’m thinking of starting with Syd Lamb’s notation, which I’ve used in Relational Nets Over Attractors, A Primer (see list below). Discovering and developing these conceptual tools will be one objective of a research program.
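As a hint of why Gärdenfors’ geometric semantics looks promising here, a toy sketch (my illustration, not his formalism): concepts are regions around prototypes in a space of quality dimensions, and categorizing a point amounts to finding the nearest prototype, which carves the space into convex regions.

```python
# Toy Gardenfors-style conceptual space: concepts as regions around prototypes
# in a quality space; nearest-prototype classification yields a Voronoi carving.
import numpy as np

# Two quality dimensions (say, hue and size), purely for illustration.
prototypes = {
    "cherry": np.array([0.9, 0.1]),
    "lemon":  np.array([0.2, 0.3]),
    "melon":  np.array([0.3, 0.9]),
}

def classify(point):
    """Assign a point in the quality space to the nearest prototype's concept."""
    return min(prototypes, key=lambda name: np.linalg.norm(point - prototypes[name]))

print(classify(np.array([0.8, 0.2])))   # -> "cherry"
```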

Research program

And THAT’s the major objective of this exercise, to come up with a program for further research. I’ve tried a lot of things with ChatGPT this past year. At the moment the three most productive seem to be:

  • Systematic story variations, which I’ve written up in ChatGPT tells stories, and a note about reverse engineering: A Working Paper (link below)
  • Memory structures, which I’ve written up in Discursive Competence in ChatGPT, Part 2: Memory for Texts (link below)
  • Twenty questions, where I’ve got a long blog post, and which I’ll be discussing in the ontology working paper.

I’ll probably say a word or three about some other possibilities as well.

One important thing about this research is that it doesn’t require API access to ChatGPT or any other LLM, and it doesn’t require programming. Anyone who has internet access to ChatGPT can do it.

Working Papers on ChatGPT

GPT-3: Waterloo or Rubicon? Here be Dragons, Version 4.1, https://www.academia.edu/43787279/GPT_3_Waterloo_or_Rubicon_Here_be_Dragons_Version_4_1

Discursive Competence in ChatGPT, Part 1: Talking with Dragons, Version 2, https://www.academia.edu/94409729/Discursive_Competence_in_ChatGPT_Part_1_Talking_with_Dragons_Version_2

ChatGPT vs. the Towers of Warsaw, https://www.academia.edu/94517239/ChatGPT_vs_the_Towers_of_Warsaw

ChatGPT intimates a tantalizing future; its core LLM is organized on multiple levels; and it has broken the idea of thinking. Version 3, https://www.academia.edu/95608526/ChatGPT_intimates_a_tantalizing_future_its_core_LLM_is_organized_on_multiple_levels_and_it_has_broken_the_idea_of_thinking_Version_3

ChatGPT tells stories, and a note about reverse engineering: A Working Paper, Version 3, https://www.academia.edu/97862447/ChatGPT_tells_stories_and_a_note_about_reverse_engineering_A_Working_Paper_Version_3

Stories by ChatGPT: Fairy Tale, Realistic, and True, https://www.academia.edu/99985817/Stories_by_ChatGPT_Fairy_Tale_Realistic_and_True

ChatGPT tells 20 versions of its prototypical story, with a short note on method, https://www.academia.edu/108129357/ChatGPT_tells_20_versions_of_its_prototypical_story_with_a_short_note_on_method

Discursive Competence in ChatGPT, Part 2: Memory for Texts, Version 3, https://www.academia.edu/107318793/Discursive_Competence_in_ChatGPT_Part_2_Memory_for_Texts_Version_3

Working Papers on related issues

To Model the Mind: Speculative Engineering as Philosophy, https://www.academia.edu/75749826/To_Model_the_Mind_Speculative_Engineering_as_Philosophy

Xanadu, GPT, and Beyond: An adventure of the mind, https://www.academia.edu/106001453/Xanadu_GPT_and_Beyond_An_adventure_of_the_mind

Symbols and Nets: Calculating Meaning in "Kubla Khan", https://www.academia.edu/78967114/Symbols_and_Nets_Calculating_Meaning_in_Kubla_Khan_

Relational Nets Over Attractors, A Primer: Part 1, Design for a Mind, Version 3, https://www.academia.edu/81911617/Relational_Nets_Over_Attractors_A_Primer_Part_1_Design_for_a_Mind_Version_3

Direct Brain-to-Brain Thought Transfer A High Tech Fantasy that Won't Work, https://www.academia.edu/44109360/Direct_Brain_to_Brain_Thought_Transfer_A_High_Tech_Fantasy_that_Wont_Work

Principles and Development of Natural Intelligence, https://www.academia.edu/235116/Principles_and_Development_of_Natural_Intelligence

Metaphor, Recognition, and Neural Process, https://www.academia.edu/238608/Metaphor_Recognition_and_Neural_Process
