Friday, March 24, 2023

So, just why do GPTs have to operate by continuing an existing string?

Why can’t they, you know, just come out with it, like we do? 

Yeah, I know, that’s how they’re trained; they’re trained to predict the next token. I’m not sure what that tells me.

So, when I give ChatGPT the prompt, “Tell me a story,” that isn’t what is actually presented to the underlying LLM. There isn’t anything that ‘naturally’ follows such a string. It could be literally almost anything. So, a bit of manipulation has to take place in order to give the LLM something to work with. But why?
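To make “a bit of manipulation” concrete: a chat front end typically wraps my words in a larger template before the model ever sees them. The markers and the system text below are invented for illustration; the real templates vary from system to system and aren’t fully public.

```python
# Hypothetical illustration: a chat front end wraps the user's message
# in a larger template, and the model is asked to continue the result.
SYSTEM_MESSAGE = "You are a helpful assistant."

def wrap_for_model(user_message: str) -> str:
    # These markers are invented for this example; real systems use
    # their own (model-specific, often undisclosed) formats.
    return (
        f"<|system|>{SYSTEM_MESSAGE}\n"
        f"<|user|>{user_message}\n"
        f"<|assistant|>"
    )

print(wrap_for_model("Tell me a story"))
# The LLM now has a string it can plausibly continue: whatever it
# generates after "<|assistant|>" is what comes back to me as the reply.
```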

I decided to put the question to GPT-4:

GPTs (that is, transformers) are constructed to continue an existing string of tokens. Why is that? Why can’t they simply answer a question without continuing an existing string?

GPTs, or Generative Pre-trained Transformers, are language models that are pre-trained on large amounts of text data using unsupervised learning. These models are designed to generate coherent and contextually relevant text based on the input provided to them.

The reason GPTs are constructed to continue an existing string of tokens is because they are designed to capture the context and meaning of the text, rather than just providing a one-word answer to a question. By generating a continuation of the input text, the model can incorporate the surrounding context and generate a more complete and accurate response.

In other words, GPTs are not just simple question-answering systems, but rather complex language models that can generate text that is contextually relevant and meaningful. By training on a large corpus of text, GPTs can learn to identify patterns and relationships in language, allowing them to generate text that is more than just a simple answer to a question.

That being said, GPTs can be fine-tuned to answer specific questions or perform specific tasks, such as question-answering or text classification, by providing them with appropriate training data and modifying their architecture accordingly.
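Before reacting to that, it’s worth spelling out what “generating a continuation of the input text” amounts to mechanically. Here’s a minimal sketch, assuming a Hugging Face-style causal language model and tokenizer; it just predicts one next token at a time and appends it.

```python
import torch

def continue_string(model, tokenizer, prompt: str, max_new_tokens: int = 50) -> str:
    # Turn the prompt into a row of token ids: shape [1, sequence_length].
    ids = tokenizer.encode(prompt, return_tensors="pt")
    for _ in range(max_new_tokens):
        with torch.no_grad():
            # The model scores every vocabulary item as a possible next token;
            # we only keep the scores at the last position in the sequence.
            logits = model(ids).logits[:, -1, :]
        probs = torch.softmax(logits, dim=-1)
        # Sample one token from that distribution and tack it onto the string.
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=-1)
    return tokenizer.decode(ids[0])
```

That loop is the whole game: everything the model “says” is a string grown one token at a time out of whatever string it was handed.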

That phrase hit home: “...they are designed to capture the context and meaning of the text, rather than just providing a one-word answer to a question.” It’s that last clause that got my attention.

So I asked another question:

Tell me about that last. How is their architecture modified? What kind of training data is necessary?

To fine-tune a GPT model for a specific task, such as question-answering, the architecture of the model needs to be modified to accommodate the specific requirements of the task. This typically involves adding new layers or modifying existing layers of the model to better capture the specific features of the task.

For example, to fine-tune a GPT model for question-answering, the input data to the model would consist of a question and a passage of text that may contain the answer to the question. The model would then be trained to predict the correct answer based on the input question and text.

To accomplish this, the GPT model would be modified by adding a new output layer that is specific to the question-answering task. This layer would take the final hidden state of the model and generate a probability distribution over possible answers. During training, the model would be optimized to minimize the difference between the predicted probability distribution and the true distribution of possible answers.

The type of training data required for fine-tuning a GPT model for a specific task depends on the task itself. For question-answering, the training data would consist of pairs of questions and corresponding passages of text that contain the answers. For text classification, the training data would consist of labeled examples of text, with each example assigned to a specific category.

In general, the training data needs to be carefully curated and annotated to ensure that the model is able to learn the relevant patterns and relationships required for the specific task. This typically involves a significant amount of preprocessing and quality control to ensure that the training data is consistent and accurate.
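The bit about “a new output layer that takes the final hidden state and generates a probability distribution over possible answers” can be made concrete. Here’s a minimal sketch in PyTorch; the class name and the assumption of a fixed list of candidate answers are my simplifications (real question-answering heads more often predict where an answer starts and ends within the passage).

```python
import torch
import torch.nn as nn

class AnswerHead(nn.Module):
    """A small output layer added on top of a pretrained transformer.
    It maps the transformer's final hidden state to a probability
    distribution over a fixed set of candidate answers."""

    def __init__(self, hidden_size: int, num_answers: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_answers)

    def forward(self, final_hidden_state: torch.Tensor) -> torch.Tensor:
        # final_hidden_state: [batch, hidden_size], e.g. the last token's state.
        logits = self.classifier(final_hidden_state)
        return torch.log_softmax(logits, dim=-1)

# During fine-tuning, the loss is the gap between this predicted
# distribution and the correct answer for each (question, passage) pair:
#   loss = nn.NLLLoss()(head(final_hidden_state), correct_answer_indices)
```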

Well, OK. GPT-4’s answer doesn’t really tell me what I want to know. But there’s something going on there that’s worth thinking about. I went on to quiz it about how GPTs are fitted out to tell stories – which I may post later, or perhaps just append those remarks to this post – but I had something to think about.

And a bit later it hit me – Sensory deprivation! Slocum! I’ve got some posts on those things. Joshua Slocum, who circumnavigated the globe alone at the turn of the 20th century, started seeing things. The same has happened to other explorers. And that’s what happens during sensory deprivation. You start hallucinating.

It’s as though the mind actively needs and seeks input. If we’ve really got a so-called Bayesian brain that’s constantly guesstimating what’s coming up next and revising its guesstimates according to what actually shows up, then it does seem like we’re physically locked to the world and that the brain/mind needs input in order to maintain stability.
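For what it’s worth, that guess-and-revise loop has a simple formal core, Bayes’ rule: a prior guess gets reweighted by how well it predicted what actually showed up. A toy version, with made-up numbers:

```python
# Toy Bayesian update: two hypotheses about what comes next,
# revised after observing what actually shows up.
prior = {"storm": 0.3, "calm": 0.7}          # prior guess about the world
likelihood = {"storm": 0.9, "calm": 0.2}     # P(dark clouds | hypothesis)

# Observe dark clouds; reweight each hypothesis by how well it predicted that.
unnormalized = {h: prior[h] * likelihood[h] for h in prior}
total = sum(unnormalized.values())
posterior = {h: p / total for h, p in unnormalized.items()}

print(posterior)  # {'storm': ~0.66, 'calm': ~0.34} -- the guess has been revised
```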

For a GPT the world consists of text. Without text it is nothing. Without textual input those 175 billion weights are just a bunch of numbers. But when you put “pressure” on them by presenting them with a string, NOW they’re in touch with a world. And that world imposes order on them.

This, whatever it is, is somewhere in the movement from philosophy toward a quasi-technological account of how these things work – or is it the other direction? Does it matter? That’s right where we need to be at this point. We really are dealing with something new, really new, and philosophy is what we’ve got for doing that, at least it is if we can pry it free of its layers of scholastic encrustation.

Things are beginning to make sense.
