This is an important point.
While all the common *sampling* strategies only choose 1 token at a time, attention-layer training does *not* propagate gradients *backward* 1 token at a time, meaning that some intermediate-layer features probably model aspects of much later tokens. https://t.co/b0nw0RLf3f

— davidad 🎇 (@davidad) October 19, 2023
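To make the point concrete, here's a minimal sketch in PyTorch (a toy model of my own devising, not anything from davidad's tweet): under teacher forcing, the cross-entropy loss computed at a *late* token position sends its gradient back, through causal attention, into the intermediate features of much *earlier* positions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab, d_model, seq_len = 50, 32, 8

# toy pieces: embedding, one causal self-attention layer, and an LM head
embed = nn.Embedding(vocab, d_model)
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
lm_head = nn.Linear(d_model, vocab)

tokens = torch.randint(0, vocab, (1, seq_len))   # a made-up "document"
x = embed(tokens)
x.retain_grad()                                  # keep gradients on the intermediate features

# causal mask: position i may only attend to positions <= i (True = blocked)
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
h, _ = attn(x, x, x, attn_mask=mask)
logits = lm_head(h)                              # next-token logits at every position

# isolate only the loss for predicting the *final* token (position seq_len - 1),
# which is read off the logits at position seq_len - 2
loss_last = F.cross_entropy(logits[0, -2:-1], tokens[0, -1:])
loss_last.backward()

# the gradient of that late-token loss reaches the features of the *first* token
print(x.grad[0, 0].abs().sum())                  # nonzero
```

In real training the losses at all positions are summed, so every token's intermediate features receive gradient from every later position they can influence, which is why those features can end up modeling aspects of much later tokens, exactly as davidad says.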
And shame shame shame on the experts for allowing, encouraging, instructing so many to think that this is how they work.
This technology is too important to be left in the hands of these experts. They may be expert in programming the engines and "training" them, but that's as far as their expertise goes. They need to rethink their "understanding," if you can call it that, of how these engines function to produce text.
Assuming they have to "dumb down" the information for other people?
Alas, I fear it's more complicated than that. I think many of them half-way believe it themselves. I certainly got a lot of push-back when I posted on the subject over at LessWrong, where there are technical experts.
That's what I wondered.