This is the first of three videos from 3Blue1Brown about how transformers work. The other two will be released in the coming weeks, though if you’re a member of the channel’s Patreon, you can review and comment on a draft of the second video in the series.
Timestamps
0:00 - Predict, sample, repeat
3:03 - Inside a transformer
6:36 - Chapter layout
7:20 - The premise of Deep Learning
12:27 - Word embeddings
18:25 - Embeddings beyond words
20:22 - Unembedding
22:22 - Softmax with temperature
26:03 - Up next
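Since the chapter list above names “Softmax with temperature” (22:22), here is a minimal Python sketch of what that operation does. This is not code from the video; the function name and the example logits are illustrative.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw model scores (logits) into a probability distribution.

    Dividing by the temperature before exponentiating controls how
    peaked the distribution is: low temperature concentrates probability
    on the top-scoring token, high temperature flattens it out.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# The same scores at three temperatures:
logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 1.0))   # ordinary softmax
print(softmax_with_temperature(logits, 0.5))   # sharper, near-deterministic
print(softmax_with_temperature(logits, 2.0))   # flatter, more random
```

At temperature 1 this is ordinary softmax; as the temperature approaches 0 it approaches always picking the single highest-scoring token, which is why low-temperature text generation reads as more deterministic.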
In an ideal world, a video like this would have been released much earlier, no later than the release of ChatGPT to the general public on November 30, 2022. That way those curious about how these creatures work, but not (quite) having the intellectual skills required to read the technical literature, would have had something to work with. It wouldn’t, however, have prevented the “stochastic parrots” nonsense; that’s more an ideological judgment than an intellectual one. But it might have helped to lower the level of confusion attendant upon the release of ChatGPT.
As it is, it’s taken 488 days for this video to become available, and the series is not yet complete. Take that lag as an index of the problems we’re having adjusting to the implications of this technology. The lag isn’t anyone’s fault, not OpenAI’s, not 3Blue1Brown’s, no one’s. It’s just a property of the techno-socio-economic-cultural system in which we live.