We hope this will eventually enable us to diagnose failure modes, design fixes, and certify that models are safe for adoption by enterprises and society. It's much easier to tell if something is safe if you can understand how it works!
— Anthropic (@AnthropicAI) October 5, 2023
Last year, we conjectured that polysemanticity is caused by "superposition" – models compressing many rare concepts into a small number of neurons. We also conjectured that "dictionary learning" might be able to undo superposition. https://t.co/bgJdScRcay
— Anthropic (@AnthropicAI) October 5, 2023
There are more tweets in the thread. Check it out.
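For readers unfamiliar with the term, "dictionary learning" in this context usually means training a sparse autoencoder on a model's internal activations, so that each activation vector is reconstructed from a larger set of sparsely active features. Below is a minimal sketch of that idea in PyTorch; the dimensions, sparsity coefficient, and random activations are hypothetical illustrations, not Anthropic's actual setup.

```python
import torch
import torch.nn as nn

# Minimal sketch of dictionary learning via a sparse autoencoder.
# Dimensions here (512 neurons, 4096 dictionary features) are made up
# for illustration; they are not taken from Anthropic's work.
class SparseAutoencoder(nn.Module):
    def __init__(self, n_neurons=512, n_features=4096):
        super().__init__()
        self.encoder = nn.Linear(n_neurons, n_features)
        self.decoder = nn.Linear(n_features, n_neurons)

    def forward(self, activations):
        # Encode activations into a larger, sparse feature space...
        features = torch.relu(self.encoder(activations))
        # ...then reconstruct the original activations from those features.
        reconstruction = self.decoder(features)
        return reconstruction, features

sae = SparseAutoencoder()
acts = torch.randn(64, 512)  # stand-in batch of model activations

recon, feats = sae(acts)

# Loss = reconstruction error plus an L1 penalty that pushes the features
# toward sparsity, so each concept ideally lands on its own dictionary entry.
l1_coeff = 1e-3
loss = ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()
loss.backward()
```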