We hope this will eventually enable us to diagnose failure modes, design fixes, and certify that models are safe for adoption by enterprises and society. It's much easier to tell if something is safe if you can understand how it works!
— Anthropic (@AnthropicAI) October 5, 2023
Last year, we conjectured that polysemanticity is caused by "superposition" – models compressing many rare concepts into a small number of neurons. We also conjectured that "dictionary learning" might be able to undo superposition. https://t.co/bgJdScRcay
— Anthropic (@AnthropicAI) October 5, 2023
There are more tweets in the thread. Check it out.
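For readers unfamiliar with the term, "dictionary learning" in this context usually means training a sparse autoencoder on a model's internal activations, so that each activation vector is reconstructed from a larger set of sparsely active features. Below is a minimal sketch of that idea in PyTorch; the dimensions, sparsity coefficient, and random activations are hypothetical illustrations, not Anthropic's actual setup.

```python
import torch
import torch.nn as nn

# Minimal sketch of dictionary learning via a sparse autoencoder.
# Dimensions here (512 neurons, 4096 dictionary features) are made up
# for illustration; they are not taken from Anthropic's work.
class SparseAutoencoder(nn.Module):
    def __init__(self, n_neurons=512, n_features=4096):
        super().__init__()
        self.encoder = nn.Linear(n_neurons, n_features)
        self.decoder = nn.Linear(n_features, n_neurons)

    def forward(self, activations):
        # Encode activations into a larger, sparse feature space...
        features = torch.relu(self.encoder(activations))
        # ...then reconstruct the original activations from those features.
        reconstruction = self.decoder(features)
        return reconstruction, features

sae = SparseAutoencoder()
acts = torch.randn(64, 512)  # stand-in batch of model activations

recon, feats = sae(acts)

# Loss = reconstruction error plus an L1 penalty that pushes the features
# toward sparsity, so each concept ideally lands on its own dictionary entry.
l1_coeff = 1e-3
loss = ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()
loss.backward()
```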