The Hydra Effect: Emergent Self-repair in Language Model Computations
— AK (@_akhaliq) August 1, 2023
paper page: https://t.co/e8oycGaZCv
We investigate the internal structure of language model computations using causal analysis and demonstrate two motifs: (1) a form of adaptive computation where ablations of… pic.twitter.com/0X92Jo0Bmp
Wednesday, August 2, 2023
Emergent Self-repair in Language Model Computations