The Hydra Effect: Emergent Self-repair in Language Model Computations
— AK (@_akhaliq) August 1, 2023
paper page: https://t.co/e8oycGaZCv
investigate the internal structure of language model computations using causal analysis and demonstrate two motifs: (1) a form of adaptive computation where ablations of… pic.twitter.com/0X92Jo0Bmp