2/n We tested 46 tasks across four domains, grounded in brain networks studied in humans: language (supported by the language network), formal reasoning (Multiple-Demand network), physical reasoning (Intuitive Physics network), and social reasoning (Theory of Mind network). pic.twitter.com/Jdh8n1x3zy
— Pengrui Han (Barry) (@pengrui_han) June 30, 2026
4/n To find the units to support these tasks, we use attribution patching. The analysis runs in four steps. (1) We start from each task's minimal pairs. (2) We pass both inputs through the model and record activations at every neuron. (3) We then score each neuron, combining how… pic.twitter.com/FEq7vmAO13
— Pengrui Han (Barry) (@pengrui_han) June 30, 2026
No comments:
Post a Comment