The transformer expressiveness results are often a bit of a red herring: there tends to be a huge gap between what can be expressed by transformers and what can actually be learned with gradient descent. Mind the Gap, a new paper led by @SaldytLucas, dives deeper into this issue 👇👇 https://t.co/Q3BTFR5v7T
— Subbarao Kambhampati (కంభంపాటి సుబ్బారావు) (@rao2z) May 30, 2025
Saturday, May 31, 2025
Deep Learning doesn't Learn Deeply
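To make that gap concrete, here is a minimal sketch of the kind of experiment that exposes it (my illustration, not the paper's setup, and it assumes PyTorch). PARITY of a bit string is a classic case: small transformers can express it via explicit weight constructions, yet a tiny transformer trained with gradient descent on random samples typically drives its training loss down while test accuracy hovers near chance. The task length, model size, and hyperparameters below are illustrative assumptions.

# Probing the expressiveness/learnability gap on PARITY.
# Expressible by construction; usually not learned by gradient descent.
import torch
import torch.nn as nn

torch.manual_seed(0)
SEQ_LEN, D_MODEL, N_TRAIN, N_TEST = 20, 32, 2048, 2048

def make_batch(n):
    x = torch.randint(0, 2, (n, SEQ_LEN))  # random bit strings
    y = x.sum(dim=1) % 2                   # parity label
    return x, y

class TinyTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(2, D_MODEL)
        self.pos = nn.Parameter(torch.randn(SEQ_LEN, D_MODEL) * 0.02)
        layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=4, dim_feedforward=64,
            dropout=0.0, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, 2)

    def forward(self, x):
        h = self.encoder(self.embed(x) + self.pos)
        return self.head(h.mean(dim=1))    # mean-pool, then classify

model = TinyTransformer()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()
x_tr, y_tr = make_batch(N_TRAIN)
x_te, y_te = make_batch(N_TEST)

for step in range(2000):
    idx = torch.randint(0, N_TRAIN, (256,))
    loss = loss_fn(model(x_tr[idx]), y_tr[idx])
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:
        with torch.no_grad():
            acc = (model(x_te).argmax(1) == y_te).float().mean()
        # Training loss usually falls (memorization) while test accuracy
        # stays near ~0.5: the function is expressible, but not learned.
        print(f"step {step}: loss {loss.item():.3f}, test acc {acc:.3f}")

The point of the sketch is only that "a transformer exists that computes f" and "gradient descent finds that transformer from samples" are very different statements, which is the gap the paper examines.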