1/ Is scale all you need for AGI? (Unlikely.) But our new paper "Beyond neural scaling laws: beating power law scaling via data pruning" shows how to achieve a far superior exponential decay of error with dataset size, rather than slow power-law neural scaling https://t.co/Vn62UJXGTd pic.twitter.com/vVt4xDBcr7
— Surya Ganguli (@SuryaGanguli) June 30, 2022
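The thread itself doesn't spell out the pruning metric, so here is a minimal sketch in the spirit of the paper's self-supervised prototype idea: cluster embeddings with k-means and rank each example by its distance to the nearest centroid, then keep only a fraction of the data. The random stand-in features, the cluster count, and the 70% keep fraction are placeholder assumptions for illustration, not the paper's actual settings.

```python
# Minimal sketch of data pruning via a self-supervised difficulty score.
# The embeddings here are random stand-ins for real SSL features; the
# cluster count and keep fraction are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10_000, 128))  # stand-in for self-supervised features

# Cluster the embedding space; distance to the nearest centroid serves as a
# per-example difficulty score (far from any prototype = "hard" example).
kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(embeddings)
difficulty = np.linalg.norm(
    embeddings - kmeans.cluster_centers_[kmeans.labels_], axis=1
)

# Keep the hardest 70% of examples; whether keeping hard or easy examples is
# better depends on how much data you start with.
keep_fraction = 0.7
n_keep = int(keep_fraction * len(embeddings))
kept_indices = np.argsort(difficulty)[-n_keep:]
print(f"Pruned dataset size: {len(kept_indices)} of {len(embeddings)}")
```

The pruned index set would then be used to train downstream models on the smaller, less redundant subset.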
Later in the stream:
6/ Overall this work suggests that our current ML practice of collecting large amounts of random data is highly inefficient, leading to huge redundancy in the data. We show mathematically that this redundancy is the origin of the very slow, unsustainable power-law scaling of error with dataset size pic.twitter.com/2aNv0ssb9S
— Surya Ganguli (@SuryaGanguli) June 30, 2022
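To make the contrast concrete, the sketch below (not from the paper) fits the two functional forms the thread is comparing, power-law decay E(N) = a·N^(-b) versus exponential decay E(N) = a·exp(-c·N), to made-up error-versus-dataset-size measurements. The data points and initial guesses are purely illustrative.

```python
# Illustrative comparison of the two scaling forms discussed above:
# power law E(N) = a * N^(-b) versus exponential decay E(N) = a * exp(-c * N).
# The "measurements" are synthetic and exist only to show the two fits.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b):
    return a * n ** (-b)

def exponential(n, a, c):
    return a * np.exp(-c * n)

# Hypothetical test-error measurements at increasing dataset sizes.
sizes = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
errors = np.array([0.42, 0.33, 0.25, 0.19, 0.14])

(pa, pb), _ = curve_fit(power_law, sizes, errors, p0=[1.0, 0.3])
(ea, ec), _ = curve_fit(exponential, sizes, errors, p0=[0.5, 1e-5])

print(f"power-law fit:   E ~ {pa:.2f} * N^(-{pb:.2f})")
print(f"exponential fit: E ~ {ea:.2f} * exp(-{ec:.2e} * N)")
```

On redundant random data the power law is the better description; the paper's claim is that aggressive pruning can move the curve toward the exponential regime.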
8/ Indeed, the initial computational cost of creating a foundation dataset through data pruning can be amortized across efficiency gains in training many downstream models, just as the initial cost of training foundation models is amortized across faster fine-tuning on many tasks
— Surya Ganguli (@SuryaGanguli) June 30, 2022