Some random musings on the whole "AI scaling" debate. It is clear that neural nets can model any predictive distribution p(x_future|x_past) given enough parameters and data, since they are universal approximators;
— Kevin Patrick Murphy (@sirbayes) June 15, 2022
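For concreteness, the predictive distribution he is talking about is the standard autoregressive factorization that large sequence models are trained to approximate (textbook notation, not part of the quoted thread):

```latex
p(x_{\text{future}} \mid x_{\text{past}}) \;=\; \prod_{t} p(x_t \mid x_{<t})
```

Universal approximation says a big enough network can represent each of those conditionals; it says nothing about how much data or compute that takes, which is the point the thread goes on to make.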
He continues:
so in that trivial sense "scale is all you need". But this will not be efficient in handling the combinatorial explosion of "edge cases" for which we do not have data (eg on the internet) that can simply be memorized.
BB Note: Come to think of it, one might argue that, on Kuhn's account of scientific revolutions, one spots catalytic "edge cases" (the ones Kuhn calls anomalies) and uses them to leverage a new paradigm into being.
To perform "strong generalization" - ie make reliable predictions under interventions and changing distributions - you have to learn (some approximation to) the underlying data generating mechanism in latent space, not just the induced marginals in visible space.
BB: By "underlying data generating mechanism" I assume he means us human beings, for we generated the texts in the corpus on which a given LLM is trained. And we definitely use symbolic means, though, as I have asserted, symbolic means ultimately grounded in Geoffrey Hinton's "big vectors of neural activity."
So while we can in principle learn everything just by optimizing predictions p(x), I claim it will be much more efficient (in terms of data and compute) to optimize over the space of plausible models of the world p(x|z).
Given a model with latent variables, we can make predictions in observation space, but we can also do counterfactual reasoning, and can come up with compressed and meaningful explanations of observed data (eg do scientific discovery).
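Here is an equally small sketch of the p(x|z) idea, again a toy example of mine with made-up parameters: a two-component Gaussian mixture in which the marginal p(x) = sum_z p(x|z) p(z) is what a purely predictive model in visible space approximates, while the posterior p(z|x) is the kind of compressed, meaningful explanation the thread is pointing at.

```python
import numpy as np
from scipy.stats import norm

# Toy latent-variable model (all numbers made up for illustration):
#   z ~ Categorical([0.5, 0.5])          which "mechanism" is active
#   x | z=k ~ Normal(mu[k], sigma[k])    observation given that mechanism
priors = np.array([0.5, 0.5])
mus = np.array([-2.0, 2.0])
sigmas = np.array([1.0, 1.0])

def p_x_given_z(x):
    """Likelihood of each x under each mechanism; shape (len(x), 2)."""
    return norm.pdf(np.asarray(x)[:, None], loc=mus, scale=sigmas)

def p_x(x):
    """Marginal p(x) = sum_z p(x|z) p(z): the density a purely
    predictive model in visible space would be approximating."""
    return p_x_given_z(x) @ priors

def p_z_given_x(x):
    """Posterior p(z|x) by Bayes' rule: a compressed explanation of
    each observation in terms of which mechanism produced it."""
    joint = p_x_given_z(x) * priors
    return joint / joint.sum(axis=1, keepdims=True)

xs = np.array([-2.5, 0.0, 2.5])
print("p(x)   :", np.round(p_x(xs), 4))
print("p(z|x) :", np.round(p_z_given_x(xs), 3))
```

In a real system the latent structure would be learned rather than written down by hand, but the division of labour between p(z), p(x|z), and p(z|x) is the same.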
So I disagree that "scale is all you need". Instead we also need models and assumptions about data generating mechanisms - but these need to be checked against reality by performing experiments.
End of rant :)