First of all, the basics: random distributions can be either short-tailed (finite variance, e.g. Gaussian) or fat-tailed (e.g. Pareto). The nice property of finite-variance distributions is that we can learn almost everything about them by sampling.
— Filip Piekniewski (@filippie509) August 16, 2022
We see that even for a high-variance Gaussian the running mean converges to a steady state pretty fast and the fluctuations subside. With a Pareto it is much slower. For more extreme fat-tailed distributions the mean will never converge at all. This is a very well-known fact.
— Filip Piekniewski (@filippie509) August 16, 2022
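The convergence claim is easy to check numerically. Below is a minimal sketch (assuming NumPy; the particular parameters are illustrative, not taken from the thread) comparing the running mean of a high-variance Gaussian with that of a Pareto sample: the former settles almost immediately, the latter keeps drifting far longer.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
checkpoints = [10**k for k in range(2, 7)]  # 100 ... 1,000,000 samples

gaussian = rng.normal(0.0, 10.0, n)  # high variance, but finite
pareto = rng.pareto(1.5, n)          # fat tail: finite mean, infinite variance

for name, x in (("gaussian", gaussian), ("pareto", pareto)):
    running = np.cumsum(x) / np.arange(1, n + 1)  # running mean after each sample
    print(name, [f"{running[s - 1]:.3f}" for s in checkpoints])
```

The Gaussian running mean is essentially pinned to its true value after a few thousand samples, while the Pareto estimate still wanders visibly at a million.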
The thread continues:
In essence this means our ability to learn about the distribution from sampling is extremely limited, if not fundamentally impossible. And here we are only talking about simple one-dimensional distributions.
Now take that into a very high dimension and think, e.g., about the set of all images that represent a "corner case" for a self-driving car. It is not inconceivable that such a distribution would be fat-tailed (and further complicated by the curse of dimensionality). Let's assume that is the case.
We must then conclude that sampling is not going to be any good for estimating what a typical "corner case" looks like, or even what covariance structure is present. High dimension and the lack of compact support work in tandem to defeat sampling.
Therefore teaching a neural net on a set of known "corner cases" is doomed to fail, because the next corner case is almost guaranteed to be nothing like the ones already sampled, no matter how large the sample. This is the very foundation of the entire Tesla FSD program.
— Filip Piekniewski (@filippie509) August 16, 2022
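The sampling-coverage part of this argument can be illustrated with a rough sketch (assuming NumPy; the isotropic Gaussian data here is a hypothetical stand-in for image features, since the point is dimensionality rather than the specific distribution): even a 100x larger sample barely reduces how far the next point lands from everything already seen.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 128  # stand-in for a high-dimensional "image" space

for n in (1_000, 10_000, 100_000):
    seen = rng.normal(size=(n, dim))  # the corner cases sampled so far
    new = rng.normal(size=dim)        # the next corner case
    nearest = np.linalg.norm(seen - new, axis=1).min()
    print(f"n={n:>7,}: nearest previously-seen case at distance {nearest:.2f}")
```

In 128 dimensions the distance to the nearest previously seen point shrinks only marginally as the dataset grows by two orders of magnitude: more data does not "fill in" the space.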
So your typical "just need more data" mantra is dead on arrival, just as collecting more samples from, e.g., a Cauchy distribution won't ever give you a stable estimate of the mean, which simply does not exist. If that is the case, though, why can humans deal with such complex distributions?
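Before the thread answers that question, the Cauchy claim is worth making concrete: the average of n independent Cauchy samples is itself standard Cauchy distributed, so the estimator's spread never shrinks, no matter how large n gets. A minimal sketch (assuming NumPy; the trial counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
trials = 200  # independent "datasets" per sample size

for n in (10, 1_000, 100_000):
    # each row is one dataset of n Cauchy samples, reduced to its mean
    means = rng.standard_cauchy((trials, n)).mean(axis=1)
    iqr = np.percentile(means, 75) - np.percentile(means, 25)
    print(f"n={n:>7,}: spread (IQR) of the sample mean ~ {iqr:.2f}")
```

The interquartile range of the sample mean stays around 2 (the IQR of a standard Cauchy) at every sample size: averaging buys nothing.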
Humans don't approach data in a purely statistical fashion, but rather realize the data is being generated by a complex dynamical system. You can't defeat that system with statistical sampling, but you can infer things about how it generates data.
— Filip Piekniewski (@filippie509) August 16, 2022
In other words, corner cases may all be different, but they are generated by common mechanisms, and by studying and "understanding" these mechanisms we stand a chance of identifying and dealing with corner cases in all their instance-specific complexity.
— Filip Piekniewski (@filippie509) August 16, 2022
Our AI is currently barely scratching the surface of such system characterization. Self-supervised state-predicting networks are probably the right direction, but many aspects of that process are still not well understood. Nevertheless, if the assumption put forward in the thread is true, it has devastating consequences for the entire "self-driving" industry and for "autonomous robots" in general. And everything we've seen watching years of struggle in this space seems to support the fat-tailed case.
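To make "self-supervised state-predicting networks" a little more concrete, here is a toy sketch (assuming PyTorch; the dynamics, architecture, and hyperparameters are invented for illustration and are not anything proposed in the post): a small network learns the mechanism generating the data purely by predicting the next state from the current one, with no labels involved.

```python
import math
import torch
import torch.nn as nn

# Hypothetical dynamics: a damped 2-D rotation standing in for the
# "complex dynamical system" that generates the observations.
theta, damping = 0.1, 0.99
rot = damping * torch.tensor([[math.cos(theta), -math.sin(theta)],
                              [math.sin(theta),  math.cos(theta)]])

model = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(500):
    x = torch.randn(256, 2)  # a batch of current states
    y = x @ rot.T            # their next states: the data supervises itself
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final next-state prediction error: {loss.item():.5f}")
```

The point of the sketch is the training signal, not the architecture: the target comes from the system's own evolution, so the network is forced to model the generating mechanism rather than memorize sampled instances.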