Wednesday, August 17, 2022

Some statistical problems with AI [self-driving cars]

The thread continues:

In essence it means our ability to learn about the distribution from sampling is extremely limited, if not fundamentally impossible. And here we are still talking about simple one-dimensional distributions.

Now take that to very high dimensions and think, e.g., about the set of all images that represent a "corner case" for a self-driving car. It is not inconceivable that such a distribution would be fat-tailed (and further messed up by the curse of dimensionality). Let's assume that is the case.

We must then conclude that sampling is not going to be any good for estimating what a typical "corner case" looks like, or even what covariance structure is present. High dimension and the lack of compact support work in tandem to defeat sampling.

Therefore training a neural net on a set of known "corner cases" is doomed to fail, because the next corner case is almost guaranteed to be nothing like the ones already sampled, independent of the size of the sample. This is the very foundation of the entire Tesla FSD program. Filip Piekniewski
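The "nothing like the ones already sampled" claim can be illustrated with a toy sketch of the curse of dimensionality. This uses standard Gaussians as a stand-in for the (unknown) distribution of corner-case images; the point is only that in high dimensions a fresh sample lands far from every previously collected sample, no matter how the low-dimensional intuition looks.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_neighbor_distance(dim, n_samples=1000):
    """Distance from a fresh draw to its nearest neighbor among
    n_samples previously collected points (standard Gaussian)."""
    seen = rng.standard_normal((n_samples, dim))
    new = rng.standard_normal(dim)
    return np.linalg.norm(seen - new, axis=1).min()

low = nearest_neighbor_distance(dim=2)
high = nearest_neighbor_distance(dim=1000)
# In 2 dimensions the new point has a close neighbor in the
# sample; in 1000 dimensions every collected point is far away.
print(f"d=2:    nearest sampled point is {low:.2f} away")
print(f"d=1000: nearest sampled point is {high:.2f} away")
```

Real image manifolds are not isotropic Gaussians, of course, but the qualitative effect, that coverage by sampling collapses as dimension grows, is the same.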

So the typical "just need more data" mantra is dead on arrival, just as collecting more samples from, say, a Cauchy distribution won't ever give you a stable estimate of the mean, which simply does not exist. If that is the case, though, why can humans deal with such complex distributions?
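The Cauchy point is easy to check numerically: a sketch comparing how the sample means of repeated large batches behave for a Gaussian versus a Cauchy. For the Gaussian, batch means concentrate tightly; for the Cauchy, the mean of a batch is itself standard Cauchy, so more data never stabilizes the estimate.

```python
import numpy as np

rng = np.random.default_rng(42)

def batch_mean_spread(sampler, n_batches=200, batch_size=10_000):
    """Std of per-batch sample means: does averaging stabilize?"""
    means = [sampler(batch_size).mean() for _ in range(n_batches)]
    return float(np.std(means))

spread_gauss = batch_mean_spread(rng.standard_normal)
spread_cauchy = batch_mean_spread(rng.standard_cauchy)
# The Gaussian spread shrinks like 1/sqrt(batch_size); the
# Cauchy spread does not shrink with batch size at all.
print(f"Gaussian batch-mean spread: {spread_gauss:.4f}")
print(f"Cauchy   batch-mean spread: {spread_cauchy:.4f}")
```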

Humans don't approach data in a purely statistical fashion; rather, they realize the data is being generated by a complex dynamical system. You can't defeat that system with statistical sampling alone, but you can infer things about how it generates data. Filip Piekniewski

In other words, corner cases may all be different, but they are generated by common mechanisms, and by studying and "understanding" these mechanisms we stand a chance of identifying and dealing with corner cases in all their instance-specific complexity. Filip Piekniewski
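A toy illustration of "understanding the mechanism beats sampling the outcomes" (my example, not the thread's): suppose a heavy-tailed observable is produced by a simple hypothetical multiplicative mechanism, x = exp(Gaussian). Estimating its typical scale by naively averaging raw samples is unstable, but an observer who knows the mechanism can work in log space, where estimation is easy and stable.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical generating mechanism: heavy-tailed observable
# produced by exponentiating a simple Gaussian process.
def generate(n, mu=1.0, sigma=3.0):
    return np.exp(rng.normal(mu, sigma, size=n))

def relative_spread(estimates):
    """Coefficient of variation of repeated estimates."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.std(estimates) / abs(np.mean(estimates)))

batches = [generate(5_000) for _ in range(100)]
# Naive statistics on raw samples: dominated by rare huge values.
raw_cv = relative_spread([b.mean() for b in batches])
# Exploiting the known mechanism: estimate in log space instead.
mech_cv = relative_spread([np.log(b).mean() for b in batches])
print(f"raw-space estimate spread (CV): {raw_cv:.3f}")
print(f"log-space estimate spread (CV): {mech_cv:.3f}")
```

The log transform here stands in for the kind of structural knowledge the thread is pointing at: the individual draws stay wild, but the mechanism behind them is simple and learnable.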

Our AI is currently barely scratching the surface of such system characterization. Self-supervised state-predicting networks are probably the right general direction, but many aspects of that process are still not well understood. Nevertheless, if the assumption put forward in the thread is true, it has devastating consequences for the entire "self-driving" industry, and for "autonomous robots" in general. And everything we've seen watching years of struggle in this space seems to support the fat-tailed case.
