Humans can decipher adversarial images! Our new work (out TODAY in @NatureComms) shows that people can do "theory of mind" on machines—predicting how machines will see the bizarre images that "fool" them.
— Chaz Firestone (@chazfirestone) March 22, 2019
Full data & code: https://t.co/3qYTvxp7kE
Here's the article's abstract:
Does the human mind resemble the machine-learning systems that mirror its performance? Convolutional neural networks (CNNs) have achieved human-level benchmarks in classifying novel images. These advances support technologies such as autonomous vehicles and machine diagnosis; but beyond this, they serve as candidate models for human vision itself. However, unlike humans, CNNs are “fooled” by adversarial examples—nonsense patterns that machines recognize as familiar objects, or seemingly irrelevant image perturbations that nevertheless alter the machine’s classification. Such bizarre behaviors challenge the promise of these new advances; but do human and machine judgments fundamentally diverge? Here, we show that human and machine classification of adversarial images are robustly related: In 8 experiments on 5 prominent and diverse adversarial imagesets, human subjects correctly anticipated the machine’s preferred label over relevant foils—even for images described as “totally unrecognizable to human eyes”. Human intuition may be a surprisingly reliable guide to machine (mis)classification—with consequences for minds and machines alike.
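The "seemingly irrelevant image perturbations" the abstract mentions are often built with a fast-gradient-sign-style attack. Here is a minimal toy sketch of the idea (my own illustration, not the paper's code): a linear "classifier" over a synthetic "image", where a small, structured per-pixel nudge against the gradient flips the predicted label. The weights `w`, the input `x`, and the step size `eps` are all invented for this example.

```python
import numpy as np

# Toy setup (hypothetical, for illustration only): a linear classifier
# over a 100x100 "image" with pixel values in [0, 1].
rng = np.random.default_rng(0)
d = 100 * 100
w = rng.normal(size=d)                 # stands in for a trained model's weights
x = rng.uniform(0.0, 1.0, size=d)      # a random "image"

def predict(img):
    """Class 1 if the linear score is positive, else class 0."""
    return int(w @ img > 0)

# For a linear model, the gradient of the score w.r.t. the input is just w,
# so the sign-of-gradient step is eps * sign(w). Pick eps just large enough
# to cross the decision boundary, then note how small it is.
score = w @ x
eps = 1.01 * abs(score) / np.abs(w).sum()
x_adv = x - np.sign(score) * eps * np.sign(w)

assert predict(x_adv) != predict(x)    # the label flips
print(f"eps = {eps:.4f}")              # per-pixel change, a tiny fraction of [0, 1]
```

In high dimensions the per-pixel step needed to flip the label is a small fraction of the pixel range, which is why such perturbations look "irrelevant" to a human observer even as they change the machine's answer.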
This is a fascinating and, I believe, important result.
From the tweet stream
We show that this is indeed the case! We showed human subjects images from many adversarial attacks, and made them guess how machines classified them — a "machine theory-of-mind" task. We found that, more often than not, humans can figure out how machines will see these images!
— Chaz Firestone (@chazfirestone) March 22, 2019
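The "more often than not" claim reduces to a simple analysis: on each trial, did the subject pick the machine's label over a foil more often than the 50% expected by chance? A minimal sketch of that test on simulated data (the trial count and agreement rate are invented for illustration, not the paper's numbers):

```python
import numpy as np

# Hypothetical data: on each two-alternative trial, did the subject
# choose the CNN's preferred label (True) or the foil (False)?
rng = np.random.default_rng(1)
n_trials = 200
p_true = 0.7                              # assumed agreement rate, for illustration
choices = rng.random(n_trials) < p_true   # simulated subject responses

agreement = choices.mean()

# Normal-approximation z-test against chance (0.5) on a binary choice.
se = np.sqrt(0.25 / n_trials)
z = (agreement - 0.5) / se
print(f"agreement = {agreement:.2f}, z = {z:.1f}")
```

An agreement rate reliably above 0.5 is what "humans can figure out how machines will see these images" means operationally: subjects anticipate the machine's label at better-than-chance rates.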