My own view, having spent some time in visual neuroscience, is that if you really want vision that is robust to these kinds of issues, you have to build a geometric representation of the world first and then learn/map categories from that. Trying to jump from a matrix to a label without an intervening topological/geometric model of the world (having two eyes and/or the ability to move helps with building one) is asking for trouble, because we think we are recapitulating biology when in fact we are doing nothing of the sort (as these adversarial examples reveal beautifully).
> we think we are recapitulating biology when in fact we are doing nothing of the sort (as these adversarial examples reveal beautifully).
I'm not sure I'd go so far. There's a pretty long list of optical illusions: seeing motion where there clearly is none, misjudging distances, and, most relevant here, seeing faces in things that aren't faces. Here are a few famous examples: http://brainden.com/face-illusions.htm
Some of those immediately make my brain flag up "FACE". It's only on looking in more detail that I see what else is there, but my visual system is clearly being tricked, as would be billions of other completely independently grown visual systems. How much better could we do this, and with more subtlety, if we could analyse the whole brain like we can neural networks and target a specific brain?
There's an old experiment showing that a kitten raised without ever seeing horizontal lines will, past a certain age, never be able to see them, so we know that biological systems struggle with limited visual input.
I'd also say we're doing matrix -> label conversions ourselves, unless we're born with a special geometric model. Deep learning also works in layers, so there isn't direct matrix-to-label learning happening straight away; that comes much later, after the system has learned to create a higher-level representation of the input.
On a less contrarian side, I wonder how well these things would work if we were to show the networks videos of... everything. Years and years of video. Don't try and add labels yet, but can we add a constraint that we expect the representation to only change slowly? Two very similar frames should not result in the high-level interpretation changing drastically.
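For what it's worth, a slowness constraint like that is easy to write down; here's a rough sketch of what I mean, assuming a PyTorch encoder and pairs of consecutive frames (all the names here are made up, not from any particular paper):

    # Rough sketch of a temporal-coherence ("slowness") penalty in PyTorch.
    # `encoder` and the frame tensors are placeholders.
    import torch

    def slowness_loss(encoder, frame_t, frame_t_plus_1):
        """Penalize large jumps in the representation between consecutive frames."""
        z_t = encoder(frame_t)
        z_next = encoder(frame_t_plus_1)
        # Two very similar frames should give nearly the same high-level code.
        return (z_next - z_t).pow(2).mean()

On its own this collapses to a constant representation, so you'd combine it with a reconstruction or contrastive term that keeps the code informative.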
I think we should refrain from saying we are recapitulating biology until we have reached the point where the machine systems tend to succeed AND fail in the SAME ways that the biological systems do.
We tried that; the reason deep nets are popular is that they outperform geometric (or other problem-specific) models.
This might be because they implicitly develop such representations somewhere along the way, or because such a representation is not really necessary for visual classification.
Additionally, introducing ancillary modules is not without cost-- you might gain robustness to some kinds of adversarial inputs at the expense of becoming vulnerable to others.
There are plenty of ways to fool biological visual systems: cf. magic-eye posters, optical illusions, or the various exploits described in Lettvin et al.'s paper "What the Frog's Eye Tells the Frog's Brain".
I think that remains to be seen, at least in the general case, since we haven't yet agreed on a measure of performance. The debate around adversarial examples can be read as an argument over the proper measure of performance, though so far it's been somewhat implicit: as far as I know nobody has formalized a measure of robustness to adversarial examples, and progress has come more from case studies (which is fine, since research into NN robustness is still quite early stage, and case studies help illustrate the issues). It can fairly be said that neural nets perform well on the ImageNet benchmark and similar measures of performance. But whether those are good measures, or whether some metric that weights robustness more heavily should be used (and what methods would perform well on it), is the subject of current research, like this research.
Is it though? The human correctly interpreted the image. The problem is that the image was, well, not "real". Humans can only figure out what is real and what isn't based on experience, and that has its limits.
I think the point is that these models are often hyped as being proof that we've reproduced human visual systems, and adversarial examples that humans can still resolve are evidence against that.
When the adversarial examples for humans MATCH the adversarial examples for image classifiers, that would be evidence of having reproduced a biological system.
IMO, what these adversarial examples give us is a way to boost training data. We should augment training datasets with adversarial examples, or use adversarial training methods. The resulting networks would only become more robust.
As for self-driving cars, this is a good argument for having multiple sensing modalities in addition to visual, such as radar/lidar/sonar, and multiple cameras, infrared in addition to visible light.
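Going back to the augmentation point: a rough sketch of FGSM-style adversarial training in PyTorch, where `model`, `loader` and the epsilon are placeholders rather than anything from the article:

    # Rough sketch of adversarial training: generate a perturbed copy of each
    # batch with the fast gradient sign method and train on both.
    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, x, y, eps=0.03):
        x_adv = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x_adv), y).backward()
        # Step in the direction that increases the loss, stay a valid image.
        return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

    def train_epoch(model, loader, optimizer, eps=0.03):
        model.train()
        for x, y in loader:
            x_adv = fgsm_perturb(model, x, y, eps)
            optimizer.zero_grad()
            # Mix clean and adversarial batches so the net sees both.
            loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
            loss.backward()
            optimizer.step()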
But at what point do you have to wonder if we're using the wrong basis? And how do you know that augmenting the data with tiny adversarial perturbations won't just leave the network vulnerable in a different direction?
It's pretty obvious how to build translational symmetry into a net that's still expressive and easy to train (convolution). But you have to spoon-feed CNNs rotational and other symmetries by augmenting the training data. What you really want is a model that has all of your data's symmetries built in.
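Concretely, the spoon-feeding is just data augmentation; something like this with torchvision (the dataset path is a placeholder):

    # Rotational/reflection symmetry has to be taught via augmented data;
    # translation equivariance comes for free from the conv layers themselves.
    from torchvision import datasets, transforms

    augment = transforms.Compose([
        transforms.RandomRotation(degrees=30),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    train_set = datasets.ImageFolder("path/to/train", transform=augment)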
My sense is that the community at large regards DL as a magic black box, which it really is not. A complete function basis + finite data = a guarantee of wonky interpolation between samples. What you really need to do is restrict the class of expressible functions to those you need: build your prior into the model.
This is a huge topic in applying ML in physics and chemistry, where we already have a lot of detailed prior knowledge about the systems we want to describe and it would be silly not to build it into the ML models.
People now try to use ML anywhere and everywhere, so it's a bit of a wild west. Three examples: [1] uses a standard neural net to represent a many-body wave function, with all the machinery of quantum mechanics on top of that, and reinforcement learning to find the true ground state. [2] uses a handcrafted neural net, which by construction already takes advantage of a lot of prior knowledge, to directly predict molecular energies. [3] uses a simple kernel ridge regression coupled with a sophisticated handcrafted scheme to automatically construct a good basis (set of features) for a given input, to predict molecular energies.
In all these cases, the ML itself is not the target problem, but only a tool, and most effort goes into figuring out where exactly to use ML as a part of a larger problem, and how to encode prior knowledge, either via feature construction or neural net handcrafting.
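To illustrate the flavour of [3]: the regression itself can be a few lines, and essentially all the effort goes into the descriptor. A toy sketch, where the descriptor and the numbers are placeholders and not from the actual paper:

    # Toy sketch: kernel ridge regression on handcrafted molecular descriptors.
    # `describe_molecule` stands in for the sophisticated basis-construction
    # scheme, which is where the real work goes; the data below is made up.
    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    def describe_molecule(geometry):
        # Placeholder descriptor: in practice, distances, angles,
        # symmetry functions, etc.
        return np.asarray(geometry, dtype=float).ravel()

    X = np.stack([describe_molecule(g) for g in [[0.90, 1.10], [1.00, 1.00], [1.20, 0.80]]])
    y = np.array([-1.17, -1.20, -1.15])   # made-up reference energies

    model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=0.5)
    model.fit(X, y)
    print(model.predict(describe_molecule([1.05, 0.95]).reshape(1, -1)))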
You are, unfortunately, probably just playing out Mr. Crab's obsession with record players.
Remember that these tricky images are based on the fact that machine-learning models are differentiable and high-dimensional. There are a lot of ways to transition between, say, the desktop dimension and the cat dimension, and it's all continuous, so we're guaranteed to be able to influence the machine in that sort of direction.
You could imagine taking all of the adversarial examples and augmenting the machine's training so that it knows about them, creating a cat-masquerading-as-desktop dimension. But all you've done is make a lot more space (by adding a dimension), so the next iteration of adversarial examples can proceed by the same process as before, just against this new augmented machine.
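The "influence the machine in that direction" step is literally just gradient descent on a target label; roughly (PyTorch, where `model` and the batched `image` tensor are placeholders):

    # Rough sketch of nudging an image toward a chosen target class by
    # following the loss gradient with respect to the pixels.
    import torch
    import torch.nn.functional as F

    def nudge_towards(model, image, target_class, steps=50, step_size=1e-2):
        x = image.clone().detach().requires_grad_(True)   # shape (1, C, H, W)
        target = torch.tensor([target_class])
        for _ in range(steps):
            F.cross_entropy(model(x), target).backward()
            with torch.no_grad():
                x -= step_size * x.grad.sign()   # step toward the target class
                x.clamp_(0, 1)                   # keep it a valid image
            x.grad.zero_()
        return x.detach()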
But we don't really care about the cat-masquerading-as-desktop category in itself, so an adversarial example that makes a cat look like a cat-masquerading-as-desktop, or masquerades a cat-masquerading-as-desktop as a cat, isn't really relevant.
By adding enough adversarial examples to the training set, you can absolutely immunize a model against adversarial perturbations of the training data.
The problem is that the volume of "not very different" data points surrounding an example grows exponentially with the input dimension, so you need to train for much longer, and your "adversarial protection" will likely overfit to the neighborhood of training examples, which doesn't help with unseen data.
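To put a number on "exponentially": even if you only consider flipping each pixel of a 224x224 RGB input up or down by epsilon, the count of neighbours is astronomically beyond any training set:

    import math

    d = 224 * 224 * 3                 # input dimension of a typical ImageNet crop
    print(d)                          # 150528
    print(round(d * math.log10(2)))   # 45313 -> 2**d is roughly 10**45313 sign patterns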
We care about the existence of a nontrivial set of images that demonstrate a troubling lack of robustness in image classifiers, at least until we have good reason to say with confidence that such failures will not be a problem in practice.
I can paint a road leading into a tunnel on a mountainside and fool some number of people. Meep. Meep.
The problem isn't that there are adversarial inputs. The problem is that the adversarial inputs aren't also adversarial (or detectable) to the human visual system.
I am unsure what you mean-- do you mean with different training sets but the same testing set?
It's an interesting question; maybe the reason for (some) of these adversarial vulnerabilities is due to a handful of bad training examples.
You could formulate it as a search problem to see if there's particular images (or small groups of images) that are responsible for the adversarial vulnerabilities.
This might then indicate that some of these perturbations are really just taking advantage of the fact that neural nets tend to "memorize" some of the data, so we're not really exploiting some deep structural feature so much as just feeding the echo of an input that the net has learned to automatically classify as, say, a computer/desk[0].
It would be a good project, but I don't have enough GPUs on hand to train scores of deep nets from scratch.
Assuming one were to bite the bullet, it might also be worth trying different data augmentation strategies.
Most of the time, we try to eke out additional performance/robustness by using the same sets of transformations (translation, rotation, cropping, rescaling, etc.), but if the net is vulnerable to adversarial examples because of something in the training set, then you might just be making sure that adversarial vulnerability is present everywhere in the image and at multiple scales.
On a related note, there's an interesting paper about universal adversarial perturbations, i.e. those that can be added to any image and thereby induce a misclassification with high probability[1].
This effect holds even across different models, so the same perturbation can cause a misclassification in different architectures.
------
0. Neural nets learn by some combination of abstraction and memorization.
If, for some reason, many members of a particular class are hard to generalize from, then it's possible that the nets instead learn to identify some particular aspects of those classes (that are not usually present in other images) and respond disproportionately when those features are present.
If such features are not obvious to human visual inspection, then we get misclassifications without insight into why they were misclassified.
> I am unsure what you mean-- do you mean with different training sets but the same testing set?
Yes, assuming we have 10000 different training images. Divide these into 5 sets of 2000 each and train 5 networks with them. Assuming that 2000 images are plenty for this application, we will have 5 well-trained networks with similar performance on a test set.
BUT
They will work slightly differently internally, and those "inverse gradient search" methods (or whatever they are called) might only be able to manipulate an image with "specifically chosen additive noise" for one network at a time, while the other 4 are unimpressed.
That's assuming that the manipulation can't be targeted at all 5 classifiers at the same time.
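A rough sketch of that check, assuming five already-trained PyTorch classifiers in a list (all names are placeholders):

    # Rough sketch: only trust an input if most of the independently trained
    # networks agree on its label; otherwise flag it as suspicious.
    import torch

    def ensemble_predict(models, x, agreement_threshold=4):
        votes = torch.stack([m(x).argmax(dim=1) for m in models])  # (n_models, batch)
        majority = torch.mode(votes, dim=0).values                 # most common label
        agreement = (votes == majority).sum(dim=0)                 # how many models agree
        return majority, agreement < agreement_threshold           # label, suspicious?

The catch, as mentioned elsewhere in the thread, is that adversarial perturbations often transfer between independently trained models, so this raises the bar rather than closing the hole.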
It's not clear to me how malicious actors can exploit this observation to confuse self-driving cars. That said, I don't think this discredits the point of the article; it's important to note how easily deep learning models can be fooled if you understand the math behind them. I just think the example of tricking self-driving cars is difficult to relate to / understand.
Why do you say that? The first demo they provide shows that the adversarial image, when printed and then manipulated, still fools the algorithm. That means the example is robust not only to various affine transformations but also to the per-pixel noise that results from printing something and then viewing it again through a camera.
Suppose you were to place an example like that on a stop sign that fooled a car into thinking that it was a tree. The car might blow through an intersection at speed as a result.
The training strategy they used provides a template for doing even more exotic manipulations. For example, you could train an adversarial example that looked like one thing when viewed from far away but something quite different up close. Placing an image like that by a road could result in an acute, unexpected change in the car's behavior (e.g. veering sharply to avoid a "person" that suddenly appeared).
Only if the printed adversarial image doesn't look like a stop sign, though the example in this article shows that it's entirely possible to make an image that just looks like a distorted/badly printed kitten to a human but completely different to a computer. A similar image for a stop sign might just look like worn paint or weird reflections or something, but still look like a stop sign to a human.
You could wear special adversarial clothing for example, or even just project adversarial images onto pavement, walls, poles, road signs, and other reflective surfaces.
Why limit yourself to self driving cars? A smart malicious actor would just throw oil out their window on a highway. Watch all the cars crash!!
I think these adversarial examples are a nearly irrelevant issue for self-driving cars. If someone does something bad, we prosecute them. It's the same whether you're throwing oil onto a highway, covering up stop signs with adversarial stop signs, or whatever else you might want to do.
Now if there were an exploit that caused all self-driving cars in the whole country to suddenly crash into walls, that would be one thing. But these image-based attacks are limited to a single intersection or road at a time. And after a single car crashes, the intersection gets closed. So if you really want to kill a few people, why not just go and stab them in the neck?
That's another fundamental problem in itself: your car doesn't have much reasoning ability or knowledge of the world, so it can't tell if it's a flying plastic bag or a large boulder.
I doubt self driving cars would rely on a single network on a single image source (I hope not!).
Robust systems expect that some of the inferences can be mistaken (noisy). That's why you want to run multiple sensor types into different models, and use some kind of mixture of experts +/- probabilistic fusion.
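A toy version of that fusion, with made-up modality weights, just to show the shape of it:

    # Toy late fusion: combine per-sensor class probabilities, weighting each
    # modality by an (assumed) reliability. All numbers below are made up.
    import numpy as np

    def fuse(probabilities, reliabilities):
        log_post = np.zeros_like(next(iter(probabilities.values())))
        for modality, p in probabilities.items():
            # Weighted log-probability fusion: unreliable sensors count for less.
            log_post += reliabilities[modality] * np.log(p + 1e-9)
        fused = np.exp(log_post - log_post.max())
        return fused / fused.sum()

    print(fuse(
        {"camera": np.array([0.7, 0.2, 0.1]),    # camera fooled toward class 0
         "lidar":  np.array([0.1, 0.8, 0.1]),
         "radar":  np.array([0.2, 0.7, 0.1])},
        {"camera": 1.0, "lidar": 1.0, "radar": 0.5},
    ))   # lidar + radar outvote the fooled camera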
It doesn't matter how many algorithms or sensors are consulted or combined to form a judgment. If an attacker can obtain a self-driving vehicle's hardware, and if enough tests can be performed per second, the attacker can train images that fool it.
Your idea is similar to an appeal to security through obscurity. Might work sometimes, but not generally.
(Noise does not help, because you can still discover a gradient to descend by averaging repeated trials.)
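Roughly how the "averaging repeated trials" works, where `score_fn` stands in for querying the actual hardware and everything here is a placeholder:

    # Rough sketch of estimating a gradient with query access only: average
    # finite differences of the (possibly noisy) score over random probes.
    import numpy as np

    def estimate_gradient(score_fn, x, n_probes=200, sigma=1e-2):
        grad = np.zeros(x.shape)
        for _ in range(n_probes):
            u = np.random.randn(*x.shape)
            grad += (score_fn(x + sigma * u) - score_fn(x - sigma * u)) * u
        return grad / (2 * sigma * n_probes)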
Replace 'images' with 'sensor data' and adversarial examples can still be generated. They might not be as easy to feed into the vehicle's hardware (e.g. requiring speakers to fool an acoustic sensor), but the same principles apply.
It's also not necessary for the recognition algorithms to be using gradient descent; so long as they are differentiable (or can be approximated by a model that is), you can use gradient descent to find adversarial examples.
Adversarial examples exist for any model with a high input dimension (in relation to the available training data), differentiability only helps with finding them.
That's really interesting, though; you might be able to combine something like random forests with neural nets to make them more robust to adversarial images.