Couldn’t you just use the generation of adversarial examples to bypass automatic content filters?

Note: I in no way condone doing this, it’s just food for thought.

I was at a friend’s house this past week. He wanted to watch BlizzCon, but did not have a ticket. Being a naughty boy, he attempted to find an illegal stream. This proved to be quite difficult, as a lot of automatic video identification is in use; in the end he settled on an audio stream with a random still image displayed. I recall that in the past people would mirror images, add frames, and try all sorts of other tricks to bypass piracy detection on websites such as YouTube. So, why not take the arms race up a notch?

For those of you who don’t have a background in the technology, adversarial examples are images (or other media) designed to cause a classifier to misclassify the input, such as an image of a cat being identified as a paper towel. I won’t go into a deep explanation of the technology here, but the vulnerability stems from how neural networks process images. If you know roughly how the numbers are crunched from the initial image (an array of pixel values) to a classification confidence value (0.00 to 1.00), you can back-propagate toward a target confidence of 1.00, all the way back to the input pixels rather than the network’s weights, to work out the “essence” of a subject: which pixel changes would push the system toward identifying that specific object. Once you have these sets of pixel values, you can add them to or subtract them from your image to make the system misclassify it, either in a targeted way (identifying the object as something else specifically) or in an untargeted way (just getting the classification wrong).
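To make the back-propagation idea concrete, here is a minimal sketch in Python with NumPy. It is a toy under loud assumptions: the “classifier” is a single logistic unit with random weights standing in for a trained cat detector, the 8x8 image is random noise, and names such as `targeted_perturbation` and `epsilon` are mine, not from any library. It takes one signed-gradient step on the pixels toward a target confidence of 1.00.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def confidence(w, b, x):
    """Toy classifier: confidence in [0, 1] that image x shows the target class."""
    return sigmoid(w @ x + b)

def input_gradient(w, b, x, target=1.0):
    """Gradient of the cross-entropy loss with respect to the pixels x, not the weights."""
    return (confidence(w, b, x) - target) * w

def targeted_perturbation(w, b, x, epsilon=0.05, target=1.0):
    """Take one signed-gradient step toward the target, then clip to valid pixel values."""
    step = -epsilon * np.sign(input_gradient(w, b, x, target))
    return np.clip(x + step, 0.0, 1.0)

rng = np.random.default_rng(0)
w = rng.normal(size=64)              # hypothetical weights of a trained 8x8 "cat" detector
b = 0.0
x = rng.uniform(0.2, 0.8, size=64)   # hypothetical flattened 8x8 grayscale image

x_adv = targeted_perturbation(w, b, x)
print("confidence before:", confidence(w, b, x))
print("confidence after: ", confidence(w, b, x_adv))
print("largest pixel change:", np.abs(x_adv - x).max())
```

No pixel moves by more than `epsilon`, yet the confidence for the target class goes up. A real network has many layers, so the gradient would come from an automatic differentiation framework rather than a one-line formula, but the principle is the same. For the untargeted case you would instead step in the direction that increases the loss for the true label.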

So, why don’t we just add these to video feeds to bypass video detection? I think there would be two approaches. One would be to run the back-propagation frame by frame, subtracting the resulting values from every frame to hide the contents from the automated systems. This would be computationally expensive, and probably not suitable for live-streamed media. The other would be to pretend that you are streaming audio with an occasionally switching random image: if you could apply that general pixel map of a cat over the video with enough intensity, the system might just think it’s a cat and ignore the lower probabilities of the pirated media. Potentially you could also use a similar method with audio, to avoid detection on that front.
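The second approach is cheap because the pixel map is computed once and then simply blended onto every frame. A rough sketch of that blending step, again in Python with NumPy: the frames and the pattern are random placeholders, `overlay` and `alpha` are names I made up for illustration, and whether any real filter would be fooled by this is untested speculation.

```python
import numpy as np

def overlay(frames, pattern, alpha=0.1):
    """Alpha-blend one fixed pattern onto every frame.

    frames:  float array of shape (n_frames, height, width), values in [0, 1]
    pattern: float array of shape (height, width), values in [0, 1]
    alpha:   opacity of the pattern; 0 leaves the frames untouched
    """
    blended = (1.0 - alpha) * frames + alpha * pattern  # pattern broadcasts over all frames
    return np.clip(blended, 0.0, 1.0)

rng = np.random.default_rng(1)
frames = rng.uniform(size=(30, 8, 8))   # hypothetical: 30 tiny grayscale frames
pattern = rng.uniform(size=(8, 8))      # placeholder for a precomputed pixel map

out = overlay(frames, pattern, alpha=0.1)
print(out.shape, np.abs(out - frames).max())
```

The cost per frame is one multiply-add per pixel, with no gradient computation at all, which is what would make this approach plausible for a livestream where the frame-by-frame approach is not. The trade-off is that a higher `alpha` makes the pattern stronger but also more visible to human viewers.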

Doing this would be fairly simple technically, I believe: just generate some pixel maps yourself, and use off-the-shelf video watermarking software to add them to the video prior to upload, or just put a low-opacity overlay on any livestream. Perhaps this is already happening and I’m just not aware.

Again, please don’t do this and blame me. That would make me sad 🙁
