Adversarial machine learning

[1] A survey from May 2020 revealed that practitioners commonly felt a need for better protection of machine learning systems in industrial applications.

[2] Most machine learning techniques are designed to work on specific problem sets, under the assumption that the training and test data are generated from the same statistical distribution (IID).

(Around 2007, some spammers added random noise to fuzz words within "image spam" in order to defeat OCR-based filters.)

As late as 2013 many researchers continued to hope that non-linear classifiers (such as support vector machines and neural networks) might be robust to adversaries, until Battista Biggio and others demonstrated the first gradient-based attacks on such machine-learning models (2012[8]–2013[9]).

[10][11] More recently, it has been observed that adversarial attacks are harder to produce in the physical world, because environmental constraints can cancel out the effect of the adversarial noise.

[15] While adversarial machine learning continues to be heavily rooted in academia, large tech companies such as Google, Microsoft, and IBM have begun curating documentation and open source code bases to allow others to concretely assess the robustness of machine learning models and minimize the risk of adversarial attacks.

[16][17][18] Examples include attacks in spam filtering, where spam messages are obfuscated through the misspelling of "bad" words or the insertion of "good" words;[19][20] attacks in computer security, such as obfuscating malware code within network packets or modifying the characteristics of a network flow to mislead intrusion detection;[21][22] attacks in biometric recognition where fake biometric traits may be exploited to impersonate a legitimate user;[23] or to compromise users' template galleries that adapt to updated traits over time.
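As a toy illustration of the "good word" insertion and misspelling tactics mentioned above, consider a naive bag-of-words spam scorer; the word lists, messages, and scoring rule below are hypothetical and serve only to show why such obfuscation can work:

```python
# Toy bag-of-words spam scorer: counts "spammy" words minus "benign" words.
SPAM_WORDS = {"viagra", "winner", "prize"}        # hypothetical word lists
GOOD_WORDS = {"meeting", "report", "thanks"}

def spam_score(message):
    words = message.lower().split()
    return sum(w in SPAM_WORDS for w in words) - sum(w in GOOD_WORDS for w in words)

original = "claim your prize winner"
# Misspell the "bad" words and insert "good" words:
obfuscated = "claim your pr1ze w1nner thanks for the meeting report"
print(spam_score(original))    # positive score: flagged as spam
print(spam_score(obfuscated))  # negative score: slips past the naive filter
```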

[30][31] Adversarial patterns on glasses or clothing, designed to deceive facial-recognition systems or license-plate readers, have led to a niche industry of "stealth streetwear".

[37][38] Attacks against (supervised) machine learning algorithms have been categorized along three primary axes:[39] influence on the classifier, the security violation, and the attack's specificity.

Poisoning consists of contaminating the training dataset with data designed to increase errors in the output.
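A minimal sketch of this idea is a simple label-flipping scheme; the function and parameter names below are illustrative, not a specific published attack, and binary {0, 1} labels are assumed:

```python
import numpy as np

def label_flip_poison(X, y, flip_fraction=0.1, rng=None):
    """Flip a fraction of the (binary) training labels so that a model trained
    on the poisoned data makes more errors at test time."""
    rng = rng or np.random.default_rng(0)
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(flip_fraction * len(y)), replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]   # flip 0 <-> 1 labels
    return X, y_poisoned
```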

[2] On social media, disinformation campaigns attempt to bias recommendation and moderation algorithms to push certain content over others.

A particular case of data poisoning is the backdoor attack,[46] which aims to teach a specific behavior for inputs carrying a given trigger, e.g. a small defect in images, sounds, videos or texts.
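A minimal sketch of such a backdoor, assuming image arrays of shape (N, H, W) and hypothetical parameter names: a small bright patch is stamped into a fraction of the training images, and those images are relabelled to the attacker's target class.

```python
import numpy as np

def add_backdoor(images, labels, target_class, trigger_value=1.0, rate=0.05, rng=None):
    """Stamp a small trigger (a hypothetical 3x3 bright square in the bottom-right
    corner) into a fraction of the training images and relabel those images as the
    attacker's target class, so the trained model associates trigger -> target_class."""
    rng = rng or np.random.default_rng(0)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images[idx, -3:, -3:] = trigger_value   # the trigger: a bright corner patch
    labels[idx] = target_class              # poisoned label for triggered inputs
    return images, labels
```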

An attacker may poison this data by injecting, during operation, malicious samples that subsequently disrupt retraining.

In federated learning, for instance, edge devices collaborate with a central server, typically by sending gradients or model parameters.
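As a rough sketch of this exchange (the function names and the plain averaging rule below are illustrative assumptions, not a specific framework's API), a server that naively averages client gradients can be skewed by a single poisoned update:

```python
import numpy as np

def honest_update(gradient):
    """An honest edge device sends its locally computed gradient as-is."""
    return gradient

def malicious_update(gradient, scale=10.0):
    """A poisoning client can instead send a scaled or fabricated gradient."""
    return scale * gradient

def federated_average(updates):
    """Naive server-side aggregation: the coordinate-wise mean of client updates.
    One large malicious update is enough to shift this mean arbitrarily."""
    return np.mean(np.stack(updates), axis=0)
```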

[53] The current leading solutions to make (distributed) learning algorithms provably resilient to a minority of malicious (a.k.a. Byzantine) participants are based on robust gradient aggregation rules.
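One well-known example of such a rule, used here purely as an illustration (the specific solutions referenced above are not detailed in this excerpt), is the coordinate-wise median:

```python
import numpy as np

def coordinate_wise_median(updates):
    """A simple robust aggregation rule: the coordinate-wise median of client
    updates, which a small minority of Byzantine clients cannot drag arbitrarily far."""
    return np.median(np.stack(updates), axis=0)

# Three honest updates plus one extreme malicious update (toy numbers):
updates = [np.array([0.10, -0.20]), np.array([0.12, -0.18]),
           np.array([0.09, -0.21]), np.array([100.0, 100.0])]
print(np.mean(np.stack(updates), axis=0))   # the plain mean is dominated by the outlier
print(coordinate_wise_median(updates))      # the median stays close to the honest values
```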

For instance, spammers and hackers often attempt to evade detection by obfuscating the content of spam emails and malware.

In this research area, some studies initially showed that reinforcement learning policies are susceptible to imperceptible adversarial manipulations.

[67] Adversarial attacks on speech recognition have been introduced for speech-to-text applications, in particular for Mozilla's implementation of DeepSpeech.

[76] A high-level overview of these attack types follows. An adversarial example refers to specially crafted input that is designed to look "normal" to humans but causes misclassification by a machine learning model.

In either case, the objective of these attacks is to create adversarial examples that are able to transfer to the black box model in question.

In theory, the result is an adversarial example that the model classifies into the incorrect class with high confidence, yet one that remains very similar to the original image.

The proposed attack is split into two different settings, targeted and untargeted, but both are built from the general idea of adding minimal perturbations that lead to a different model output.
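Concretely, both settings can be written as a minimum-distortion search (the notation here is generic and illustrative, not taken verbatim from the cited paper): an untargeted attack seeks the closest input whose predicted label differs from the original, while a targeted attack requires a chosen label $t$:

$$\min_{x'} d(x, x') \ \ \text{s.t.}\ \ C(x') \neq C(x) \quad \text{(untargeted)}, \qquad \min_{x'} d(x, x') \ \ \text{s.t.}\ \ C(x') = t \quad \text{(targeted)},$$

where $C(\cdot)$ is the classifier's predicted label and $d$ is a distance such as the $\ell_2$ norm.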

[90] However, since HopSkipJump is proposed as a black box attack and the iterative algorithm above requires the calculation of a gradient in its second step (which black box attacks do not have access to), the authors propose a method of gradient estimation that requires only the model's output predictions.

By sampling a batch of $B$ i.i.d. random unit vectors $u_1, \dots, u_B$ around the current image, an approximation of the gradient can be calculated as the average of these random vectors weighted by the sign of the boundary function evaluated at the perturbed images:

$$\widehat{\nabla S}(x, \delta) \approx \frac{1}{B} \sum_{b=1}^{B} \phi(x + \delta u_b)\, u_b,$$

where $\delta$ is a small sampling radius and $\phi(\cdot) \in \{-1, +1\}$ denotes the sign of the boundary function $S$.

The equation above gives a close approximation of the gradient required in step 2 of the iterative algorithm, completing HopSkipJump as a black box attack.
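A minimal NumPy sketch of this Monte Carlo estimate, assuming a hypothetical hard-label oracle `phi` that returns +1 when its argument is classified as adversarial and -1 otherwise (this illustrates the estimator above, not the full HopSkipJump algorithm):

```python
import numpy as np

def estimate_gradient(phi, x, delta=0.1, num_samples=100, rng=None):
    """Estimate the gradient direction of the boundary function using only
    hard-label queries: average random unit vectors weighted by the sign phi
    of the boundary function at the perturbed points."""
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(x, dtype=float)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)               # random unit direction
        grad += phi(x + delta * u) * u       # weight the direction by the sign of phi
    return grad / num_samples
```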

[91][92][90] White box attacks assume that the adversary has access to the model's parameters, in addition to being able to obtain labels for provided inputs.

[93] The attack was called the fast gradient sign method (FGSM); it consists of adding a small, nearly imperceptible perturbation to the image that causes the model to misclassify it.

This noise is calculated by taking the sign of the gradient of the loss with respect to the image to be perturbed and multiplying it by a small constant epsilon.
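In symbols, $x_{\text{adv}} = x + \epsilon \cdot \operatorname{sign}(\nabla_x J(\theta, x, y))$. A minimal PyTorch sketch of this computation follows; the model, loss function, inputs, and epsilon value are placeholders for the reader's own setup:

```python
import torch

def fgsm_attack(model, loss_fn, image, label, epsilon=0.03):
    """Fast gradient sign method: perturb the input by epsilon times the sign
    of the gradient of the loss with respect to the input."""
    image = image.clone().detach().requires_grad_(True)
    loss = loss_fn(model(image), label)
    loss.backward()                                     # populates image.grad
    adversarial = image + epsilon * image.grad.sign()   # x_adv = x + eps * sign(grad_x J)
    return adversarial.clamp(0, 1).detach()             # keep pixel values in a valid range
```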

[95][94][93] FGSM has been shown to be effective in adversarial attacks on image classification and skeletal action recognition.

[97] The attack proposed by Carlini and Wagner begins by trying to solve a difficult non-linear optimization problem:[63]
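This initial problem is commonly stated as a minimum-norm search under a hard classification constraint (here $x$ is the original input, $\delta$ the perturbation, $C$ the classifier's predicted label, $t$ the target class, and $\|\cdot\|_p$ an $\ell_p$ norm):

$$\min_{\delta} \; \|\delta\|_p \ \ \text{s.t.}\ \ C(x + \delta) = t, \quad x + \delta \in [0, 1]^n.$$

Because the constraint $C(x+\delta) = t$ is highly non-linear, the attack replaces it with a differentiable surrogate objective that can be minimized directly.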

Conceptual representation of the proactive arms race.[42][38]