Adversarial Attack: Definition and protection against this threat

An Adversarial Attack manipulates or exploits a Machine Learning model using carefully crafted data. This article covers the purpose of such attacks, their main types, some well-known examples, and the defenses available against a threat that poses a significant challenge to the field of Artificial Intelligence.

What is the purpose of an Adversarial Attack?

An Adversarial Attack often aims to disrupt a Machine Learning model. By training a model on inaccurate or intentionally falsified data, it is possible to negatively impact its future performance.

Similarly, an already trained model can be misled by crafted inputs at prediction time. Even systems that have already been commercialized can be vulnerable to these attacks.

For example, simply placing a few stickers on the road can trick a self-driving car into taking the wrong lane and driving in the opposite direction. Similarly, subtle changes can deceive a medical analysis system into classifying a benign tumor as malignant.

A computer vision system can confuse a stop sign with a speed limit sign if a piece of tape is attached to it. Artificial Intelligence is, therefore, very easy to deceive at the moment.

The different types of Adversarial Attacks

There are three main categories of adversarial attacks.

The first type aims to influence a classifier, disrupting the model in order to alter its predictions.

The second type is a security violation: the attacker supplies malicious data that the model classifies as legitimate. The third type is defined by its specificity: a targeted attack carries out a specific intrusion or disruption, whereas an indiscriminate attack simply creates general chaos.

These categories can be further subdivided, based on how much the attacker knows, into “black box” and “white box” attacks. In a white-box attack, the attacker has access to the model’s parameters. In a black-box attack, the attacker does not, and can only observe the model’s outputs.
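To make the black-box setting concrete, here is a minimal sketch in Python. It assumes a toy victim model whose weights are hidden behind a hypothetical `query` function, and an attacker who may only read the returned score. The model, the input values, and the greedy search strategy are all illustrative assumptions, not a description of any real system or published attack.

```python
import numpy as np

def make_victim():
    # Hypothetical victim: a tiny logistic model. The attacker never sees w or b.
    w = np.array([2.0, -1.0, 0.5])
    b = 0.0

    def query(x):
        # Only the output score is exposed, as a prediction API would do.
        return float(1.0 / (1.0 + np.exp(-(x @ w + b))))

    return query

def black_box_attack(query, x, eps):
    """Greedy coordinate search that relies on model outputs only.

    Each feature is moved by +eps or -eps from its original value,
    and a move is kept only if it lowers the score returned by the model.
    """
    x_adv = x.copy()
    for i in range(len(x)):
        best = query(x_adv)
        for step in (+eps, -eps):
            cand = x_adv.copy()
            cand[i] = x[i] + step
            score = query(cand)
            if score < best:
                best, x_adv = score, cand
    return x_adv

query = make_victim()
x = np.array([0.4, 0.1, 0.2])          # original input, scored above 0.5
x_adv = black_box_attack(query, x, eps=0.3)

print("score before:", query(x))
print("score after: ", query(x_adv))
```

The attacker needs only a handful of queries here because the toy model is linear. Real models usually require far more queries, which is one reason rate limiting is sometimes used as a partial defense.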

Evasion attacks are the most common. They involve modifying data to evade detection systems or to be classified as legitimate. These attacks do not involve influencing the data used to train the model.
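As an illustration of an evasion attack in the white-box setting, the sketch below applies a one-step sign-gradient perturbation (the idea behind the Fast Gradient Sign Method) to a toy logistic regression. The technique is named here for illustration and is not mentioned elsewhere in this article. The weights, the input, and the `eps` budget are assumed values chosen so the effect is easy to see.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x):
    # Probability of class 1 under a logistic regression model.
    return sigmoid(x @ w + b)

def sign_gradient_attack(w, b, x, y, eps):
    """One-step sign-gradient perturbation for logistic regression.

    For the cross-entropy loss, the gradient with respect to the input
    is (p - y) * w. Moving each feature by eps in the direction of the
    gradient's sign increases the loss while changing no feature by
    more than eps.
    """
    p = predict_proba(w, b, x)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Assumed toy model and input: the attacker knows w and b (white box).
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.4, 0.1, 0.2])   # correctly classified as class 1
y = 1.0

x_adv = sign_gradient_attack(w, b, x, y, eps=0.3)

print("clean probability:      ", predict_proba(w, b, x))
print("adversarial probability:", predict_proba(w, b, x_adv))
```

The training data is never touched: only the input presented at prediction time changes, which is what distinguishes evasion from poisoning.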

For example, spam or malicious content can be embedded in an image attached to an email so that it evades the spam filter. Similarly, it is possible to deceive a biometric verification system.

Another type of attack is “Data Poisoning.” This method involves contaminating the data used to train, or continue training, a Machine Learning model. By injecting crafted samples into that data, the attacker disrupts the training process and alters the resulting model.
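The effect of injected samples can be sketched with a deliberately simple model. The example below assumes a nearest-centroid classifier and synthetic two-class data. The attacker adds mislabeled points far from the real data, which drags one class centroid away. All names and numbers are illustrative assumptions.

```python
import numpy as np

def fit_centroids(X, y):
    # "Training" here is just computing the mean point of each class.
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    # Assign each sample to the class with the nearest centroid.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=-2.0, scale=0.5, size=(100, 2)),   # class 0
    rng.normal(loc=+2.0, scale=0.5, size=(100, 2)),   # class 1
])
y = np.array([0] * 100 + [1] * 100)

# Poison: 100 injected points far away, all falsely labeled as class 0.
X_poison = np.full((100, 2), 14.0)
y_poison = np.zeros(100, dtype=int)
X_bad = np.vstack([X, X_poison])
y_bad = np.concatenate([y, y_poison])

clean_model = fit_centroids(X, y)
poisoned_model = fit_centroids(X_bad, y_bad)

# Both models are evaluated on the original, untampered samples.
clean_acc = (predict(clean_model, X) == y).mean()
poisoned_acc = (predict(poisoned_model, X) == y).mean()

print("accuracy after clean training:   ", clean_acc)
print("accuracy after poisoned training:", poisoned_acc)
```

Real poisoning attacks are usually far subtler than this, but the mechanism is the same: the model faithfully learns from data that the attacker controls.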

Model theft, or model extraction, involves probing a model, typically through its prediction interface, in order to reconstruct it or to extract the data on which it was trained. The consequences can be severe if the training data or the model are sensitive and confidential.
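A minimal sketch of model extraction, assuming the simplest possible case: a hypothetical prediction endpoint (`victim_api`) that returns exact probabilities from a logistic regression. Under those assumptions the attacker can convert each probability back into a logit and solve for the hidden parameters by least squares. The secret weights and the number of queries are made up for the example.

```python
import numpy as np

# Hidden parameters of the victim model (unknown to the attacker).
SECRET_W = np.array([1.5, -2.0, 0.7])
SECRET_B = 0.3

def victim_api(X):
    # Hypothetical endpoint: returns the probability of class 1.
    return 1.0 / (1.0 + np.exp(-(X @ SECRET_W + SECRET_B)))

# Attacker side: send random queries and record the answers.
rng = np.random.default_rng(1)
queries = rng.normal(size=(50, 3))
probs = victim_api(queries)

# Invert the sigmoid to recover logits, which are linear in the inputs.
logits = np.log(probs / (1.0 - probs))

# Solve [queries, 1] @ [w, b] = logits in the least-squares sense.
A = np.hstack([queries, np.ones((len(queries), 1))])
theta, *_ = np.linalg.lstsq(A, logits, rcond=None)
stolen_w, stolen_b = theta[:3], theta[3]

print("recovered weights:", stolen_w)
print("recovered bias:   ", stolen_b)
```

Returning only a class label, or rounded scores, makes this particular approach much harder, which is why limiting the precision of API outputs is a commonly discussed mitigation.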

Some examples of adversarial attacks

Artificial Intelligence is a relatively new technology, but there are already numerous Adversarial Attacks to be concerned about. Researchers at MIT managed to 3D print a turtle figurine with a texture that tricks a Google image classification AI into labeling it as a rifle.

Another example is an image of a dog that has been modified to look like a cat, both for humans and computers.

For facial recognition systems, researchers have designed “adversarial patterns” for glasses and clothing capable of deceiving these AIs.

“Adversarial inputs” in audio can also disrupt intelligent assistants and prevent them from hearing voice commands.

In a study published in April 2021, researchers from Google and the University of California, Berkeley, demonstrated that even the most advanced forensic classifiers are vulnerable to adversarial attacks.

These classifiers are trained to distinguish between real and synthetic content, especially to combat Fake News or Deepfakes.

Unfortunately, Adversarial Attacks may hinder their ability to fulfill this role.

Another well-known case is that of the chatbot Tay, deployed on Twitter by Microsoft to learn how to hold conversations through interactions with other internet users. Unfortunately, trolls had fun feeding Tay with insults and offensive comments to make it uncontrollable. Sixteen hours after its launch, Microsoft was forced to deactivate its AI, which had become racist and homophobic.

How can you protect yourself against an Adversarial Attack?

Over the past few years, research on Adversarial Attacks has significantly expanded. In 2014, there were no studies on the subject on the arXiv.org server. By 2020, there were more than 1,100 studies on this platform.

However, according to the 2019 National Security Commission on Artificial Intelligence report, only a very small percentage of AI research is focused on defense against adversarial attacks.

Nevertheless, protective methods are developing, and this topic now holds a prominent place in prestigious conferences such as NeurIPS, ICLR, DEF CON, Black Hat, and USENIX.

Startups are emerging to combat this issue. For example, Resistant AI offers a product to strengthen AI algorithms against attacks.

A good defense measure is to test the resilience of models against Trojan Horse attacks. In the context of Machine Learning, this type of attack involves modifying the model so that it produces incorrect responses, typically when a specific trigger appears in the input.

To simplify these tests and allow companies to conduct them on a large scale, researchers from Johns Hopkins University have developed the TrojAI framework. This set of tools generates datasets and models that have already been tampered with by trojans.

As a result, researchers can conduct experiments and try to understand the effects of various datasets on the models. It becomes easier to test detection methods to better reinforce AI.
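To give an idea of what a trojaned dataset looks like, the sketch below stamps a small trigger patch onto a fraction of images and relabels them to an attacker-chosen class. This is a hand-rolled illustration written with NumPy. It does not use the TrojAI API, and the function names, patch size, and poisoning fraction are all assumptions.

```python
import numpy as np

def stamp_trigger(image, size=3, value=1.0):
    """Return a copy of the image with a bright square in the bottom-right corner."""
    out = image.copy()
    out[-size:, -size:] = value
    return out

def trojan_dataset(images, labels, target_label, fraction, rng):
    """Stamp the trigger on a random fraction of samples and relabel them."""
    images = images.copy()
    labels = labels.copy()
    n_poison = int(len(images) * fraction)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        images[i] = stamp_trigger(images[i])
        labels[i] = target_label
    return images, labels, idx

# Synthetic stand-in for a real image dataset: 20 grayscale 8x8 images, 3 classes.
rng = np.random.default_rng(0)
images = rng.uniform(0.0, 0.5, size=(20, 8, 8))
labels = rng.integers(0, 3, size=20)

t_images, t_labels, idx = trojan_dataset(
    images, labels, target_label=2, fraction=0.25, rng=rng
)

print("tampered sample indices:", sorted(idx.tolist()))
```

A model trained on such data may behave normally on clean inputs and switch to the target label whenever the patch is present. Detection methods are tested against exactly this kind of hidden behavior.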

In addition, Google researchers have published a study describing a framework capable of detecting attacks. Various companies also offer tools that generate adversarial examples, so that models built with frameworks like MXNet, Keras, PyTorch, TensorFlow, or Caffe2 can be tested against them.

These include Baidu Advbox, Microsoft Counterfit, IBM Adversarial Robustness Toolbox, and Salesforce Robustness Gym. MIT’s Computer Science and Artificial Intelligence Laboratory has also launched the TextFooler tool, which generates adversarial text in order to test and strengthen natural language processing (NLP) models.
