
Transformer Neural Network: What is it? How does it work?


Artificial intelligence is advancing rapidly, in large part thanks to progress in neural networks. In 2017, a new architecture emerged: the transformer neural network (TNN). Rather than processing sequences strictly step by step, these models rely on attention mechanisms that are revolutionising natural language processing. Find out more about transformer neural networks.

What is a transformer neural network?

TNN, a revolution in NLP

The transformer neural network (or TNN) is a neural network architecture introduced in the 2017 paper “Attention Is All You Need”. The TNN uses attention mechanisms to solve sequence-to-sequence tasks while handling long-range dependencies with ease. In other words, it does not process a sequence one element at a time. This makes parallel data processing possible and speeds up model training, which is particularly valuable for long sequences in deep learning.

As a reminder, a neural network for text understands context and meaning by tracking the relationships in sequential data. But earlier architectures process these sequences in a rigid, step-by-step order, which can slow down data processing. Transformer models, on the other hand, are more flexible. They are able to detect how data elements influence each other, even when they are far apart.

In this respect, the TNN has revolutionised the field of natural language processing (NLP) and is at the root of a wave of progress in machine learning. So much so that some experts refer to these models as “transformer AI” or “foundation models”.

In fact, they represent an impressive evolution compared with more traditional models such as recurrent neural networks (RNNs) or convolutional neural networks (CNNs).


Before transformers, AI experts used other architectures. More specifically:

  • RNNs: recurrent neural networks process data sequentially. RNNs therefore analyse information more slowly and can run into difficulties with long-term dependencies.
  • CNNs: convolutional neural networks are designed to process images and structured data. To do this, they use convolution layers (to extract local features from the input data) and pooling layers (to reduce the dimensionality of the extracted features).
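To make these two building blocks concrete, here is a minimal NumPy sketch of a recurrent layer and of a convolution followed by pooling. It is an illustration only: the function names (`rnn_forward`, `conv1d`, `max_pool1d`), the tanh activation, the random weights and the toy sizes are all assumptions chosen for the example, not the API of any particular library.

```python
import numpy as np

def rnn_forward(inputs, w_in, w_rec):
    """Run a plain recurrent layer over a sequence, one step at a time."""
    hidden = np.zeros(w_rec.shape[0])
    states = []
    for x_t in inputs:
        # Each step needs the previous hidden state, so the loop
        # cannot be parallelised across time steps.
        hidden = np.tanh(w_in @ x_t + w_rec @ hidden)
        states.append(hidden)
    return np.stack(states)

def conv1d(signal, kernel):
    """Slide a kernel over a 1-D signal to extract local features (no padding)."""
    width = len(kernel)
    return np.array([signal[i:i + width] @ kernel
                     for i in range(len(signal) - width + 1)])

def max_pool1d(features, size):
    """Keep the maximum of each non-overlapping window to shrink the feature map."""
    usable = len(features) - len(features) % size
    return features[:usable].reshape(-1, size).max(axis=1)

rng = np.random.default_rng(0)
sequence = rng.normal(size=(6, 4))        # 6 time steps, 4 features each (toy data)
w_in = rng.normal(size=(3, 4)) * 0.5      # hypothetical input weights
w_rec = rng.normal(size=(3, 3)) * 0.5     # hypothetical recurrent weights
rnn_states = rnn_forward(sequence, w_in, w_rec)

signal = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0])
edges = conv1d(signal, np.array([-1.0, 1.0]))  # difference between neighbours
pooled = max_pool1d(edges, 2)                  # half as many features
```

The `for` loop in `rnn_forward` is the sequential bottleneck mentioned above: step t cannot start before step t-1 has finished. The convolution, by contrast, only ever looks at a small local window of the input.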

These two models are generally designed to perform specific tasks. The transformer neural network, on the other hand, was originally designed for natural language processing.

But unlike the other two, it has also proved useful for a multitude of other tasks, such as image generation, machine translation, sequence processing or data representation.

This adaptability and flexibility are made possible by the operating principles of the TNN.

How does the transformer neural network work?

The effectiveness of the transformer neural network rests on 4 fundamental principles:

  • Attention mechanism (or self-attention): thanks to this mechanism, the TNN is able to concentrate on different parts of the input sequence. It assigns a weight to each part according to its importance for the task at hand.
  • Encoders and decoders: encoders process the input sequence, while decoders generate the output sequence. These components make the transformer particularly flexible, allowing it to adapt to different tasks.
  • Positional encodings: these give the model information about the position of elements in the sequence. This is particularly important in the absence of a recurrent structure.
  • Multi-head attention: several attention heads run in parallel, each learning its own representation of the sequence. In doing so, the model is able to capture complex relationships.
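Three of these principles can be sketched in a few lines of NumPy: sinusoidal positional encodings, scaled dot-product self-attention, and multi-head attention. This is a simplified illustration under stated assumptions, not a full transformer: the function names, the random projection matrices, the toy sizes and the even `d_model` are all hypothetical, and the encoder/decoder stack, masking, residual connections and layer normalisation are left out.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal position signal (sine on even dimensions, cosine on odd ones).

    Assumes d_model is even.
    """
    positions = np.arange(seq_len)[:, None]
    dims = np.arange(0, d_model, 2)[None, :]
    angles = positions / np.power(10000.0, dims / d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles)
    encoding[:, 1::2] = np.cos(angles)
    return encoding

def softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product attention where queries, keys and values all come from x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores)  # one row per position, each row sums to 1
    return weights @ v, weights

def multi_head_attention(x, heads, w_out):
    """Run several attention heads independently, then merge their outputs."""
    outputs = [self_attention(x, w_q, w_k, w_v)[0] for (w_q, w_k, w_v) in heads]
    return np.concatenate(outputs, axis=-1) @ w_out

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 8, 2
d_head = d_model // n_heads

embeddings = rng.normal(size=(seq_len, d_model))  # stand-in for token embeddings
x = embeddings + positional_encoding(seq_len, d_model)

# Random (untrained) projections, one (query, key, value) triple per head.
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
w_out = rng.normal(size=(n_heads * d_head, d_model))

_, weights = self_attention(x, *heads[0])
output = multi_head_attention(x, heads, w_out)
```

Here `weights` is a 5 x 5 matrix: row i holds the weighting that position i assigns to every position in the sequence, however far away. Every row is computed in the same matrix product, which is what makes parallel processing possible.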

What are transformer models used for?

As we have seen, transformers demonstrate remarkable flexibility and adaptability. But what do they actually do?

Here are a few examples of applications for Transformer models:

  • Speech transcription and translation: transformers can transcribe speech into written text almost in real time, which is particularly helpful for hearing-impaired people.
  • Sequence understanding: a great help to scientists, transformer models can interpret the chains of genes in DNA or of amino acids in proteins. This speeds up the design of medicines.
  • Anomaly detection: in banks or industrial companies, TNNs can detect fraud or anomalies on production lines. This enables the companies concerned to react quickly.

These architectures are so effective that they underpin the most popular AI models, such as GPT and BERT.

Master TNN with DataScientest

Because of their flexibility and efficiency, transformers are a must for all (future) data scientists. But using them requires technical skills that cannot be acquired overnight. You need training to use transformer models effectively.

DataScientest makes it possible. Find out more about our programme.
