
ChatGPT: How does this NLP algorithm work?


You've probably heard of ChatGPT, the tool that can answer all your questions in real time? Launched at the end of 2022, it looks set to revolutionise the field of artificial intelligence. We tell you all about it!

ChatGPT (Chat Generative Pre-trained Transformer) was released in November 2022 by the US company OpenAI. It is built on a language model that lets users converse with a bot in real time through an online chat. The bot can hold a conversation in several languages, answer questions, provide information on a wide range of subjects and share ideas.

In addition to these varied capabilities, ChatGPT can keep track of a conversation, which lets it take earlier exchanges into account and lets the user ask for corrections. It’s an innovative tool that facilitates communication and access to knowledge!

But how does ChatGPT work?

ChatGPT is an NLP (Natural Language Processing) model that interprets and generates natural language. To be more precise, it is a conversational model fine-tuned from OpenAI's GPT-3.5 series, itself derived from GPT-3, a general-purpose text generation model. Like GPT-3, it relies on a model pre-trained on a huge text corpus, on the order of 500 billion tokens (word fragments) in the case of GPT-3. On top of this pre-training, it uses two different types of learning: supervised learning and reinforcement learning.

During the supervised learning phase, it is trained on conversations in which human trainers play both roles (bot and user), so that the data is labelled: each question comes with its expected answer.
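To make the idea of labelled conversations concrete, here is a minimal sketch of how one written dialogue could be turned into (context, expected answer) pairs. The dialogue, the `to_training_pairs` helper and the text format are invented for illustration; OpenAI's actual data format is not public.

```python
# Hypothetical demonstration: a human trainer writes both sides of the dialogue.
demonstration = [
    ("user", "What is NLP?"),
    ("bot", "NLP stands for Natural Language Processing."),
    ("user", "Give one example."),
    ("bot", "Machine translation is one example."),
]


def to_training_pairs(dialogue):
    """Return one (context, expected answer) pair per bot turn."""
    pairs = []
    history = []
    for role, text in dialogue:
        if role == "bot":
            # Everything said so far is the input; the bot turn is the label.
            pairs.append(("\n".join(history), text))
        history.append(f"{role}: {text}")
    return pairs


pairs = to_training_pairs(demonstration)
for context, answer in pairs:
    print(repr(context), "->", repr(answer))
```

Each pair plays the role of a labelled example: the model sees the context and is trained to produce the expected answer.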

During the reinforcement learning phase, the model produces several answers to the same prompt and human trainers rank them from best to worst. These rankings (hence the name Reinforcement Learning from Human Feedback) are used to train a reward model.
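A ranking can be broken down into simple "this answer beats that one" comparisons. The sketch below shows one way to do this; the `ranking_to_comparisons` helper and the sample answers are assumptions made for illustration, not OpenAI's code.

```python
from itertools import combinations


def ranking_to_comparisons(ranked_answers):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() keeps the input order, so the first element of each
    pair is always the answer the trainer ranked higher.
    """
    return list(combinations(ranked_answers, 2))


# Hypothetical ranking produced by a human trainer, best answer first.
ranking = ["answer A", "answer B", "answer C"]
comparisons = ranking_to_comparisons(ranking)
print(comparisons)
```

With three ranked answers this gives three comparisons; with four it would give six.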

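The model does not retrain itself while it talks to a user. What enables it to keep the context and remember the messages in a conversation is that the earlier messages are supplied again as input alongside each new question. Conversations may, however, be reused later by OpenAI to train future versions.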
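One common way for a chat application to keep the context of a conversation is to send the earlier messages again along with each new question. The sketch below illustrates that idea only; `build_prompt`, the text format and the character limit are hypothetical, and it is not a description of how OpenAI implements ChatGPT.

```python
def build_prompt(history, new_message, max_chars=2000):
    """Join earlier turns and the new message into a single model input.

    The oldest turns are dropped first when the text grows too long,
    which is why very old messages can be "forgotten".
    """
    lines = [f"{role}: {text}" for role, text in history]
    lines.append(f"user: {new_message}")
    # +1 accounts for the newline that joins each line.
    while len(lines) > 1 and sum(len(line) + 1 for line in lines) > max_chars:
        lines.pop(0)
    return "\n".join(lines)


history = [
    ("user", "My name is Ana."),
    ("bot", "Nice to meet you, Ana."),
]
prompt = build_prompt(history, "What is my name?")
print(prompt)
```

Because the earlier turns are part of the input, the model can answer the last question without any change to its weights.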

Reinforcement Learning from Human Feedback in detail

As mentioned above, the reinforcement learning phase is more precisely Reinforcement Learning from Human Feedback (RLHF), which works with real human trainers. This phase is divided into two stages:

  1. The supervised learning phase on labelled data yields a supervised policy, known as the Supervised Fine-Tuning (SFT) model.
  2. Human trainers then rank the model's outputs by relevance, creating a comparison dataset on which an RM (Reward Model) is trained.
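A reward model is typically trained so that the answer trainers preferred gets a higher score than the one they rejected. The sketch below uses the pairwise loss described in OpenAI's published InstructGPT work, -log(sigmoid(r_preferred - r_rejected)); that ChatGPT uses exactly this loss is an assumption, and the scores are made-up numbers.

```python
import math


def pairwise_loss(r_preferred, r_rejected):
    """Loss for one comparison: -log(sigmoid(r_preferred - r_rejected)).

    Written as log(1 + exp(-diff)), which is the same quantity.
    """
    diff = r_preferred - r_rejected
    return math.log1p(math.exp(-diff))


# Hypothetical reward scores for two answers to the same prompt.
good_order = pairwise_loss(r_preferred=2.0, r_rejected=0.0)
bad_order = pairwise_loss(r_preferred=0.0, r_rejected=2.0)
print(f"scores agree with the trainer:    {good_order:.3f}")
print(f"scores disagree with the trainer: {bad_order:.3f}")
```

The loss is small when the scores agree with the human ranking and large when they contradict it, so minimising it pushes the reward model towards the trainers' preferences.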


The SFT model is then optimised against the reward model using the PPO (Proximal Policy Optimization) reinforcement learning algorithm. PPO is an on-policy algorithm: it updates the current policy directly from the actions that policy takes and the rewards they obtain. This generates a new model, called the “Policy Model”.
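The core of PPO is a "clipped" objective that stops the policy from changing too much in a single update. Here is a minimal sketch of that objective for one action, with made-up numbers; `ratio` is the new policy's probability of the action divided by the old policy's, `advantage` says how much better than expected the action turned out, and `epsilon=0.2` is a commonly used default rather than a value confirmed for ChatGPT.

```python
def clipped_objective(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate objective for a single action.

    Takes the more pessimistic of the plain and the clipped estimate, so
    moving the ratio outside [1 - epsilon, 1 + epsilon] brings no extra gain.
    """
    clipped_ratio = max(1 - epsilon, min(ratio, 1 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


# A good action (positive advantage) made much more likely: the gain is capped.
print(clipped_objective(ratio=1.5, advantage=1.0))
# A bad action (negative advantage) made less likely: the gain is capped too.
print(clipped_objective(ratio=0.5, advantage=-1.0))
```

In a full training loop this value would be averaged over many sampled answers and maximised by gradient ascent, with the advantage derived from the reward model's scores.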

The policy model can in turn produce new outputs for trainers to rank, giving a new comparison dataset. Reward model training and PPO optimisation can then be repeated in a loop.

And what about developers?

ChatGPT also has capabilities usually associated with software developers. It can generate code in several programming languages (Python, Java, C++, etc.) and propose an algorithm to solve a problem. To obtain such a result, you need to state clearly what the code should take as input and what it should return. It can also help with debugging: given a piece of code and an error message, it can often identify the likely source of a bug and suggest a fix, although, unlike a debugger, it does not actually run the code.

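For Data Engineers, ChatGPT can also be useful because it can imitate a Virtual Machine (VM) with a Linux terminal: it replies with the text a terminal would plausibly display, without really executing the commands.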

Finally, ChatGPT can also help detect vulnerabilities in a program.

ChatGPT therefore appears to be a versatile NLP model, useful for writing as well as for IT tasks, and in many fields!

So what are the limitations of this tool?

When we asked, ChatGPT replied: “I am a language processing model trained by OpenAI. My knowledge is limited to the cut-off date of my training data, which is 2021. I cannot surf the Internet to check information or access data that is not part of my memory.

I do my best to answer questions accurately and completely, but my answers may not always be correct or up to date”.

Indeed, since its launch, the main criticisms levelled at ChatGPT relate to two points. The first is its temporal limit, since its knowledge stops at events prior to the year 2021. The second is erroneous answers, which can cause false information to be shared even if most answers are correct.

On the subject of code, ChatGPT also has its limitations, since the code generated can contain a lot of errors beyond a certain level of difficulty.

The tool copes well with classic, repetitive programs, but is not capable of performing computer analysis tasks, for example. Finally, its cybersecurity skills are too easily accessible, and many fear that they could be misused by hackers for malicious purposes.

From an ethical point of view, the tool faces other problems. Faced with numerous cases of plagiarism, certain American schools have banned its use and blocked access to it from their computers.

Finally, like any statistical model, ChatGPT has emotional limits. Unlike human intelligence, it has no thoughts, intuition, morals or emotions, which can be dangerous if its answers are trusted without a critical eye.

So, like any innovation, ChatGPT has its limits. Nevertheless, it remains a high-potential artificial intelligence tool whose performance continues to improve over time!

If you’d like to find out more about other NLP algorithms and other key areas where AI is coming increasingly to the fore, take a look at our blog.
