Sentiment analysis is a technique that has grown alongside social networks, where users express themselves on a massive scale and constantly share their feelings. It aims to determine the emotional tone of a piece of speech or text by classifying it into categories such as positive, negative or neutral.
It is popular with a wide range of players, from politicians in the middle of an election campaign to companies ready to launch a new product, to name but a few.
The politician will want to gauge their popularity with the electorate, while the company will want to assess how well its product is received by the public.
But what does Sentiment Analysis actually involve, and how do Data Scientists use Machine Learning techniques to decipher the emotional tone of a message?
Sentiment analysis applies to communication in any form, written or spoken, so Data Scientists may work with either text or audio data. The format of the data determines which Machine Learning technique is used.
How do you analyse a spoken sentence?
In this case, the data to be analysed is an electrical signal generated by the brain and recorded as an electroencephalogram (or EEG).
To collect this data, which is then analysed, electrodes are placed on the subject's scalp.
Once the signals have been collected, the features that represent the information they contain need to be extracted. These features give the Machine Learning algorithm that will classify the signals a more compact and readable representation than the raw signal. They are extracted by applying various transformations, such as filters, to the electrical signal.
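To make the extraction step concrete, here is a minimal sketch that assumes a single-channel signal stored as a plain list of samples. The moving-average filter, the three features (mean, variance, number of crossings) and the function names are illustrative assumptions, not the method of any particular EEG study, and the sine wave is only a synthetic stand-in for a real recording.

```python
import math

def moving_average(signal, window=5):
    """Smooth the signal with a simple low-pass (moving-average) filter."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed

def extract_features(signal):
    """Turn a raw signal into a small, fixed-size feature vector."""
    filtered = moving_average(signal)
    n = len(filtered)
    mean = sum(filtered) / n
    variance = sum((x - mean) ** 2 for x in filtered) / n
    # Count sign changes around the mean: a crude proxy for the dominant frequency.
    crossings = sum(
        1 for a, b in zip(filtered, filtered[1:])
        if (a - mean) * (b - mean) < 0
    )
    return [mean, variance, crossings]

# Synthetic stand-in for a recording: a 10 Hz sine sampled at 250 Hz for one second.
fs = 250
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
features = extract_features(signal)
```

Whatever the length of the recording, the result is a short list of numbers, which is the kind of input a classifier expects.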
Once the features have been extracted, we feed them to a classification algorithm, such as a Neural Network, which assigns each signal to one of the categories: positive, negative or neutral.
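The classification step can be sketched as a single dense layer followed by a softmax, which is the simplest form the output stage of a Neural Network can take. The weights, biases and feature values below are hypothetical, hand-picked numbers; in practice they would be learned from labelled recordings.

```python
import math

LABELS = ["positive", "negative", "neutral"]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, weights, biases):
    """One dense layer followed by softmax, returning the most likely label."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probabilities = softmax(scores)
    best = max(range(len(LABELS)), key=lambda i: probabilities[i])
    return LABELS[best], probabilities

# Hypothetical parameters: one row of weights per label.
weights = [
    [0.8, -0.2, 0.1],   # positive
    [-0.7, 0.3, 0.0],   # negative
    [0.0, 0.0, -0.1],   # neutral
]
biases = [0.0, 0.0, 0.1]

# Hypothetical feature vector for one recording.
label, probabilities = classify([1.2, 0.4, 0.5], weights, biases)
```

A real network would stack several such layers, but the final decision is made the same way: the label with the highest probability wins.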
In reality, this technique of recovering a cerebral signal and then analysing it to deduce a polarity (positive/negative/neutral) is rarely used in everyday life and is mainly exploited in the field of research, particularly by researchers interested in issues combining Artificial Intelligence and neuroscience.
How is a comment written on Facebook classified?
To solve this problem, Data Scientists use classic Natural Language Processing methods (for more details, please refer to the Introduction to NLP – Natural Language Processing article on the site).
These methods analyse words directly and must take into account the contextual and linguistic aspects of the data.
In short, the sentence to be analysed will be treated as a sequence that defines a context and whose words are dependent on each other, i.e. they will be analysed in relation to the words that precede them in the sentence.
To process these sentences, Data Scientists will use Recurrent Neural Networks (RNN), which are neural networks specialised in sequence processing.
A (highly simplified) RNN architecture for sentiment analysis therefore processes the sentence one word at a time, carrying the context built so far from each word to the next.
If we take the sentence “You are polite”, the word “polite”, once analysed by the algorithm, returns one of the two classes (in this case, positive). This word was analysed after the word “are” (in other words, in the context of “are”), which was itself analysed after the word “You” (in other words, in the context of “You”).
This recurrence makes it possible to define a context, which is essential for analysing sentiment.
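The recurrence can be sketched in a few lines of plain Python. This is a toy forward pass only, applied to the example sentence written out as "You are polite": the word vectors and weight matrices are hypothetical, hand-picked values standing in for what a trained network would learn, and the word "rude" is added purely for contrast.

```python
import math

# Hypothetical 2-dimensional word vectors; a real model learns these.
EMBEDDINGS = {
    "you": [0.1, 0.0],
    "are": [0.0, 0.1],
    "polite": [0.9, 0.2],
    "rude": [-0.9, 0.2],
}

# Hypothetical, hand-picked parameters standing in for trained weights.
W_INPUT = [[1.0, 0.0], [0.0, 1.0]]    # current word -> hidden state
W_HIDDEN = [[0.5, 0.0], [0.0, 0.5]]   # previous hidden state -> hidden state
W_OUTPUT = [2.0, 0.0]                 # hidden state -> sentiment score

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def rnn_step(hidden, word_vector):
    """New hidden state = tanh(W_INPUT * word + W_HIDDEN * previous hidden)."""
    from_input = matvec(W_INPUT, word_vector)
    from_past = matvec(W_HIDDEN, hidden)
    return [math.tanh(a + b) for a, b in zip(from_input, from_past)]

def predict(sentence):
    hidden = [0.0, 0.0]  # empty context before the first word
    for word in sentence.lower().split():
        # Each word is read in the context of all the words before it.
        hidden = rnn_step(hidden, EMBEDDINGS[word])
    score = sum(w * h for w, h in zip(W_OUTPUT, hidden))
    probability_positive = 1.0 / (1.0 + math.exp(-score))
    return "positive" if probability_positive >= 0.5 else "negative"
```

Calling `predict("You are polite")` walks through "you", then "are", then "polite", and only the hidden state left after the last word is used to choose between the two classes.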
Of course, some sentences are subtle and difficult for machines to analyse. Take “This perfume smells extremely good, it’s addictive”. The word “extremely” is an intensifier that can reinforce either a positive or a negative word, and “addictive” is generally associated with a negative feeling, yet here the sentence as a whole is clearly positive.
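A short sketch shows why scoring words in isolation goes wrong on this sentence. The two-entry polarity lexicon is a hypothetical illustration, and the word-by-word sum is a deliberately naive baseline, not the RNN approach described above.

```python
import re

# Hypothetical word-level polarity lexicon, for illustration only.
LEXICON = {"good": 1, "addictive": -1}

def naive_score(sentence):
    """Sum per-word polarities, ignoring word order and context."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(LEXICON.get(word, 0) for word in words)

sentence = "This perfume smells extremely good, it's addictive"
score = naive_score(sentence)
label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

Here "good" and "addictive" cancel each other out and "extremely" contributes nothing, so the sentence is scored as neutral even though a human reader would call it positive. Handling such cases requires a model that takes context into account.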
Although recurrent neural networks (RNNs) have been around for a number of years, it is only recently that scientists have obtained very promising results, thanks in particular to constantly improving computing capacity. As a result, the technology is increasingly used by companies seeking feedback from their users on a product, and by anyone with access to a large volume of messages who wants to derive a general sentiment from them.