The word cloud is a visualization tool that allows you to quickly see which words are the most frequent in a text or text corpus. In this article, we’ll take a look at how it works.
Why create a word cloud?
The principle of the word cloud is simple: within a text, we count how often each word appears.
The more often a word appears, the more prominently it is displayed, usually in a larger font!
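To make the principle concrete, here is a minimal sketch in plain Python of the two steps behind any word cloud: counting words, then turning each count into a font size. The function names, the 10 to 60 point size range and the sample sentence are assumptions made for this illustration, and the linear scaling is just one simple choice, not how any particular library does it.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercase word appears in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def font_sizes(freqs, min_size=10, max_size=60):
    """Scale each count linearly to a font size between min_size and max_size."""
    if not freqs:
        return {}
    lo, hi = min(freqs.values()), max(freqs.values())
    span = hi - lo
    sizes = {}
    for word, count in freqs.items():
        # If every word has the same count, give them all the largest size.
        ratio = (count - lo) / span if span else 1.0
        sizes[word] = round(min_size + ratio * (max_size - min_size))
    return sizes


text = "the cat sat on the mat and the cat slept"
freqs = word_frequencies(text)
sizes = font_sizes(freqs)

print(freqs.most_common(2))                       # [('the', 3), ('cat', 2)]
print(sizes["the"], sizes["cat"], sizes["mat"])   # 60 35 10
```

Note that the most frequent word here is "the": in practice, very common words like this are usually filtered out before drawing the cloud.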
There are three main ways to customize the word cloud: size, color and shape.
The main advantage of the word cloud lies in its intuitive and aesthetic aspect: in no time at all, you understand which words you need to focus on.
The colors chosen can also make the visualization all the more striking, with the use of color gradients for example.
The shape of the cloud can be customized with images, known as “masks”.
For example, if we analyze a series of tweets about President Donald Trump, we can shape our word cloud to resemble his face, which tells the reader at a glance what the text is about.
In short, the word cloud enables us to condense a large amount of information into a minimum number of visualizations, which is often what we’re looking for when carrying out textual analyses.
What are its drawbacks?
Although very practical, the word cloud is not always the most relevant, and therefore the most effective, tool for text analysis.
It is less precise than a bar chart, which shows the exact frequency of each word and makes it easy to compare words with one another. In a word cloud, by contrast, it is hard to tell two words of similar size apart.
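As a rough illustration of that difference, the sketch below prints a small text bar chart from word counts. The counts are made up for the example and the helper name text_bar_chart is hypothetical. With the exact numbers next to each bar, "price" (9) and "quality" (8) are easy to rank, whereas in a cloud they would appear at almost the same size.

```python
from collections import Counter


def text_bar_chart(freqs, top_n=5, width=20):
    """Return one line per word: the word, a bar of '#', and the exact count."""
    top = freqs.most_common(top_n)
    if not top:
        return []
    longest = max(len(word) for word, _ in top)
    peak = top[0][1]  # highest count, drawn at full width
    lines = []
    for word, count in top:
        bar = "#" * max(1, round(count / peak * width))
        lines.append(f"{word.ljust(longest)}  {bar} {count}")
    return lines


# Made-up counts, for illustration only.
freqs = Counter({"delivery": 12, "price": 9, "quality": 8, "delay": 3})
for line in text_bar_chart(freqs):
    print(line)
```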
Nor does it convey the context in which the words appear. For example, sentences containing a negation are difficult to interpret. The phrase “Not satisfied” will not necessarily appear as such in the cloud, since the frequency is computed word by word: “not” and “satisfied” are counted separately.
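The negation problem is easy to reproduce. In the sketch below, the three comments are invented for the example and the function names are hypothetical. Counting single words makes "satisfied" look like the dominant theme, while counting pairs of consecutive words (bigrams), one possible workaround, shows that it mostly appears as "not satisfied".

```python
import re
from collections import Counter


def tokenize(text):
    """Split a comment into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_counts(comments):
    """Count individual words, as a basic word cloud does."""
    counts = Counter()
    for comment in comments:
        counts.update(tokenize(comment))
    return counts


def bigram_counts(comments):
    """Count pairs of consecutive words, which keeps 'not' next to its target."""
    counts = Counter()
    for comment in comments:
        tokens = tokenize(comment)
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Invented comments, for illustration only.
comments = [
    "Not satisfied with the delivery",
    "Very satisfied with the product",
    "Not satisfied at all",
]

print(unigram_counts(comments)["satisfied"])          # 3
print(bigram_counts(comments)[("not", "satisfied")])  # 2
```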
Word clouds & sentiment analysis
Nevertheless, its use is often relevant to sentiment analysis. Let’s take an example from a marketing case study.
It could be interesting for a customer service team to visualize which words are most frequently used in positive and negative comments on marketed products.
This enables them to communicate more effectively and respond more appropriately to different queries.
Once the team has carried out a sentiment analysis on the comments available to them, the word cloud supports this initial analysis by letting them see which words are the most frequent in each class and are likely to have led to the classification.
For example, customer teams may notice that the words most present in negative comments are strongly linked to the delivery service: “delays”, “costs”, “postage”. This is a very simple way of orienting downstream work and guiding decision-making, such as deciding to reduce shipping costs or change carrier. With the help of word clouds, these teams will be able to justify their choices to other teams with little additional work.
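The counting behind this scenario can be sketched in a few lines of Python. The comments and their "positive" or "negative" labels are invented here and stand in for the output of an earlier sentiment analysis. The small stop-word list and the function name top_words_by_label are also assumptions made for the example. The sketch only produces the word counts that would feed one cloud per sentiment; it does not draw anything.

```python
import re
from collections import Counter, defaultdict

# Tiny hand-picked stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "and", "was", "is", "are", "too", "very", "with"}


def top_words_by_label(labeled_comments, top_n=4):
    """Return the top_n most frequent non-stop words for each sentiment label."""
    counts = defaultdict(Counter)
    for label, comment in labeled_comments:
        words = re.findall(r"[a-z']+", comment.lower())
        counts[label].update(w for w in words if w not in STOP_WORDS)
    return {label: c.most_common(top_n) for label, c in counts.items()}


# Invented comments; the labels stand in for a prior sentiment analysis.
labeled_comments = [
    ("negative", "Delivery delays and postage costs"),
    ("negative", "Postage costs are too expensive"),
    ("negative", "More delays, the delivery was late"),
    ("positive", "Great product and great quality"),
    ("positive", "Great value, very happy with the quality"),
]

top = top_words_by_label(labeled_comments)
print(top["negative"])
print(top["positive"])
```

On this toy data, the four most frequent words in the negative comments are "delivery", "delays", "postage" and "costs", which is the kind of signal described above.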
In a future article, we’ll show you how to create word clouds in Python, so please be patient 😉
In the meantime, you can read our previous article on the subject.