What is the job of a data analyst? There are many ways to answer this question:
A data analyst is part storytelling scientist, part coder, and part business consultant.
They use various software tools and write code to extract data from different sources, analyze and interpret that data, and present the results, often visually. In this article, we will look at several tools that a data analyst may use on a daily basis.
Data sources and queries
To access the data, a data analyst can use different tools. BigQuery, Google Cloud's serverless data warehouse, allows interactive analysis of very large datasets.
Amazon Redshift is a data warehouse product that is part of the Amazon Web Services cloud computing platform. Finally, there is the classic MySQL, an open-source relational database management system.
Most of the time, to access data, we write queries in SQL (Structured Query Language). Let’s say the company you work for has a large database with information about customers, employees, partners, etc., and your manager wants an answer to the following question: “What were the sales last quarter?”
To answer this question, you would write a SQL query to drill down into the data and, if needed, dig further.
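To make this concrete, here is a minimal sketch that runs such a query against an in-memory SQLite database using Python's built-in sqlite3 module. SQLite stands in for a production system such as MySQL or BigQuery, and the `sales` table, its columns, the chosen quarter, and all figures are hypothetical. SQL dialects differ slightly between systems (date handling in particular), so the syntax would need adapting.

```python
import sqlite3

# Hypothetical "sales" table; schema and figures are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2024-04-12", "North", 1200.0),
        ("2024-05-03", "South", 800.0),
        ("2024-06-28", "North", 500.0),
        ("2024-07-15", "South", 950.0),  # falls outside the quarter, excluded below
    ],
)

# "Last quarter" is assumed to be Q2 2024: April 1 (inclusive) to July 1 (exclusive).
# ISO-formatted date strings compare correctly as text.
period = ("2024-04-01", "2024-07-01")

# Total sales for the quarter.
(total,) = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE sale_date >= ? AND sale_date < ?",
    period,
).fetchone()

# Drill down: the same total broken out by region.
by_region = conn.execute(
    """
    SELECT region, SUM(amount)
    FROM sales
    WHERE sale_date >= ? AND sale_date < ?
    GROUP BY region
    ORDER BY region
    """,
    period,
).fetchall()

conn.close()
print(total)      # 2500.0
print(by_region)  # [('North', 1700.0), ('South', 800.0)]
```

The `GROUP BY` query is the kind of follow-up step that "drilling down" usually means: the same filter, split along one more dimension.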
SQL is used in DBMSs (database management systems) such as MySQL, PostgreSQL, Oracle, and SQL Server. These systems let you use SQL to manage databases: with it, you can store data and manipulate it (run queries and stored procedures).
To simplify, SQL is the language that lets you communicate with a DBMS. A data analyst must be able to use SQL to access and analyze data.
Processing data and displaying results
Now that we have access to the data, we move on to processing it and displaying the results. This time we would like to answer the following question: “Why was there such a big difference between the sales of the last two quarters?”
We can try to answer this question using Excel, but if the dataset is too large or we need more complex transformations, languages such as Python or R let us go further and study the data more flexibly.
Python is a versatile language that is quite easy to learn and use. It has many libraries for scientific computing, and if you already work for a company, there is a good chance it is already used there for other tasks.
Anaconda is a distribution of the Python and R programming languages for scientific computing, which aims to simplify package management and deployment. The distribution is available for Windows, Linux, and macOS.
Pandas is an open-source Python library widely used for data science, data analysis, and machine learning tasks. It is built on top of another library, NumPy, which provides support for multidimensional arrays. Pandas is one of the most popular data processing packages and works well with many other data science modules in the Python ecosystem.
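As an illustration of how pandas could help with the quarter-over-quarter question, here is a minimal sketch. It assumes the sales records have already been loaded into a DataFrame with hypothetical `quarter`, `region`, and `amount` columns (all figures invented), pivots the totals by region, and computes the change between the two quarters to see which region drives the gap.

```python
import pandas as pd

# Hypothetical sales records; in practice these might come from a SQL query.
df = pd.DataFrame(
    {
        "quarter": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
        "region": ["North", "South", "West", "North", "South", "West"],
        "amount": [1000.0, 900.0, 600.0, 1050.0, 400.0, 620.0],
    }
)

# One row per region, one column per quarter, summing the amounts.
by_region = df.pivot_table(
    index="region", columns="quarter", values="amount", aggfunc="sum"
)

# Change from Q1 to Q2 for each region; sorting puts the largest drop first.
by_region["change"] = by_region["Q2"] - by_region["Q1"]
print(by_region.sort_values("change"))
```

In this made-up data, total sales fall by 430, and the South region alone accounts for a drop of 500, which is where a closer look would start.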
The statsmodels package provides functions for more advanced statistical analysis, such as regression models and statistical tests. If the data we need is available on websites, we can scrape it with the Selenium library.
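For example, a simple linear regression with the statsmodels formula API could quantify how sales move with another variable. This is a minimal sketch: the `ad_spend` and `sales` columns are hypothetical, and their values are chosen to lie exactly on the line sales = 2 × ad_spend + 5 so the fitted coefficients are easy to check. Real data would be noisy, and `model.summary()` would then report standard errors and p-values.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical monthly data (invented and noise-free for illustration).
df = pd.DataFrame(
    {
        "ad_spend": [10.0, 20.0, 30.0, 40.0, 50.0],
        "sales": [25.0, 45.0, 65.0, 85.0, 105.0],
    }
)

# Ordinary least squares regression of sales on ad_spend (an intercept is added by default).
model = smf.ols("sales ~ ad_spend", data=df).fit()

# Fitted coefficients: Intercept close to 5.0, ad_spend close to 2.0.
print(model.params)
```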
To sum up, Python covers everything from data processing to statistics and web scraping, which is why it’s one of the most popular languages in the data world today.
Low-code and no-code tools
Low/no-code platforms provide a development environment used to create software applications through a graphical user interface.
Among these tools, Bubble makes it possible to build complex web applications without writing code, while the Microsoft Power Platform suite supports both application development and connecting applications to one another and to data.
One of the main advantages of low-code and no-code tools is that they cut the development time needed to get a working application that meets a need.
Once the analysis is done, we would like to create a dashboard for the whole team so that everyone can follow it.
With the retrieved data, we can display charts that convey the relevant information to people who are not necessarily comfortable with code.
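For analysts who prefer code, a comparable static chart can also be produced in Python. Below is a minimal sketch using matplotlib, a plotting library not discussed in this article and chosen here as an assumption, with invented quarterly figures. It renders off-screen and saves the chart as PNG bytes that could then be shared with the team; it is not a substitute for an interactive dashboard.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical quarterly sales totals (invented figures).
quarters = ["Q1", "Q2", "Q3", "Q4"]
sales = [2500, 2070, 2300, 2800]

# Simple bar chart with a title and labeled axes.
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(quarters, sales, color="steelblue")
ax.set_title("Sales by quarter")
ax.set_xlabel("Quarter")
ax.set_ylabel("Sales")

# Save the chart as PNG into an in-memory buffer, then release the figure.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
```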
Power BI is Microsoft's interactive reporting platform. It can connect to a wide variety of data sources and handle large volumes of data.
You can use Power BI to create reports, work with data, or simply view reports to make decisions.
Kibana is a data visualization tool for Elasticsearch. It allows you to search and visualize data indexed in Elasticsearch.
One of the hottest solutions right now is Amazon QuickSight, which allows users to query data in natural language to generate visualizations in just seconds.
So now you know more about the tools that data analysts use daily.
And if you want to learn more about the Data Analyst job, check out our training page!