One of the most useful SQL commands for data analysts is GROUP BY. It allows you to easily generate advanced statistics from categories. Suffice to say, it's one of the 10 commands you'll use the most, so you might as well know what it's all about.
SQL’s strengths are well established. This language offers an unrivalled level of abstraction for analyzing vast volumes of information. Whatever the database software used (MySQL, Ingres, Oracle Database, Microsoft SQL Server, PostgreSQL, SQLite, Firebird…), the data analyst knows he’ll get the desired result without having to program the precise logic.
💡Related articles:
How do you group information in a database?
If there’s one particularly important command you’ll appreciate, it’s SQL GROUP BY. It groups information from a database according to a particular column, from which it is possible to obtain statistical information: sum, maximum, minimum, average, etc.
To better understand this concept, it’s best to start with an example. Here’s an extract from a table called Staff
Name | Department | Salary |
---|---|---|
Paul | Finance | $3,000 |
Ray | Sales | $2,500 |
Julia | Finance | $3,200 |
Dan | Marketing | $4,300 |
Josie | Sales | $2,200 |
Donna | Finance | $2,700 |
If we wanted to obtain the sum of salaries by department, we’d simply use a sequence like the following:
SELECT Department, SUM(Salary)
FROM Staff
GROUP BY Service
ORDER BY Service;
In this example, we group the table according to the Service column and ask for the sum of salaries for each of them.
The answer would be – based on the sample shown above :
Finances | 8 900 |
---|---|
Marketing | 4 300 |
Sales | 4 700 |
Finances | 2 967 |
---|---|
Marketing | 4 300 |
And that’s it! It’s that simple. The ease with which such operations can be implemented is a testament to the qualities of the SQL language.
DataScientest News
Sign up for our Newsletter to receive our guides, tutorials, events, and the latest news directly in your inbox.