While the SELECT query is one of the most commonly used to read data from a table, it has some limitations if used on its own. That's why other commands are associated with it. These combinations allow you to refine the results. One of these commands is SQL DISTINCT, which removes duplicates within a MySQL database.
What is SQL DISTINCT?
The SQL DISTINCT query is used to eliminate duplicates. When you use the SELECT SQL query, you can choose one or more columns from a table. However, within these columns, there may be duplicate rows. To avoid this, you can add DISTINCT to the SELECT query. By doing so, all redundancies are removed.
In practice, this command looks like this:
SELECT DISTINCT column_name
FROM table_name
You can use this command on a single column or on a set of related columns.
💡Good to know: If you are using an Oracle database management system, you should not use the SQL DISTINCT command but rather the UNIQUE command.
3 examples of using SQL DISTINCT
To help you better understand how to use the SQL DISTINCT query, here are 3 concrete application examples.
The SELECT DISTINCT query for a single column
This is the basic use of the SQL DISTINCT query.
In our example, you have a table with all your customers, some of whom belong to the same family (same last name).
Here is your table:
ID | First Name | Last Name |
---|---|---|
1 | Pierre | Dupont |
2 | Jeanne | Martin |
3 | Henri | Durand |
4 | Bertrand | Petit |
5 | Martine | Dupont |
If you want to display only unique last names (families), here is the query to use: SELECT DISTINCT last_name FROM client
.
And here is the result:
Name |
---|
Dupont |
Martin |
Durand |
Petit |
Here, the name Smith is displayed only once (as opposed to twice in the first table).
The SELECT DISTINCT query for multiple columns
As mentioned above, the DISTINCT command can also be used for multiple columns.
Let’s take an example of a similar Customer table:
ID | First Name | Last Name |
---|---|---|
1 | Martine | Dupont |
2 | Jeanne | Martin |
3 | Henry | Durand |
4 | Bertrand | Petit |
5 | Martine | Dupont |
6 | Antoine | Martin |
7 | Fabienne | Bernard |
8 | Christelle | Robert |
9 | Henri | Durand |
If you want to display only unique families, here’s the query to use: SELECT DISTINCT first name last name FROM customer.
And here’s the result:
Prénom | Nom |
---|---|
Martine | Dupon |
Jeanne | Martin |
Henri | Durand |
Bertrand | Petit |
Antoine | Martin |
Fabienne | Bernard |
Christelle | Robert |
In this case, the Martine Dupont and Henry Durand lines have been eliminated.
The number of unique values
If you want to know how many unique values your table contains, simply perform the following operation:
SELECT COUNT(DISTINCT name) AS Number FROM Customer
Most often, the SQL query DISTINCT is used with the COUNT command to count values. But you can add other aggregations to suit your needs. This will enable you to perform various mathematical or statistical operations, without taking duplicates into account.
Alternatives to SQL DISTINCT
Other commands can be used to eliminate duplicates. In particular, GROUP BY / ORDER BY.
In this context, the SQL query is as follows:
SELECT lastname, firstname FROM `Customer` GROUP BY lastname, firstname ORDER BY lastname, firstname
The result is exactly the same as with the query: SELECT DISTINCT firstname lastname FROM customer.
Master SQL with DataScientest
The SQL programming language is packed with essential queries for manipulating data. If you’d like to deepen your knowledge in this area, join our training courses at Datascientest. These business-oriented courses will have you 100% operational by the end of the course.