SQL DISTINCT to eliminate duplicates

The SQL DISTINCT query can be used to limit duplicates. When you use the SELECT statement command, you can select one or more table columns. But within these columns, there may be duplicate rows. To avoid this, you need to add DISTINCT to the SELECT query. By doing so, all redundancies are eliminated.

🚀 Think you’ve got what it takes for a career in Data? Find out in just one minute!

Back to articles

SQL DISTINCT: How do I use this query?

27 Nov 2023

m de lecture

Data Science

Melanie

While the SELECT query is one of the most commonly used to read data from a table, it has some limitations if used on its own. That's why other commands are associated with it. These combinations allow you to refine the results. One of these commands is SQL DISTINCT, which removes duplicates within a MySQL database.

What is SQL DISTINCT?

The SQL DISTINCT query is used to eliminate duplicates. When you use the SELECT SQL query, you can choose one or more columns from a table. However, within these columns, there may be duplicate rows. To avoid this, you can add DISTINCT to the SELECT query. By doing so, all redundancies are removed.

In practice, this command looks like this:

SELECT DISTINCT column_name

FROM table_name

You can use this command on a single column or on a set of related columns.

💡Good to know: If you are using an Oracle database management system, you should not use the SQL DISTINCT command but rather the UNIQUE command.

3 examples of using SQL DISTINCT

To help you better understand how to use the SQL DISTINCT query, here are 3 concrete application examples.

The SELECT DISTINCT query for a single column

This is the basic use of the SQL DISTINCT query.

In our example, you have a table with all your customers, some of whom belong to the same family (same last name).

Here is your table:

ID	First Name	Last Name
1	Pierre	Dupont
2	Jeanne	Martin
3	Henri	Durand
4	Bertrand	Petit
5	Martine	Dupont

If you want to display only unique last names (families), here is the query to use: SELECT DISTINCT last_name FROM client.

And here is the result:

Name
Dupont
Martin
Durand
Petit

Here, the name Smith is displayed only once (as opposed to twice in the first table).

The SELECT DISTINCT query for multiple columns

As mentioned above, the DISTINCT command can also be used for multiple columns.

Let’s take an example of a similar Customer table:

ID	First Name	Last Name
1	Martine	Dupont
2	Jeanne	Martin
3	Henry	Durand
4	Bertrand	Petit
5	Martine	Dupont
6	Antoine	Martin
7	Fabienne	Bernard
8	Christelle	Robert
9	Henri	Durand

If you want to display only unique families, here’s the query to use: SELECT DISTINCT first name last name FROM customer.

And here’s the result:

Prénom	Nom
Martine	Dupon
Jeanne	Martin
Henri	Durand
Bertrand	Petit
Antoine	Martin
Fabienne	Bernard
Christelle	Robert

In this case, the Martine Dupont and Henry Durand lines have been eliminated.

The number of unique values

If you want to know how many unique values your table contains, simply perform the following operation:

SELECT COUNT(DISTINCT name) AS Number FROM Customer

Most often, the SQL query DISTINCT is used with the COUNT command to count values. But you can add other aggregations to suit your needs. This will enable you to perform various mathematical or statistical operations, without taking duplicates into account.

Alternatives to SQL DISTINCT

Other commands can be used to eliminate duplicates. In particular, GROUP BY / ORDER BY.

In this context, the SQL query is as follows:

SELECT lastname, firstname FROM `Customer` GROUP BY lastname, firstname ORDER BY lastname, firstname

The result is exactly the same as with the query: SELECT DISTINCT firstname lastname FROM customer.

Master SQL with DataScientest

The SQL programming language is packed with essential queries for manipulating data. If you’d like to deepen your knowledge in this area, join our training courses at Datascientest. These business-oriented courses will have you 100% operational by the end of the course.

DataScientest News

You are not available?

Leave us your e-mail, so that we can send you your new articles when they are published!

Data Analyst

Analytics Engineer

Data Scientist

AI / Machine Learning Engineer

Data Engineer

Cloud Engineer

DevOps Engineer

MLOps

ETL Developer

Data Ops Engineer

Student Course

Amazon Web Services (AWS)

Microsoft Power BI

SQL DISTINCT: How do I use this query?

What is SQL DISTINCT?

3 examples of using SQL DISTINCT

The SELECT DISTINCT query for a single column

The SELECT DISTINCT query for multiple columns

The number of unique values

Alternatives to SQL DISTINCT

Master SQL with DataScientest

You are not available?

Related articles

HRIS Consultant: Skills, Responsibilities, Training

Fetch.ai: Operation, use cases, and future of the project

Self-Organizing Maps (SOM): What are they and how to use them?

Digital Twin: Definition, Examples, and Use Cases

Data Analyst

Analytics Engineer

Data Scientist

AI / Machine Learning Engineer

Data Engineer

Cloud Engineer

DevOps Engineer

MLOps

ETL Developer

Data Ops Engineer

Student Course

Amazon Web Services (AWS)

Microsoft Power BI

SQL DISTINCT: How do I use this query?

What is SQL DISTINCT?

3 examples of using SQL DISTINCT

The SELECT DISTINCT query for a single column

The SELECT DISTINCT query for multiple columns

The number of unique values

Alternatives to SQL DISTINCT

Master SQL with DataScientest

You are not available?

Related articles

HRIS Consultant: Skills, Responsibilities, Training

Fetch.ai: Operation, use cases, and future of the project

Self-Organizing Maps (SOM): What are they and how to use them?

Digital Twin: Definition, Examples, and Use Cases

DataNews