We have the answers to your questions! - Don't miss our next open house about the data universe!

SQL joins: everything you need to know about table associations

- Reading Time: 4 minutes
jointureSQL

If you're a regular reader of our blog, you've already heard of SQL or "Structured Query Language": a programming language for managing databases, and you know all about the importance of SQL joins.

What is SQL Joins ?

If you missed the article on the basics of SQL language. We’ll give it to you again here.

At first glance, SQL joins may seem complicated or difficult to apply. DataScientest has put together a quick summary of the main SQL joins, with a simple example to make your life easier! But first of all, what is a join?  

Joins are used to retrieve data from a database where tables have relationships with each other. They enable you to exploit the tables in a database, link them and obtain efficient results.

There are several types, and we’re going to look at the main ones. 

 Throughout this article, we’ll be relying on two tables: 

  • the city table shows the cities where the customer corresponding to the customer_id lives, 
  • the email table gives customer email addresses.
 
City
Data1

Email

Data2

1. INNER JOIN

The internal join or INNER JOIN returns data when the condition is true in both tables.  

As shown in the diagram, this type of join will concatenate the tuples of 2 tables two by two if a condition is met. This condition can be of any type, as long as it returns a Boolean. Typically, this condition will be the equality of a common attribute. It is necessary to specify the attributes to be used to perform the join. The ON keyword is used, followed by the desired equality after the tables have been specified in the INNER JOIN. 

Inner join

This is the query that wastes the most information since it only selects rows in both tables where there is information in both tables.

Inner join1

Here, by performing an INNER JOIN between the two tables, we obtain only the customer IDs present in the email table and in the city table. Customer IDs 4 and 5, which are present in the email table but not in the city table, are not taken into account. The same applies to customer IDs 7 and 8, which are in the city table but not in the email table.

2. LEFT JOIN

The LEFT JOIN is a join between 2 tables that returns all records from the left-hand table, even if there is no match with the right-hand table. If there is no match, missing values are set to NULL. As with the INNER JOIN, you need to specify the attributes to be used to perform the join: the ON keyword is used, followed by the desired equality after the tables in the LEFT JOIN have been filled in.

left join sql

Here we’re doing a LEFT JOIN query against the email table, so we’re displaying all rows when the customer_id is present in the left-hand table, i.e. in the email table.  

 Thus the lines corresponding to customer_id 7 and 8 are not present in the query output.

left join2

3. RIGHT JOIN

The RIGHT JOIN is a join between 2 tables that returns all records from the right-hand table, even if there is no match with the left-hand table. If there is no match, missing values are set to NULL. 

As with INNER JOIN and LEFT JOIN, you need to specify the attributes to be used to perform the join: use the ON keyword followed by the desired equality after entering the tables in the RIGHT JOIN.

right join

Here we’re doing a LEFT JOIN query against the email table, so we’re displaying all rows when the customer_id is present in the left-hand table, i.e. in the email table.  

 Thus the lines corresponding to customer_id 7 and 8 are not present in the query output.

right join1

4. FULL JOIN

The FULL JOIN is a join between 2 tables that returns all records from both tables, even if there is no match with the other. If there is no match, missing values are set to NULL. This join also requires the ON attribute. 

Full join

Here, it doesn’t matter if there are no corresponding rows in one of the two tables, all the information from both tables is present. This is the join that keeps as much information as possible.

FUll join1

5. NATURAL JOIN

The NATURAL JOIN is a join between 2 tables that returns records from both tables in a “natural” way.  There must be at least one column with the same name in both tables. It performs a natural join by returning rows with pairs that match both tables.  

With NATURAL JOIN, the ON is not specified.

Natural join

The output shows that a join has been made between the two tables with “id_client”, as this is the only variable present in both tables. 

But what’s the difference between NATURAL JOIN and INNER JOIN?  

The INNER JOIN avoids the repetition of equal rows, which is not possible with the NATURAL JOIN. The INNER JOIN returns a table based on the data specified in the ON, whereas the NATURAL JOIN returns a table based on a column with the same name and type in both tables.  

 The four types of SQL joins to remember are : 

  • INNER JOIN to retrieve information between two tables when the ON condition is met on both tables, 
  • LEFT / RIGHT JOIN to retrieve information between two tables, keeping all information from one of the two tables, 
  • FULL JOIN retrieves all information from both tables without loss of information (but may display many NULLs), 
  • NATURAL JOIN performs a natural join between the two tables, so there’s no need to specify the ON.  

Joins are an efficient way of associating several tables. There are several types, and we’ve already seen the main ones. 

You are not available?

Leave us your e-mail, so that we can send you your new articles when they are published!
icon newsletter

DataNews

Get monthly insider insights from experts directly in your mailbox