
Linear regression with Python: How does it work?


A key algorithm in Machine Learning, linear regression is used to model the relationship between a target variable and one or more explanatory variables. Data scientists can put this algorithm into practice easily with programming languages, Python in particular. So how do you use linear regression with Python? DataScientest answers the question.

What is linear regression?

Before looking at the practical use of linear regression with Python, we need to go back to basics.

Linear regression - Definition

The linear regression model is a supervised learning algorithm used to predict a continuous target variable (dependent variable) from one or more explanatory variables (independent or predictor variables). In other words, it models the relationship between two or more variables.

When there is only one explanatory variable, we speak of simple linear regression. On the other hand, if there are several, we speak of multiple linear regression.

Whether simple or multiple, linear regression can be used with Python.
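As a minimal sketch (assuming scikit-learn and NumPy are installed, and using synthetic, noise-free data invented purely for illustration), the same LinearRegression class handles both cases: a feature matrix with one column gives simple regression, and one with several columns gives multiple regression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative only): y = 1 + 2*x1 + 3*x2, with no noise
X_multi = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 5.0], [4.0, 2.0]])
y = 1 + 2 * X_multi[:, 0] + 3 * X_multi[:, 1]

# Simple linear regression: a single explanatory variable (first column only)
simple = LinearRegression().fit(X_multi[:, [0]], y)

# Multiple linear regression: both explanatory variables
multiple = LinearRegression().fit(X_multi, y)

print("simple:", simple.coef_, simple.intercept_)
print("multiple:", multiple.coef_, multiple.intercept_)
```

Because the data contains no noise, the multiple model recovers the coefficients 2 and 3 and the intercept 1, while the simple model can only approximate y with x1 alone.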





Linear regression - mathematical formulation

The mathematical equation for linear regression is as follows:

$$Y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$

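In this equation:

  • Y corresponds to the predicted value (the target, or dependent variable);
  • θ0 is the bias term (intercept) and θ1, …, θn are the coefficients; together they form the parameter vector θ;
  • x1, x2, …, xn correspond to the feature values (the explanatory variables).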
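The equation can be written as a dot product by prepending a constant 1 to the feature values, so that θ0 plays the role of the bias. A minimal NumPy sketch; the parameter and feature values below are arbitrary numbers chosen for illustration, not fitted values:

```python
import numpy as np

def predict(theta, x):
    """Return Y = theta_0 + theta_1*x_1 + ... + theta_n*x_n."""
    x_with_bias = np.concatenate(([1.0], x))  # x_0 = 1 multiplies the bias theta_0
    return float(np.dot(theta, x_with_bias))

theta = np.array([0.5, 2.0, -1.0])  # theta_0 (bias), theta_1, theta_2 (illustrative)
x = np.array([3.0, 4.0])            # x_1, x_2 (illustrative)
print(predict(theta, x))  # 0.5 + 2*3 - 1*4 = 2.5
```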

From a visual point of view, linear regression applies when the training data forms a point cloud with a roughly linear trend. In this case, the aim is to find the straight line that best approximates the set of points.

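To make this line as accurate as possible, we choose the parameters that minimize the mean squared error (MSE), i.e. the average of the squared differences between the values predicted by the line and the observed values.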
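For a candidate line f(x) = ax + b and n training points (x_i, y_i), the MSE is (1/n) Σ (y_i − (a·x_i + b))². With a single explanatory variable, the line that minimizes it has a closed form (ordinary least squares): a = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)² and b = ȳ − a·x̄. A short NumPy sketch on made-up points (illustrative only):

```python
import numpy as np

def mse(x, y, a, b):
    """Mean squared error of the line f(x) = a*x + b on the points (x, y)."""
    residuals = y - (a * x + b)
    return float(np.mean(residuals ** 2))

def least_squares_line(x, y):
    """Slope and intercept that minimize the MSE (ordinary least squares)."""
    x_mean, y_mean = x.mean(), y.mean()
    a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - a * x_mean
    return a, b

# Made-up points lying roughly along a line
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

a, b = least_squares_line(x, y)
print(a, b, mse(x, y, a, b))

# Any other line has a larger MSE, e.g. one with a slightly steeper slope
print(mse(x, y, a + 0.1, b))
```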

Use cases for linear regression

Linear regression is often the first algorithm taught in machine learning, partly because it can be applied in so many different ways.

For example:

  • Identify factors influencing the profitability of an investment;
  • Predict future sales by analyzing past sales;
  • Anticipate consumer behavior;
  • Predict the price of a house based on its characteristics;
  • etc.

And for every linear regression application, you can use Python.

How to use linear regression with Python?

To explain linear regression with Python, let's take a concrete example. Here's the starting scenario:

  • A restaurateur who already owns restaurants in several cities wants to expand his business by setting up in new locations.
  • To choose the next cities in which to set up, the restaurateur has two sets of data at his disposal: the profits made in the cities where he is already established, and the populations of those cities.

Since the aim is to make as much profit as possible in his next location, he needs to predict the profit of a candidate city (dependent variable = Y) as a function of its population (independent variable = X).

So how do you build a linear regression model with Python? Here's how.

Formatting data

To model linear regression with Python, you need to prepare the training data in the right format. Ideally, you should prepare a CSV file with two columns: one for the population (independent variable) and another for the profits (dependent variable). Here's what such a file might look like:

Population   Profits (€)
811 000      175 000
757 000       91 300
551 000       21 000
372 000       -6 000
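To follow along without real data, the four sample rows can be written to a CSV file with Pandas. This sketch rests on two assumptions: the file name profits.csv is hypothetical, and the numbers are stored without thousands separators or the € symbol so that Pandas parses them as numbers.

```python
import pandas as pd

# The four sample rows (profits in euros)
sample = pd.DataFrame({
    "Population": [811000, 757000, 551000, 372000],
    "Profits": [175000, 91300, 21000, -6000],
})

# "profits.csv" is a hypothetical file name; index=False keeps only the two columns
sample.to_csv("profits.csv", index=False)

with open("profits.csv") as f:
    print(f.read())
```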

Load data

This training data must then be loaded into Python. Thanks to the Pandas library, you can easily read CSV files. Here’s how to do it.

import pandas as pd
df = pd.read_csv("profits.csv")  # hypothetical file name: replace with the path to your CSV file

The read_csv() function returns a DataFrame, a two-dimensional table containing both the independent and the dependent variables. But to use linear regression with Python, you need to separate the two columns into two Python variables.

For the first column, corresponding to population size:

X = df.iloc[:, [0]]  # the list [0] keeps X two-dimensional, as scikit-learn expects for features

For the second column, corresponding to profits:

Y = df.iloc[:, 1]

You now have X (the populations) and Y (the profits) for the entire training data set.
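A self-contained sketch of the same split, building the DataFrame in memory from the sample rows instead of reading a file. Selecting the first column with a list ([0]) keeps X two-dimensional, the shape scikit-learn expects for features, while Y stays a one-dimensional Series:

```python
import pandas as pd

df = pd.DataFrame({
    "Population": [811000, 757000, 551000, 372000],
    "Profits": [175000, 91300, 21000, -6000],
})

X = df.iloc[:, [0]]  # DataFrame with one column: shape (4, 1)
Y = df.iloc[:, 1]    # Series: shape (4,)

print(X.shape, Y.shape)  # (4, 1) (4,)
```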

Visualize data

Before fitting a model, it can be useful to visualize the data. A scatter plot shows each point and how the points are dispersed, which helps you check whether a straight line is a reasonable fit.

To obtain a scatter plot, you can use Matplotlib, a Python library. Here's how:

import matplotlib.pyplot as plt
axes = plt.axes()
axes.scatter(df.iloc[:, 0], Y)  # one point per city: population vs. profit
plt.show()
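A fuller version of the scatter plot, run on the sample rows with axis labels and a title. The labels, the title and the choice to save to a hypothetical scatter.png file instead of opening a window are assumptions made for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Population": [811000, 757000, 551000, 372000],
    "Profits": [175000, 91300, 21000, -6000],
})

axes = plt.axes()
axes.scatter(df["Population"], df["Profits"])  # one point per city
axes.set_xlabel("Population")
axes.set_ylabel("Profits (€)")
axes.set_title("Profits as a function of city population")
plt.savefig("scatter.png")  # hypothetical output file name
```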

Apply the algorithm

From there, the aim is to find a predictive function F(X) with population size as input and expected profits as output.

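To model linear regression with Python, the easiest way is to use the Scikit-learn library and import its LinearRegression class:

from sklearn.linear_model import LinearRegression

From there, you can build your model and fit it to the training data. Here's the code to write:

reg = LinearRegression()  # the former normalize=True option was removed in scikit-learn 1.2
reg.fit(X.values.reshape(-1, 1), Y)  # reshape guarantees the 2-D (n_samples, n_features) input scikit-learn expects

Once the model is fitted, the line f(x) = ax + b with minimum squared error is given by:

a = reg.coef_[0]    # slope (coef_ holds one coefficient per explanatory variable)
b = reg.intercept_  # intercept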
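Putting the steps together on the four sample rows gives a minimal end-to-end sketch (assuming a recent scikit-learn, where LinearRegression() is called without the removed normalize argument). As a sanity check, the slope and intercept should match NumPy's own least-squares line fit, np.polyfit with degree 1:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "Population": [811000, 757000, 551000, 372000],
    "Profits": [175000, 91300, 21000, -6000],
})
X = df.iloc[:, [0]]  # 2-D feature matrix
Y = df.iloc[:, 1]    # 1-D target

reg = LinearRegression()
reg.fit(X, Y)  # ordinary least squares

a = reg.coef_[0]    # slope: extra profit (€) per additional inhabitant
b = reg.intercept_  # intercept: the line's value at a population of 0
print(f"f(x) = {a:.4f} * x + {b:.1f}")

# Cross-check with NumPy's degree-1 polynomial fit (returns slope, intercept)
a_np, b_np = np.polyfit(df["Population"], df["Profits"], 1)
print(np.isclose(a, a_np), np.isclose(b, b_np))
```

With only four cities the estimate is fragile; the sketch shows the mechanics, not a reliable business forecast.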

Make predictions

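To plot the fitted regression line, type the code below (then call plt.show() to display the figure):

ordonne = np.linspace(X.values.min(), X.values.max(), 100)  # x values spanning the data (requires import numpy as np)
plt.plot(ordonne, a * ordonne + b, color="r")  # fitted line f(x) = ax + b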
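Once the model is fitted, reg.predict returns the expected profit for populations it has never seen, and the same predictions computed over a range of x values draw the regression line. A minimal sketch on the sample rows; the candidate city of 600 000 inhabitants and the output file regression.png are hypothetical values chosen for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "Population": [811000, 757000, 551000, 372000],
    "Profits": [175000, 91300, 21000, -6000],
})
X = df[["Population"]]
Y = df["Profits"]
reg = LinearRegression().fit(X, Y)

# Predict the profit of a hypothetical candidate city of 600 000 inhabitants
candidate = pd.DataFrame({"Population": [600000]})
print(reg.predict(candidate))

# x values spanning the observed populations, and the fitted line over them
ordonne = np.linspace(X["Population"].min(), X["Population"].max(), 100)
line = reg.predict(pd.DataFrame({"Population": ordonne}))

axes = plt.axes()
axes.scatter(X["Population"], Y)    # training data
axes.plot(ordonne, line, color="r")  # regression line f(x) = ax + b
axes.set_xlabel("Population")
axes.set_ylabel("Profits (€)")
plt.savefig("regression.png")  # hypothetical output file name
```

Passing DataFrames with the same column name to predict keeps the feature names consistent with those seen during fit.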

Master linear regression with Python

Linear regression is one of the essential algorithms to master in data science. And if using it via Python still seems complex, that's only temporary.

With the right training, you’ll be able to evaluate any machine learning algorithm across different programming languages.

But which course should you choose? DataScientest, of course. Discover our program.
