Linear regression with Python: How does it work?

13 Mar 2024

Data Science

Melanie

A key algorithm in Machine Learning, linear regression is used to model the relationship between a target variable and one or more explanatory variables. To put this algorithm into practice with ease, data scientists can turn to programming languages, particularly Python. So how do you use linear regression with Python? DataScientest answers the question.

What is linear regression?

Before looking at the practical use of linear regression with Python, we need to go back to basics.

Linear regression - Definition

The linear regression model is a supervised learning algorithm used to predict a continuous target variable (the dependent variable) from one or more explanatory variables (also called independent variables or predictors). In other words, it establishes a relationship between two or more variables.

When there is only one explanatory variable, we speak of simple linear regression. On the other hand, if there are several, we speak of multiple linear regression.

Whether simple or multiple, linear regression can be used with Python.

Linear regression - Mathematical formulation

The mathematical equation for linear regression is as follows:

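Y = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ

In this equation:

Y corresponds to the predicted value (the dependent variable);
θ₀ corresponds to the bias term (the intercept), and θ₁, θ₂, …, θₙ are the weights of the parameter vector;
x₁, x₂, …, xₙ correspond to the feature values (the explanatory variables).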
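
As a minimal sketch of this equation, the prediction can be computed as the bias term plus the dot product of the weights and the feature values. The function name predict and all the numbers below are illustrative assumptions, not values from the article.

```python
import numpy as np

def predict(theta0, theta, x):
    """Return theta0 + theta[0]*x[0] + ... + theta[n-1]*x[n-1]."""
    return theta0 + np.dot(theta, x)

# Hypothetical parameters and feature values, chosen only for illustration
theta0 = 1.0
theta = np.array([2.0, 3.0])
x = np.array([4.0, 5.0])

y_hat = predict(theta0, theta, x)  # 1 + 2*4 + 3*5 = 24
print(y_hat)
```

With a single feature, the same function reduces to the equation of a straight line, which is the simple linear regression case.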

From a visual point of view, the training data forms a cloud of points (a scatter plot). The aim is to identify the straight line that most closely fits this set of points.

To make this line as accurate as possible, we choose the parameters that minimize the mean squared error, that is, the average of the squared differences between the observed values and the values predicted by the line.
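
To make the criterion concrete, here is a short sketch that computes the mean squared error of two candidate lines on the same points. The data points and both candidate slopes are made up for illustration; the helper name mean_squared_error is also an assumption (scikit-learn ships a function of the same name, but this version only needs NumPy).

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical points lying roughly on the line y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 9.0])

mse_good = mean_squared_error(y, 2.0 * x)  # candidate line y = 2x
mse_bad = mean_squared_error(y, 0.5 * x)   # candidate line y = 0.5x
print(mse_good, mse_bad)
```

The line with the smaller error is the better fit; linear regression looks for the parameters that make this error as small as possible.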

Use cases for linear regression

If linear regression is often the first algorithm people learn in machine learning, it’s because it can be applied in so many different ways.

For example:

Identify factors influencing the profitability of an investment;
Predict future sales by analyzing past sales;
Anticipate consumer behavior;
Predict the price of a house based on its characteristics;
etc.

And for every linear regression application, you can use Python.

How to use linear regression with Python?

To explain linear regression with Python, let’s take a concrete example. Here’s the starting hypothesis:

A restaurateur who already owns restaurants in several cities wants to expand his business by opening in new locations.
To choose the next cities, the restaurateur has two sets of data at his disposal: the profits made in the cities where he is already established, and the populations of those cities.

Since the aim is to make as much profit as possible in the next city, he needs to predict the profit made there (dependent variable = Y) as a function of the city’s population (independent variable = X).

So how do you build the linear regression model with Python? Here’s how.

Formatting data

To model linear regression with Python, you need to prepare the training data in the right format. Ideally, you should prepare a CSV file with two columns: one for the population (independent variable) and another for the profits (dependent variable). Here’s what such a file might look like:

Population	Profits
811 000	175 000 €
757 000	91 300 €
551 000	21 000 €
372 000	- 6 000 €
…	…
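
The article does not say how these values are stored in the file. If they are written exactly as displayed, with spaces as thousands separators and a € sign, they need to be converted to numbers before any regression can be fitted. The sketch below shows one possible way, assuming that formatting; the helper name to_number is hypothetical, and only the four rows visible in the table are used.

```python
import pandas as pd

def to_number(text):
    """Convert a string such as '175 000 €' or '- 6 000 €' to a float."""
    cleaned = text.replace("€", "").replace("\u00a0", "").replace(" ", "")
    return float(cleaned)

# The four rows shown in the table, as they might appear once read as text
raw = pd.DataFrame({
    "Population": ["811 000", "757 000", "551 000", "372 000"],
    "Profits": ["175 000 €", "91 300 €", "21 000 €", "- 6 000 €"],
})

# Apply the conversion to every cell, column by column
clean = raw.apply(lambda column: column.map(to_number))
print(clean)
```

Storing plain numbers in the CSV file (811000, 175000, and so on) avoids this step entirely.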

Load data

This training data must then be loaded into Python. Thanks to the Pandas library, you can easily read CSV files. Here’s how to do it.

import pandas as pd
# Raw string (r"...") so the backslashes in the Windows path are not read as escape sequences
df = pd.read_csv(r"D:\DEV\PYTHON_PROGRAMMING\donnees-d-entrainement-regression-lineaire.csv")

The read_csv() function returns a DataFrame, a two-dimensional table containing both the independent and the dependent variable. But to use linear regression with Python, you need to separate the two columns into two Python variables.

For the first column corresponding to population size :

X = df.iloc[:, 0]

For the second column corresponding to profits :

Y = df.iloc[:, 1]

This gives you two one-dimensional series, X and Y, which together contain the entire training data set.

Visualize data

Before fitting a linear regression with Python, it can be useful to visualize the data. This will enable you to see the individual points and judge how widely they are dispersed.

To obtain a scatter plot, you can use Matplotlib, a Python library. Here’s how to get it:

import matplotlib.pyplot as plt
axes = plt.axes()
axes.grid()
plt.scatter(X,Y) 
plt.show()

Apply the algorithm

From there, the aim is to find a predictive function F(X) with population size as input and expected profits as output.

To model linear regression with Python, the easiest way is to use the scikit-learn library, starting with this import:

from sklearn.linear_model import LinearRegression

From there, you can build your model. Here’s the code to write:

# scikit-learn expects a 2D array of features, so the population column is reshaped to (n, 1)
reg = LinearRegression()
reg.fit(X.to_numpy().reshape(-1, 1), Y)

And to find the line f(x) = ax + b with minimum squared error, type:

a = reg.coef_[0]
b = reg.intercept_
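
For a single explanatory variable, the slope and intercept that minimize the squared error have a well-known closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below computes them with NumPy only; the function name fit_line and the sample points are assumptions made for illustration, not part of the article's data set.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares slope a and intercept b for f(x) = a*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - a * x_mean
    return a, b

# Hypothetical points lying exactly on y = 3x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 4.0, 7.0, 10.0])

a, b = fit_line(x, y)
print(a, b)
```

On the same data, the coefficient and intercept returned by a fitted scikit-learn LinearRegression should match these values up to floating-point error.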

Make predictions

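To plot the regression line with Python on top of the scatter plot, simply type the code below:

import numpy as np
# 100 evenly spaced population values spanning the training data
ordonne = np.linspace(X.min(), X.max(), 100)
plt.scatter(X, Y)
plt.plot(ordonne, a*ordonne+b, color='r')
plt.show()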
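
Once the model is fitted, it can also estimate the profit for a city that is not in the training data. The sketch below is self-contained and uses only the four rows visible in the sample table, which is far too little data for a reliable estimate; the population of the candidate city (600,000) is a made-up value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The four rows shown in the sample table (population, profit in euros)
population = np.array([[811_000], [757_000], [551_000], [372_000]])
profit = np.array([175_000, 91_300, 21_000, -6_000])

reg = LinearRegression()
reg.fit(population, profit)

# Hypothetical candidate city; the population figure is an assumption
new_city = np.array([[600_000]])
predicted_profit = reg.predict(new_city)[0]
print(round(predicted_profit))
```

The predict() method applies f(x) = ax + b with the fitted coefficients, so the result is the same as computing reg.coef_[0] * 600_000 + reg.intercept_ by hand.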

Master linear regression with Python

Linear regression is undoubtedly the algorithm to master in data science. And if using it via Python still seems complex, that’s only temporary.

With the right training, you’ll be able to evaluate any machine learning algorithm across different programming languages.

But which course should you choose? DataScientest, of course. Discover our program.
