Time series plot Python: The Tutorial

20 Jan 2024

Data Science

Melanie

A time series is a sequence of data points showing how a variable evolves over time. In Python, it is often handled as a pandas Series indexed by a DatetimeIndex. This format makes processing and visualization easy.

Time series are used in many fields, such as astronomy and meteorology, but are probably most widely used in economics. Think, for example, of company share prices or temperature trends over time.

The ARMA process

AR

A first way of modeling a time series is the AR, or autoregressive, model. It predicts the value of the series at time t from a weighted sum of the values at the previous p instants:

$$X_t = \sum_{i=1}^{p} \alpha_i X_{t-i} + \varepsilon_t$$

where:

$X_t$ is the value at time t
$\varepsilon_t$ is the error at time t
$\alpha_i$ is the coefficient associated with $X_{t-i}$
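
The autoregressive recursion described above (each value is a weighted sum of the previous p values plus an error term) can be sketched in a few lines of NumPy. This is a minimal illustration rather than code from the original article: the function name `simulate_ar`, the coefficients, and the choice of standard normal errors are all assumptions made for the example.

```python
import numpy as np

def simulate_ar(alphas, n, seed=0):
    """Simulate X_t = sum_i alphas[i-1] * X_{t-i} + eps_t (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    p = len(alphas)
    eps = rng.standard_normal(n)  # assumed Gaussian errors
    x = np.zeros(n)
    for t in range(n):
        # At the start of the series, only the past values that exist are used.
        for i in range(1, min(p, t) + 1):
            x[t] += alphas[i - 1] * x[t - i]
        x[t] += eps[t]
    return x, eps

# AR(2) example with arbitrary coefficients
x, eps = simulate_ar([0.6, -0.2], n=200)
print(x[:5])
```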

MA

The MA, or moving average, process predicts the value at time t from the errors of the last q instants:

$$X_t = \varepsilon_t + \sum_{i=1}^{q} \beta_i \varepsilon_{t-i}$$

where:

$X_t$ is the value at time t
$\varepsilon_t$ is the error at time t
$\beta_i$ is the coefficient associated with $\varepsilon_{t-i}$
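
Since a moving average process is a weighted sum of the current and past errors, it can be written as a convolution of the error sequence with the weights. The sketch below assumes standard normal errors; the helper name `simulate_ma` and the coefficient values are hypothetical.

```python
import numpy as np

def simulate_ma(betas, n, seed=0):
    """Simulate X_t = eps_t + sum_i betas[i-1] * eps_{t-i} (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)  # assumed Gaussian errors
    weights = np.concatenate(([1.0], betas))  # weight 1 for the current error
    # Full convolution has n + q values; the first n are the series itself.
    x = np.convolve(eps, weights)[:n]
    return x, eps

# MA(2) example with arbitrary coefficients
x, eps = simulate_ma([0.5, 0.3], n=200)
print(x[:5])
```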

ARMA

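The ARMA process combines an AR process and an MA process, and is written ARMA(p, q). Its formula is:

$$X_t = \sum_{i=1}^{p} \alpha_i X_{t-i} + \sum_{i=1}^{q} \beta_i \varepsilon_{t-i} + \varepsilon_t$$

with the same notation as above.

Although this formula can look intimidating, it is simple to read: the value at time t is a weighted sum of the previous values, plus a weighted sum of the previous errors, plus the current error.

In concrete terms, an ARMA(1, 1) process corresponds to:

$$X_t = \alpha_1 X_{t-1} + \beta_1 \varepsilon_{t-1} + \varepsilon_t$$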
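
To get a feel for what an ARMA(1, 1) series looks like, one option is the `ArmaProcess` helper from statsmodels. The coefficients 0.5 and 0.4 below are arbitrary values chosen for this sketch. Note that statsmodels expects lag polynomials: each list starts with 1 for lag zero, and the AR coefficients are entered with their sign flipped.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(0)  # the sample is drawn from NumPy's global random state

# X_t = 0.5 * X_{t-1} + 0.4 * eps_{t-1} + eps_t
process = ArmaProcess(ar=[1, -0.5], ma=[1, 0.4])
sample = process.generate_sample(nsample=200)

print("Stationary:", process.isstationary)
print(sample[:5])
```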

Limitations of the model

Although this model is simple and often fits well, it has a few limitations. First, it only gives good results on so-called stationary time series, i.e. series whose mean and variance are constant over time.

Second, it is difficult to forecast beyond t + 1: the errors at future instants are not yet known, so the MA part no longer has any feedback on the error of our model.
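
The second limitation can be made concrete with a small hand-written forecast for an ARMA(1, 1) process. In this sketch, unknown future errors are replaced by their expected value of zero, so the MA term only contributes to the first step and the forecast then decays through the AR term alone. The function name and the numeric values are hypothetical.

```python
def forecast_arma11(alpha, beta, last_x, last_eps, horizon):
    """Forecast an ARMA(1, 1) process several steps ahead (hypothetical helper)."""
    forecasts = []
    prev_x, prev_eps = last_x, last_eps
    for _ in range(horizon):
        next_x = alpha * prev_x + beta * prev_eps
        forecasts.append(next_x)
        # Beyond the first step the error is unknown: use its expected value, 0.
        prev_x, prev_eps = next_x, 0.0
    return forecasts

forecasts = forecast_arma11(alpha=0.5, beta=0.4, last_x=2.0, last_eps=1.0, horizon=4)
print(forecasts)
```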

Python-based study

First analysis

To begin our analysis of the time series, we import the pandas library and Matplotlib, which will be used for visualization, and read the CSV file in which the time series is stored.

In this call, in addition to the name of the file to be read, the function arguments are:

parse_dates: tells pandas that the DataFrame contains a date column, and that it is the first column
index_col: indicates that the index is also the first column
squeeze: returns a Series instead of a DataFrame (this argument was deprecated in pandas 1.4 and removed in pandas 2.0; in recent versions, call .squeeze("columns") on the result instead)
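
A minimal sketch of this loading step is shown here. The article's CSV file is not available, so the example reads a few made-up rows from an in-memory string; the column names `date` and `value` are assumptions. It uses `DataFrame.squeeze("columns")` so that it also runs on pandas 2.x.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the article's file.
csv_text = """date,value
2023-01-01,10.2
2023-01-02,10.8
2023-01-03,11.1
2023-01-04,10.9
"""

# parse_dates=[0]: parse the first column as dates; index_col=0: use it as the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=[0], index_col=0)

# Turn the single-column DataFrame into a Series.
series = df.squeeze("columns")
print(series)
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the file path.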

To begin our analysis, we can visualize our time series very simply with the matplotlib.pyplot library, using the following function:

Thanks to this function, we obtain a graph showing the evolution of our variable over time.
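
As an illustration, the plotting step might look like the following. The series here is synthetic (a random walk over 100 days) because the article's data is not available, and the axis labels and title are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily series used only for this example.
index = pd.date_range("2023-01-01", periods=100, freq="D")
values = np.random.default_rng(0).standard_normal(100).cumsum()
series = pd.Series(values, index=index, name="value")

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(series.index, series.values)
ax.set_xlabel("Date")
ax.set_ylabel(series.name)
ax.set_title("Time series")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

In a script, add `plt.show()` at the end to open the figure window; in a notebook the figure is displayed automatically.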

Stationarity test

A stationarity test is then performed. The Dickey-Fuller test gives good results quickly and is already implemented in Python in the Statsmodels library.

By importing the function and using it in the following way, we can determine whether our time series can be modeled by an ARMA process:

Once the series has been found to be stationary, we can fit the model. The terms p and q correspond respectively to the first and last values of the model's order argument. The fit method added at the end estimates the model's parameters from the data.

The summary method displays a table that lets us check the quality of the fit:

This table may look impressive, but it is simple to interpret. The column circled in purple (coef in the statsmodels summary) contains the model parameters, and the blue column (P>|z|) gives the p-value of each parameter. Here all the parameters are statistically significant, since every p-value is below 0.05. If that is not the case, you can change p and q to remove unnecessary parameters.

To access and visualize the results, we use:

Our model is shown in red and the actual values in blue:

Conclusion

In conclusion, we have succeeded in modeling our time series, but our model does have some limitations, as discussed above. To deal with non-stationary time series, we can use the ARIMA model, which adds differencing. If our series shows seasonality, i.e. variations at regular time intervals, we will need to use the SARIMA model instead. These models and more are covered in our Data Scientist curriculum.
