
Survival Analysis: Beyond Machine Learning


When starting a Data Science project, it is crucial to think carefully about how to model the problem at hand.

Suppose we want to increase sales on an e-commerce website. We could improve the conversion rate with a classification model, estimate how long users stay on the site based on their profiles, model visitor journeys, attribute visitor arrivals to the various marketing channels, or improve the website's SEO.

Several mathematical approaches can address these tasks:

  • Machine Learning
  • Survival analysis
  • Markov chains
  • Shapley value calculations
  • PageRank score estimation
  • and more.

Machine Learning is therefore not the be-all and end-all of the Data Scientist profession. It is essential to explore other mathematical models based on probability theory (Markov chains, survival analysis), game theory (Shapley values), or graph theory (PageRank).

With this in mind, we are developing a course on a timely and relevant topic: Survival Analysis.

[Figure: a survival function slowly decreasing over time]

What is survival analysis?

Survival analysis is a branch of statistics that studies the lifespan of individuals within a population. Its aim is to estimate when an event of interest, historically death, will occur.

However, its scope of application is much broader:

  1. Predictive maintenance: Estimating when a machine will fail so that we can intervene in time.
  2. Churn analysis: Predicting when a customer will unsubscribe from a service.
  3. Credit analysis: Anticipating when a customer may default on payments.
  4. Epidemiology: Forecasting when a patient will recover (in this case, it’s the virus/bacteria that “dies”).

These models have been used in medicine since the 1950s, and researchers are now developing algorithms that combine them with Machine Learning techniques.

Let's take a closer look at the survival function.

In survival analysis, the goal is to estimate the distribution of a random variable X, which represents the time of an event, such as death. This leads to the introduction of the survival function:

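$$S(t) = P(X > t)$$

and the instantaneous hazard rate, i.e. the rate at which the event occurs at time t among individuals that have survived up to t:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t < X \le t + \Delta t \mid X > t)}{\Delta t} = \frac{f(t)}{S(t)}$$

where f is the probability density of X.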
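To make the survival function S(t) = P(X > t) and the hazard rate concrete, here is a small numerical check in Python (our choice of language; the article itself contains no code). The hazard rate is the limit of P(t < X ≤ t + Δt | X > t) / Δt as Δt → 0, which also equals f(t)/S(t), with f the density of X. Purely as an illustrative assumption, X follows a Weibull distribution with shape k = 1.5 and scale λ = 10, for which S(t) = exp(-(t/λ)^k) and h(t) = (k/λ)(t/λ)^(k-1) in closed form. The sketch checks that f(t)/S(t) and a finite-difference version of the limit both match the closed-form hazard.

```python
import math

# Hypothetical Weibull lifetime model chosen for illustration only.
K = 1.5      # shape parameter: K > 1 means the hazard grows with time (wear-out)
LAM = 10.0   # scale parameter, in the same time unit as t


def survival(t: float) -> float:
    """S(t) = P(X > t) for a Weibull(K, LAM) lifetime."""
    return math.exp(-((t / LAM) ** K))


def density(t: float) -> float:
    """Probability density f(t) = -dS/dt."""
    return (K / LAM) * (t / LAM) ** (K - 1) * survival(t)


def hazard(t: float) -> float:
    """Closed-form Weibull hazard h(t) = (K/LAM) * (t/LAM)^(K-1)."""
    return (K / LAM) * (t / LAM) ** (K - 1)


def hazard_from_definition(t: float, dt: float = 1e-6) -> float:
    """Approximate P(t < X <= t + dt | X > t) / dt for a small dt."""
    prob_event_in_window = survival(t) - survival(t + dt)
    return prob_event_in_window / (dt * survival(t))


if __name__ == "__main__":
    for t in (1.0, 5.0, 10.0, 20.0):
        print(f"t={t:5.1f}  S(t)={survival(t):.4f}  h(t)={hazard(t):.4f}  "
              f"f/S={density(t) / survival(t):.4f}  "
              f"limit~{hazard_from_definition(t):.4f}")
```

With k > 1 the hazard increases over time, which matches a wear-out process in predictive maintenance; k = 1 would give a constant hazard (the exponential distribution).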

We can estimate these quantities using non-parametric estimators such as the Kaplan-Meier estimator, semi-parametric methods such as the Cox model, or parametric models (for example, exponential or Weibull distributions). The latter two families are especially useful for assessing how explanatory variables influence the survival function.
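The Kaplan-Meier estimator mentioned above can be sketched in a few lines of plain Python. In practice, survival data are usually right-censored: some individuals have not yet experienced the event when observation stops (a customer who is still subscribed, a machine that is still running). The estimator multiplies, at each distinct event time t_i, the factor (1 - d_i / n_i), where d_i is the number of events at t_i and n_i is the number of individuals still at risk just before t_i. The function names `kaplan_meier` and `survival_at` and the toy churn dataset are hypothetical, for illustration only; this is not PySurvival's API.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float],
                 observed: Sequence[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of S(t), returned as (event time, S_hat) pairs.

    durations: follow-up time of each individual.
    observed:  True if the event happened at that time, False if the
               individual was right-censored (event not yet seen).
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")
    events = Counter(t for t, e in zip(durations, observed) if e)  # d_i per time
    exits = Counter(durations)   # everyone (event or censored) leaves at their time
    at_risk = len(durations)     # n_i: individuals still at risk just before t
    s_hat = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: probability of surviving t given survival up to t.
            s_hat *= 1.0 - d / at_risk
            curve.append((t, s_hat))
        # Individuals censored at t still count as at risk at t, then drop out.
        at_risk -= exits[t]
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Evaluate the right-continuous step function S_hat at time t."""
    s = 1.0
    for ti, si in curve:
        if ti > t:
            break
        s = si
    return s


if __name__ == "__main__":
    # Hypothetical toy data: months until churn; False = still subscribed.
    durations = [2, 3, 3, 5, 6, 8, 9, 12]
    observed = [True, True, False, True, False, True, True, False]
    curve = kaplan_meier(durations, observed)
    for t, s in curve:
        print(f"S_hat({t}) = {s:.3f}")
    print(f"S_hat(7) = {survival_at(curve, 7):.3f}")
```

On this toy dataset the curve drops to 0.875, 0.75, 0.6, 0.4 and 0.2 at months 2, 3, 5, 8 and 9. Censored customers never cause a drop, but they shrink the risk set for later times, which is exactly what distinguishes Kaplan-Meier from a naive fraction of survivors.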

For survival analysis in Python, the PySurvival library is a valuable resource: it is well-referenced, well-documented, and offers a wide range of tools for visualization and performance measurement.

Has this article piqued your interest? A training course on this topic will be starting soon, so feel free to reach out to us for more information!
