To build a predictive model, data scientists rely on many observations. But while studying more observations leads to a better result, data scientists rarely have time to test every hypothesis.
So how do you find the right model in the shortest possible time? This is where Bayesian optimization comes in. What is it? How does it work? You’ll find the answers here.
What is the Bayesian approach?
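Bayesian optimization builds on Bayes’ theorem, which updates the probability of a hypothesis A once an event B has been observed:
P(A | B) = P(B | A) × P(A) / P(B)
In optimization terms, you have a value y that is a function of x, y = f(x), where f is unknown and expensive to evaluate. The idea is to find the value of x that optimizes (maximizes or minimizes) y. Here, x is a set of parameters, and each evaluation of f at a given x yields one observation. Bayes’ theorem is what lets you update your beliefs about f each time a new observation comes in.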
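As a quick illustration of how Bayes’ theorem updates a belief, the sketch below computes P(A | B) = P(B | A) P(A) / P(B), expanding P(B) with the law of total probability. The scenario and all its numbers are hypothetical: A is "this configuration is among the best", with a prior of 0.1, and B is "a quick benchmark on it looks good".

```python
def posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A | B) from Bayes' theorem for a binary hypothesis A."""
    # P(B) via the law of total probability: P(B|A)P(A) + P(B|not A)P(not A)
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b

# Hypothetical numbers: prior 0.1; the benchmark looks good 90% of the time
# for a top configuration and 20% of the time otherwise.
print(round(posterior(0.1, 0.9, 0.2), 3))  # 0.333
```

A single good benchmark raises the belief from 10% to about 33%. Bayesian optimization applies the same kind of update, but to a whole probability distribution over the unknown function f.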
In practical terms, this can be applied in a multitude of situations, such as setting a price to maximize margins, configuring an application or database to maximize its performance, or tuning the hyperparameters of a supervised learning model.
In all these cases, data scientists can only afford a limited number of observations to reach an optimal result, whether due to time, financial or hardware constraints.
Finding the best model generally requires testing numerous hypotheses and running several rounds of training and validation. All these tests take time, so it is not possible to study an unlimited number of hypotheses.
Bayesian optimization was designed to cope with these constraints.
How does Bayesian optimization work?
The central idea of Bayesian optimization is to keep the number of evaluations as small as possible while still converging rapidly to the optimal solution. It rests on three fundamental principles.
The Gaussian process
The Bayesian approach uses known observations to infer the probability of outcomes that have not yet been observed. To do this, we need a probability distribution over the possible values of y for each value of x.
The most common tool for this is the Gaussian process. For each x, it gives the most probable value of y (the mean µ) and the likely dispersion around that mean (the standard deviation σ). The standard deviation σ shrinks the closer you get to a previously observed point, and is close to zero at the observed points themselves.
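The sketch below shows what a Gaussian process returns: a mean µ and a standard deviation σ at any query point. It is a minimal NumPy implementation assuming a zero-mean prior and a squared-exponential (RBF) kernel with a hand-picked length scale of 0.5; libraries such as scikit-learn can fit these kernel hyperparameters for you. The observed values are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential (RBF) kernel between two 1-D arrays of points."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_obs, y_obs, x_query, length_scale=0.5, noise=1e-6):
    """Posterior mean mu and standard deviation sigma of a zero-mean GP at x_query."""
    # A small noise term on the diagonal keeps the Cholesky factorization stable
    k_oo = rbf_kernel(x_obs, x_obs, length_scale) + noise * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs, length_scale)
    chol = np.linalg.cholesky(k_oo)
    # alpha = K^-1 y, computed with two solves against the Cholesky factor
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    mu = k_qo @ alpha
    # The prior variance of this kernel is 1; nearby observations reduce it
    v = np.linalg.solve(chol, k_qo.T)
    sigma = np.sqrt(np.clip(1.0 - np.sum(v**2, axis=0), 0.0, None))
    return mu, sigma

x_obs = np.array([0.0, 1.0, 2.0])     # points already evaluated
y_obs = np.sin(x_obs)                 # hypothetical observed values
x_query = np.array([1.0, 1.5, 4.0])   # on, between, and far from observations
mu, sigma = gp_posterior(x_obs, y_obs, x_query)
print(np.round(sigma, 3))
```

Here σ is essentially zero at x = 1 (an observed point), larger at x = 1.5 between two observations, and close to its prior value of 1 at x = 4, far from any observation.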
Ideally, the function would be evaluated at every point of the search space. In practice, each evaluation is costly (a full training run, for example), so this exhaustive approach is not possible. You therefore need to select the points you want to evaluate.
Exploration and exploitation
To design a high-performance predictive model, data scientists need to choose the most relevant points to evaluate. This means balancing two strategies:
- Exploration: sampling where the standard deviation σ is large, in other words where uncertainty about the function is high. This lets you test very different models and improve your knowledge of the function to be optimized.
- Exploitation: refining the best models found so far. For a maximization problem, this means sampling where the mean µ is highest, since that is where the Gaussian process predicts the best value.
It’s important to strike the right balance between exploration and exploitation. If you favor exploitation, you risk getting stuck around a local optimum and overlooking other, potentially better, models. Conversely, if you favor exploration, you spend evaluations on unpromising regions and never refine the best candidates.
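One common way to encode this balance is the upper confidence bound (UCB) score µ + κσ, where κ (kappa) sets how much weight the uncertainty receives. UCB is one acquisition function among several and is used here only as an illustration; the µ and σ values below are hypothetical Gaussian process predictions at four candidate points.

```python
import numpy as np

def upper_confidence_bound(mu, sigma, kappa):
    """UCB score for maximization: mean plus kappa standard deviations."""
    return mu + kappa * sigma

# Hypothetical GP predictions at four candidate points
mu = np.array([0.80, 0.75, 0.60, 0.10])
sigma = np.array([0.02, 0.05, 0.30, 0.60])

for kappa in (0.0, 1.0, 3.0):
    best = int(np.argmax(upper_confidence_bound(mu, sigma, kappa)))
    print(f"kappa={kappa}: evaluate candidate {best}")
# kappa=0.0 picks candidate 0, kappa=1.0 picks 2, kappa=3.0 picks 3
```

With κ = 0 the score reduces to µ, which is pure exploitation; as κ grows, candidates with a large σ win, which is exploration.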
The acquisition function
The acquisition function is what strikes this compromise between exploration and exploitation. For each point in the search space, it combines µ and σ into a score that measures that point’s potential for improvement. The point with the highest score is the next one to evaluate. After each evaluation, the Gaussian process is updated with the new observation and the calculation is repeated, until the results converge or the evaluation budget is spent. The best point observed gives the set of parameters that should deliver the best performance.
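The loop below sketches the whole procedure on a one-dimensional toy problem: fit a Gaussian process, score every candidate with an acquisition function, evaluate the best-scoring point, and repeat. Several assumptions are worth stating: the objective is a hypothetical stand-in for an expensive evaluation, the search space is discretized into a grid, the kernel length scale is fixed at 0.3, and the acquisition function is expected improvement (EI), a common choice but not the only one.

```python
import math
import numpy as np

def objective(x):
    """Hypothetical expensive black-box function to maximize; its peak is at x = 0.7."""
    return -(x - 0.7) ** 2

def rbf_kernel(a, b, length_scale=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    """Posterior mean and std of an RBF-kernel GP whose prior mean is the observed mean."""
    y_mean = y_obs.mean()
    chol = np.linalg.cholesky(rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs)))
    k_qo = rbf_kernel(x_query, x_obs)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs - y_mean))
    v = np.linalg.solve(chol, k_qo.T)
    mu = y_mean + k_qo @ alpha
    sigma = np.sqrt(np.clip(1.0 - np.sum(v**2, axis=0), 0.0, None))
    return mu, sigma

_erf = np.vectorize(math.erf)

def expected_improvement(mu, sigma, best_y, xi=0.01):
    """Expected improvement over best_y for a maximization problem."""
    safe_sigma = np.maximum(sigma, 1e-12)
    z = (mu - best_y - xi) / safe_sigma
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    ei = (mu - best_y - xi) * cdf + safe_sigma * pdf
    # No uncertainty means no expected improvement
    return np.where(sigma > 1e-12, ei, 0.0)

candidates = np.linspace(0.0, 2.0, 201)   # discretized search space
x_obs = np.array([0.0, 1.0, 2.0])         # a few initial evaluations
y_obs = objective(x_obs)

for _ in range(15):
    mu, sigma = gp_posterior(x_obs, y_obs, candidates)
    ei = expected_improvement(mu, sigma, y_obs.max())
    x_next = candidates[np.argmax(ei)]              # acquisition maximum: next point to test
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))     # the only expensive step

best_x = x_obs[np.argmax(y_obs)]
print(f"best x = {best_x:.2f}, f(best x) = {y_obs.max():.4f}")
```

Each iteration spends exactly one call to `objective`; everything else is cheap arithmetic on the Gaussian process, which is where the savings come from.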
Good to know: noise in the observations can distort the data and make learning more difficult. Before using Bayesian optimization, check that the environment is stable enough for observations to be reproducible.
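One simple way to check reproducibility before optimizing is to evaluate the same configuration several times and compare the spread of the scores with their mean. The helper below is a hypothetical sketch: `evaluate` stands for whatever training-and-validation run you are tuning, and the 5% relative tolerance is an arbitrary threshold to adapt to your problem.

```python
import statistics
from typing import Any, Callable

def is_reproducible(evaluate: Callable[[Any], float], config: Any,
                    n_repeats: int = 5, rel_tol: float = 0.05) -> bool:
    """Hypothetical check: re-run one configuration and compare the spread to the mean."""
    scores = [evaluate(config) for _ in range(n_repeats)]
    spread = statistics.stdev(scores)   # sample standard deviation across repeats
    return spread <= rel_tol * abs(statistics.mean(scores))

# A deterministic evaluation passes the check
print(is_reproducible(lambda config: 0.9, {"learning_rate": 0.1}))  # True
```

If the check fails, reduce the noise first (fixed random seeds, a dedicated machine, averaged repeats). Gaussian processes can also model noise explicitly: scikit-learn’s `GaussianProcessRegressor`, for instance, accepts an `alpha` value added to the kernel diagonal for this purpose.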
How can Bayesian optimization be put into practice?
The easiest way to simplify Bayesian optimization calculations is to use the right tools, such as the Python packages scikit-optimize or bayesian-optimization. All you have to do is define a search space and the function to optimize; the tool then takes care of finding points with high potential, using a Gaussian process in particular. Here again, you may need to run additional iterations until you get a satisfactory result.