XGBoost stands for eXtreme Gradient Boosting. As its name suggests, it is a Gradient Boosting algorithm. It is written in C++ and available in just about every programming language used in Machine Learning, such as Python, R or Julia.
What is Gradient Boosting?
Gradient boosting is a special kind of boosting algorithm.
Boosting consists of combining several “weak learners” into a “strong learner”, i.e. combining several low-performing algorithms to build a much more efficient and accurate one. The weak learners are assembled by training them one after another, each new model trying to correct the errors of the previous ones in estimating the variable of interest.
In the case of regression, the principle is to estimate the outputs with a first model, then use the residuals of this model as the target variable for a second model, and so on.
To predict the output for a new input whose target value is unknown, the prediction of the first model and the residuals predicted by each subsequent model are summed.
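As an illustration, here is a minimal sketch of this mechanism using scikit-learn decision trees as the weak learners; the dataset, tree depth and number of rounds are arbitrary choices for the example:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)

models = []
target = y.copy()

# Each tree is fitted on the residuals left by the previous trees.
for _ in range(3):
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, target)
    models.append(tree)
    target = target - tree.predict(X)  # residuals become the next target

# To predict, sum the contributions of all the models.
def predict(X_new):
    return np.sum([m.predict(X_new) for m in models], axis=0)

print(predict(X[:3]))  # predictions for the first three rows
print(y[:3])           # true values, for comparison
```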
In the case of classification, each individual is given a weight. All weights start out equal, and whenever a model misclassifies an individual, that individual’s weight is increased before the next model is trained, so that the next model pays more attention to it.
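To make the idea concrete, here is a minimal sketch of this reweighting scheme with decision stumps from scikit-learn. Note that the multiplicative factor used to increase the weights is a placeholder: actual boosting algorithms such as AdaBoost derive it from the weighted error rate of each model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Every individual starts with the same weight.
weights = np.full(len(y), 1.0 / len(y))

models = []
for _ in range(3):
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X, y, sample_weight=weights)
    models.append(stump)

    # Increase the weight of the misclassified individuals so that
    # the next model focuses on them (the factor 2 is illustrative).
    wrong = stump.predict(X) != y
    weights[wrong] *= 2.0
    weights /= weights.sum()
```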
The special feature of Gradient Boosting is that these updates follow the logic of gradient descent: in classification, the weight updates are derived from the gradient of the loss function, and in regression, the residuals fitted at each step correspond to the negative gradient of the global cost function. Each new weak learner therefore takes one step of gradient descent on the overall loss.
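A quick way to see this link for the common squared-error loss: the negative gradient of the cost with respect to the current predictions is exactly the residual, so fitting the next model on the residuals amounts to taking a gradient step. The numbers below are arbitrary and only serve to check the identity numerically:

```python
import numpy as np

y = np.array([3.0, -1.0, 2.5])   # true values
F = np.array([2.0,  0.5, 2.0])   # current predictions of the ensemble

def loss(F):
    # Squared-error cost over the whole dataset
    return 0.5 * np.sum((y - F) ** 2)

# Numerical gradient of the cost with respect to each prediction F[i]
eps = 1e-6
grad = np.array([
    (loss(F + eps * np.eye(len(F))[i]) - loss(F - eps * np.eye(len(F))[i])) / (2 * eps)
    for i in range(len(F))
])

# The negative gradient is exactly the residual y - F, which is what
# the next weak learner is trained to predict.
print(np.allclose(-grad, y - F))  # True
```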
Gradient boosting is most often used with Decision Tree algorithms, which in this context are considered to be “weak learners”.
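This is what off-the-shelf implementations such as scikit-learn’s GradientBoostingRegressor do: shallow regression trees are trained one after another on the gradients of the loss. A minimal usage sketch, with purely illustrative hyperparameter values:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Shallow regression trees (max_depth=3) play the role of the weak learners.
gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=0)
gbr.fit(X_train, y_train)
print(gbr.score(X_test, y_test))  # R² score on held-out data
```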
Specific features of XGBoost
The main difference between XGBoost and other implementations of Gradient Boosting is that XGBoost is heavily optimised at the implementation level to speed up the calculations that Gradient Boosting requires. More specifically, XGBoost stores the data in compressed, pre-sorted column blocks, which allows candidate splits to be evaluated much more quickly and the work to be parallelised.
But the advantages of XGBoost are not only linked to the implementation of the algorithm, and therefore its speed, but also to the parameters it exposes. XGBoost offers a wide range of hyperparameters, giving you fine-grained control over how Gradient Boosting is carried out. It is also possible to add different regularisation terms to the loss function, limiting a phenomenon that occurs quite often with Gradient Boosting algorithms: overfitting.
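As an illustration, here is a minimal sketch using XGBoost’s scikit-learn-style API with a few of these hyperparameters, including the L1 and L2 regularisation terms; the values shown are illustrative rather than recommendations:

```python
from xgboost import XGBRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBRegressor(
    n_estimators=300,      # number of boosting rounds (weak learners)
    learning_rate=0.05,    # shrinkage applied to each tree's contribution
    max_depth=4,           # depth of each weak learner
    subsample=0.8,         # row sampling per tree
    colsample_bytree=0.8,  # feature sampling per tree
    reg_alpha=0.1,         # L1 regularisation on leaf weights
    reg_lambda=1.0,        # L2 regularisation on leaf weights
    tree_method="hist",    # histogram-based split finding
    n_jobs=-1,             # use all CPU cores
    random_state=0,
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # R² score on held-out data
```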
That’s why XGBoost is so often the winning algorithm in Kaggle competitions: it is fast, accurate and efficient, and it offers a degree of flexibility rarely seen in Gradient Boosting implementations. Finally, remember that since Gradient Boosting is designed to improve on weak models, XGBoost will almost always perform better than the weak learners it is built from.
If you would like to find out more about XGBoost, Gradient Boosting or our Machine Learning modules, please contact our teams.