Why use genetic algorithm in data science?

The genetic algorithm is often used for optimization problems where there are many possible solutions. That said, it often requires many calculations to achieve a satisfactory result. For this reason, many data scientists are turning to other optimization algorithms with shorter computation times.

Back to articles

Genetic algorithm: Discover the 6 steps

Q: The steps of the genetic algorithm

Creating the initial population: The first step in the genetic algorithm is to create an initial population that will evolve over time. These group together potential solutions to a given problem. Called individuals or chromosomes, they can be generated at random. This allows for greater diversity.

18 Mar 2024

m de lecture

Machine Learning

Melanie

Since the dawn of time, living beings have demonstrated their ability to adapt to their ever-changing environment and improve from generation to generation.

This ability to converge towards increasingly optimal solutions is precisely what inspires mathematics. In particular, genetic algorithms, which use natural selection processes to solve complex problems.

Understanding the steps of the genetic algorithm

Born in the 60s on the initiative of researcher John Holland, the genetic algorithm applies the various stages of natural evolution to the solution of complex problems.

Creating the initial population

The first step in the genetic algorithm is to create an initial population that will evolve over time. These group together potential solutions to a given problem. Called individuals or chromosomes, they can be generated at random. This allows for greater diversity.

At this stage, it’s not a question of finding THE right solution. It’s all about identifying a sufficient number of solutions capable of responding to the problem. In fact, the more varied the initial population, the more likely it is that the best possible solutions can be devised.

For example, if you’re looking for the path between point A and point B, it’s not a question of finding the shortest route, but all the routes to your destination.

It doesn’t matter if you have to spend 10 minutes or 3 hours, on foot or by car, in front of a bakery or a garage.

Good to know: Population size doesn’t have to be excessive. It simply has to be large enough. On average, 100 or 150 individuals will give a satisfactory result at the end of all the steps in the genetic algorithm.

Assessing individuals

Once the population has been created, it’s time to evaluate each individual according to his or her ability to solve the problem. The data scientist is free to evaluate individuals as he sees fit. That said, it’s best to give them a score so as to identify those who stand out from the crowd. They’re the ones who contribute to the improvement of our population.

Beware: this phase of the genetic algorithm can be complex, as it is sometimes difficult to compare two individuals with each other.

This is particularly true for multi-criteria problems, where the optimal solution depends on several parameters, without one being better than the other.

The selection

Once the individuals have been evaluated, the best ones must be selected. In nature, this corresponds to the process of natural selection (or the law of the fittest), whereby the species best adapted to its environment survives, while the others die out.

In mathematics, several methods can be used to select the best.

Here are the most common:

The roulette wheel: as with the lottery, this involves spinning the wheel to select individuals. But be careful, because each individual is associated with a sector on a wheel according to his or her quality. The more “valuable” the individual, the more important the sector. So, as the wheel is turned, the probability of finding the highest-quality individuals increases.
Tournament selection: a random encounter between different individuals in the population. The winner is the one with the best quality. Of course, it is possible to select several winners.
Elitism: simply selects the best individuals on the basis of their quality. While this method is quicker to implement, it does not take into account all possible diversity.

Crossbreeding

If natural species have been able to improve over time, it’s thanks to the enrichment of populations over time. Genetic algorithms use the same principle.

To make the population evolve, the data scientist combines pairs of selected individuals, thus creating new individuals: the offspring. For this cross-breeding, the data expert selects certain characteristics from each parent individual.

Random crosses are possible, and even the same individual can be used to create several offspring. Unused characteristics can be used to generate further offspring.

Mutation

Instead of cross-breeding, the genetic algorithm also provides for mutations. Random mutations are also very effective for maximum diversity and innovative combinations.

Good to know: these are minor modifications. In other words, only one characteristic is changed, so as not to totally alter the original individual. But even sometimes, these small changes can result in a totally different evaluation.

New generations

To find the best solutions to the problem, new generations appear, with offspring, modified individuals and unmodified individuals. Everyone must participate in improving the population.

This is why the data scientist repeats the genetic algorithm process over several generations, in order to continually improve the solution to the problem.

Gradually, the population becomes smaller and smaller, until the optimal solution is found.

Good to know: it’s important to keep some of the initial (unmodified) individuals, as there’s no guarantee that the new populations will necessarily be better.

Using genetic algorithms in data science

Genetic algorithms are often used for optimization problems where there are many possible solutions. That said, it is often computationally intensive to achieve a satisfactory result.

For this reason, many data scientists are turning to other optimization algorithms with shorter computation times.

In any case, it is essential to be familiar with the various machine learning models in order to carry out relevant analyses. Hence the importance of training with DataScientest.

DataScientest News

You are not available?

Leave us your e-mail, so that we can send you your new articles when they are published!

Data Analyst

Analytics Engineer

Data Scientist

AI / Machine Learning Engineer

Data Engineer

Cloud Engineer

DevOps Engineer

Data Marketing & AI

MLOps

ETL Developer

Data Ops Engineer

Amazon Web Services (AWS)

Microsoft Power BI

Genetic algorithm: Discover the 6 steps

Since the dawn of time, living beings have demonstrated their ability to adapt to their ever-changing environment and improve from generation to generation.