Kruskal's algorithm for optimizing connections within a network is one of the key concepts in Machine Learning. Find out how it works and why it's so useful in everyday life.
What is the Kruskal algorithm?
Designed in 1956 by Joseph Kruskal, Kruskal’s algorithm is used to find a minimum weight spanning tree (MWST) (also known as a minimum spanning tree (MST) or minimum subtending tree (MST)) in a non-oriented, weighted connected graph.
Before going into detail, we need to define the terms below:
- Tree: corresponds to a connected graph with no cycle (or acyclic).
- Connected graph: this means that there is a path to all the vertices of the graph from any other vertex.
- Acyclic graph: this means that the cycle is interrupted to access all vertices.
- Weighted graph: each edge has a weight.
- Undirected graph: each edge can be traversed in either direction.
- Spanning tree: refers to a set of edges in a weighted graph that contains all vertices (but not necessarily all edges). The weight of the spanning tree is the sum of the weights of its edges.
- Minimum weight spanning tree (MWST): this is a spanning tree whose weight is as small as possible. In other words, it is a subset of edges that connects all the nodes in the graph where the sum of the edge weights is as low as possible.
The aim of Kruskal’s algorithm is to find this minimum-weight spanning tree.
Good to know: in addition to Kruskal’s algorithm, you can also use Prim’s algorithm to find an ARPM.
How does the Kruskal algorithm work?
The steps of the Kruskal algorithm
To find the least-weight spanning tree, the Kruskal algorithm provides the following steps:
- Sort the edges (a) of the graph (G) by increasing weight. In doing so, the lightest edges are examined first.
- Start with an empty tree (T). That is, a tree containing no edges; its weight is 0. Its construction is progressive, following these steps.
- Select edges by ascending weight. Start with the lightest edges. If the addition of an edge does not imply the creation of a cycle in the graph, add it to the tree.
- Repeat the operation until all vertices (S) are connected. And this without creating a cycle in the ARPM.
Mathematical translation
In mathematical terms, the Kruskal algorithm translates as follows:
T = ø
Sort the a’s in G by ascending value
F = { ao }
Repeat F = { ao } until all S have been included in T
Where
ao : smallest unexamined edge
T : minimum weight spanning tree
G : graph
S : vertex
p: number of edges already placed in the G graph
n : number of vertices in graph G
F = { ao } :addition of the smallest unexamined edge
Good to know: The number of edges in an ARPM can vary by (n-1)n/2 . The fewer edges the minimum-weight spanning tree contains, the faster the Kruskal algorithm.
Why use the Kruskal algorithm?
In practice, this algorithm is often used to solve practical problems such as the design of low-cost communication networks, cabling planning, the elimination of less profitable sea links while preserving accessibility to different ports, and so on.
Whatever the practical application, Kruskal’s algorithm enables connections to be optimized. This is precisely why it is so useful in Machine Learning. By optimizing the weight of connections, it maximizes the efficiency of machine learning models.
Good to know: you can use Kruskal’s algorithm in a variety of programming languages, such as C++, Python, Java, C#, Javascript, etc.
Deepen your knowledge of Machine Learning with DataScientest
Beyond the Kruskal algorithm, Machine Learning requires mastery of a multitude of mathematical and statistical concepts, not to mention programming languages, Big Data tools, data visualization and more. These are all technical skills that can only be learned through comprehensive training. Like the one offered by DataScientest. Discover our program.