Disjoint Set Union (DSU) – Theory and Basic Operations
In this lesson, we will explore the Disjoint Set Union (DSU) data structure, also known as Union-Find. It is used to efficiently keep track of elements that are divided into disjoint (non-overlapping) subsets.
This structure is especially important in graph algorithms, particularly in Kruskal’s algorithm for finding the Minimum Spanning Tree (MST).
Basic Idea
The DSU structure supports two main operations:
find(x)– finds the representative (root) of the set that elementxbelongs to.unite(a, b)– merges (unites) the sets that contain elementsaandb.
Each element has a parent pointer. If an element points to itself, it is the root of its set. When merging two sets, we connect one root to another. To make operations faster, we use two key optimizations:
- Path compression – shortens the path to the root when calling
find(). - Union by size/rank – attaches the smaller tree to the larger one.
DSU Implementation
#include <iostream>
#include <vector>
using namespace std;
struct DSU {
vector<int> parent, size;
DSU(int n) {
parent.resize(n);
size.resize(n, 1);
for (int i = 0; i < n; i++)
parent[i] = i;
}
int find(int x) {
if (x == parent[x])
return x;
return parent[x] = find(parent[x]); // path compression
}
void unite(int a, int b) {
a = find(a);
b = find(b);
if (a != b) {
if (size[a] < size[b])
swap(a, b);
parent[b] = a;
size[a] += size[b];
}
}
};
int main() {
DSU dsu(5);
dsu.unite(0, 1);
dsu.unite(3, 4);
cout << "Find(1): " << dsu.find(1) << endl;
cout << "Find(4): " << dsu.find(4) << endl;
dsu.unite(1, 3);
cout << "Find(0) after merging: " << dsu.find(0) << endl;
return 0;
}
Kruskal’s Algorithm and Its Connection to DSU
Kruskal’s algorithm is used to find the Minimum Spanning Tree (MST) of a connected, undirected, and weighted graph. The idea of the algorithm is as follows:
- Sort all graph edges in increasing order of their weight.
- Iterate through the edges and add an edge to the MST if it does not form a cycle.
- Use the
DSUstructure to check whether adding an edge would form a cycle.
If the endpoints of an edge belong to different sets (i.e., find(u) != find(v)),
the edge is added to the MST and the sets are merged using unite(u, v).
Implementation of Kruskal’s Algorithm
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
struct Edge {
int u, v, w;
};
struct DSU {
vector<int> parent, size;
DSU(int n) {
parent.resize(n);
size.resize(n, 1);
for (int i = 0; i < n; i++)
parent[i] = i;
}
int find(int x) {
if (x == parent[x]) return x;
return parent[x] = find(parent[x]);
}
void unite(int a, int b) {
a = find(a);
b = find(b);
if (a != b) {
if (size[a] < size[b]) swap(a, b);
parent[b] = a;
size[a] += size[b];
}
}
};
int main() {
int n = 4;
vector<Edge> edges = {
{0, 1, 10},
{0, 2, 6},
{0, 3, 5},
{1, 3, 15},
{2, 3, 4}
};
sort(edges.begin(), edges.end(), [](Edge a, Edge b) {
return a.w < b.w;
});
DSU dsu(n);
int mst_weight = 0;
cout << "Edges in the Minimum Spanning Tree:\n";
for (auto e : edges) {
if (dsu.find(e.u) != dsu.find(e.v)) {
dsu.unite(e.u, e.v);
mst_weight += e.w;
cout << e.u << " - " << e.v << " (" << e.w << ")\n";
}
}
cout << "Total weight of MST: " << mst_weight << endl;
return 0;
}
Conclusion
The Disjoint Set Union structure provides an efficient way to merge sets and check whether two elements belong to the same set, which is crucial for many graph algorithms. Combined with Kruskal’s algorithm, DSU allows constructing a Minimum Spanning Tree efficiently – in nearly linear time with respect to the number of edges.
Visualization of DSU Operations (Step by Step)
Understanding how the Disjoint Set Union (DSU) structure changes internally is crucial for beginners.
Let’s take a small example and visualize how elements are grouped and how path compression works.
Example: Merging sets step by step
Suppose we have 6 elements: 0, 1, 2, 3, 4, 5.
Initially, every element is its own parent:
// Initial state (each node is its own parent):
Parent: [0, 1, 2, 3, 4, 5]
Now let’s unite some elements:
unite(0, 1) → Merge sets containing 0 and 1
unite(2, 3) → Merge sets containing 2 and 3
unite(4, 5) → Merge sets containing 4 and 5
After these operations, the parent array might look like this:
Parent: [0, 0, 2, 2, 4, 4]
This means:
- Elements 0 and 1 belong to the same group (root = 0)
- Elements 2 and 3 belong to the same group (root = 2)
- Elements 4 and 5 belong to the same group (root = 4)
Merging larger groups
Now, let’s merge unite(1, 3).
Before the merge:
find(1) = 0
find(3) = 2
Since they have different roots (0 and 2), DSU merges them:
Parent: [0, 0, 0, 2, 4, 4]
After path compression, elements 2 and 3 will also start pointing directly to 0 when they are accessed:
// After find(3) with path compression:
Parent: [0, 0, 0, 0, 4, 4]
Now, all elements {0, 1, 2, 3} belong to the same set with root 0.
Step-by-step diagram
Here’s a simplified text-based diagram of the merging process:
// Initial state:
0 1 2 3 4 5
| | | | | |
(0) (1) (2) (3) (4) (5)
// After unite(0,1), unite(2,3), unite(4,5):
0 → 1 2 → 3 4 → 5
// After unite(1,3):
0 → 1, 2 → 3, and root(0) connects to root(2)
Final structure:
0
├── 1
├── 2
│ └── 3
4
└── 5
When we later call find(3), path compression will make 3 directly point to 0, flattening the structure:
// After path compression:
0 → {1, 2, 3}
4 → {5}
Why path compression helps
Without path compression, each find() call could traverse a long chain of parents.
With compression, all nodes directly connect to the root, turning the structure almost flat.
This makes subsequent find() and unite() operations nearly constant time.
Interactive intuition (mental model)
You can imagine each group as a small tree. When two trees are merged, one root becomes a child of the other. Path compression simply means: every time you look for the root, you also fix the structure to make future lookups faster.
Visual Explanation of DSU Operations
In Figure 1, you can see a visualization of the Disjoint Set Union (DSU) or Union-Find data structure, which is used to efficiently manage disjoint sets — sets that do not overlap. Each element belongs to exactly one set, and every set has a unique representative (root).
At the beginning, each element is independent and is its own parent. This means that there are as many sets as elements — each element forms a separate tree.
1. Initial State
Initially, all nodes (for example 1, 2, 3, 4, 5, 6, 7, 8) are independent:
1 2 3 4 5 6 7 8
In code, this initialization is usually done as follows:
for (int i = 1; i <= n; i++) parent[i] = i;
At this point, every variable parent[i] points to itself.
2. Union of Sets
When we perform operations union(1, 2) and union(2, 3),
the nodes 1, 2, and 3 become part of the same set.
The root of that set is element 1.
1
/ \
2 3
In memory, this means:
parent[2] = 1 and parent[3] = 1.
3. Forming Multiple Groups
If we then execute union(4, 5) and union(6, 7), we form several independent groups (trees).
Each group has its own root — a representative element.
1 4 6 8
/ \ / \
2 3 5 7
4. Merging Existing Groups
Now, when we call union(3, 4), the DSU structure first finds the roots of elements 3 and 4.
Their roots are 1 and 4, and these two sets are then merged into a single group.
1
/ | \
2 3 4
|
5
Now the elements 1, 2, 3, 4, and 5 all belong to the same set.
5. Path Compression
When calling find(5), the algorithm traverses the path 5 → 4 → 1 and returns 1 as the root.
Then, it updates all nodes along that path so that they directly point to the root.
This process flattens the tree and makes future operations much faster.
// Before path compression: 5 → 4 → 1
// After path compression:
parent[5] = 1;
6. Final State
At the end, we have two main groups:
1 6
/|\ \
2 3 4 7
|
5
Group 1 contains the elements {1, 2, 3, 4, 5}, and group 6 contains {6, 7, 8}.
7. Conclusion
The DSU structure uses trees to represent disjoint sets and allows fast merging and representative finding. Thanks to path compression and union by rank, DSU operations run in nearly constant time.
Figure 1: Visualization of DSU operations (find and union)
Applications of DSU and Implementations in Other Languages
So far, we have presented the Disjoint Set Union (DSU) structure mainly in the context of Kruskal’s algorithm for building a Minimum Spanning Tree (MST). However, DSU is a much more versatile data structure — it is widely used in problems where we need to efficiently track connectivity between elements.
1. Broader Applications of DSU
The DSU structure is used in many algorithmic contexts, such as:
- Dynamic connectivity — when edges are added to a graph over time, and we need to check whether two nodes are connected.
- Cycle detection in undirected graphs — DSU can efficiently detect when a new edge creates a cycle.
- Offline connectivity queries — problems where queries are known in advance and processed in reverse using DSU.
- Community detection and clustering — for grouping elements into connected sets.
- Dynamic graph problems — such as “adding or removing roads” while maintaining connectivity information.
This data structure serves as the foundation for many advanced algorithms in graph theory, networking, and set systems.
2. DSU Pseudocode
Regardless of the programming language, the logic behind DSU can be expressed as follows:
procedure make_set(x):
parent[x] = x
rank[x] = 0
function find(x):
if parent[x] != x:
parent[x] = find(parent[x]) // path compression
return parent[x]
procedure union(x, y):
rootX = find(x)
rootY = find(y)
if rootX == rootY:
return
if rank[rootX] < rank[rootY]:
parent[rootX] = rootY
else if rank[rootX] > rank[rootY]:
parent[rootY] = rootX
else:
parent[rootY] = rootX
rank[rootX] = rank[rootX] + 1
3. DSU Implementation in Python
Here is a simple and readable implementation of DSU in Python:
class DSU:
def __init__(self, n):
self.parent = [i for i in range(n)]
self.rank = [0] * n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x]) # path compression
return self.parent[x]
def union(self, x, y):
xr, yr = self.find(x), self.find(y)
if xr == yr:
return
if self.rank[xr] < self.rank[yr]:
self.parent[xr] = yr
elif self.rank[xr] > self.rank[yr]:
self.parent[yr] = xr
else:
self.parent[yr] = xr
self.rank[xr] += 1
4. DSU Implementation in Java
Below is a straightforward implementation of the same logic in Java:
class DSU {
int[] parent, rank;
DSU(int n) {
parent = new int[n];
rank = new int[n];
for (int i = 0; i < n; i++) parent[i] = i;
}
int find(int x) {
if (parent[x] != x)
parent[x] = find(parent[x]); // path compression
return parent[x];
}
void union(int x, int y) {
int xr = find(x), yr = find(y);
if (xr == yr) return;
if (rank[xr] < rank[yr]) parent[xr] = yr;
else if (rank[xr] > rank[yr]) parent[yr] = xr;
else {
parent[yr] = xr;
rank[xr]++;
}
}
}
5. Comparing Languages and Conceptual Meaning
Whether you use C++, Python, or Java, the core principles remain the same:
- Each element has a parent and an optional rank that represents tree height.
- find() locates the representative of a set while applying path compression.
- union() merges two sets based on rank or size.
Thanks to these optimizations, DSU can be applied to complex problems where relationships between elements change dynamically — such as dynamic networks, graph algorithms, and clustering systems.

