DSA: Union-Find in Real Systems
Solution to Last Week's Challenge
Last week's problem was: DSA: Heap vs Quickselect for Top-K
The optimal solution uses dsa: heap vs quickselect for top-k pattern with careful state management.
Complexity Analysis:
- Time Complexity: O(n)
- Space Complexity: O(1)
Union-Find, or Disjoint Set Union (DSU), is a cornerstone of graph theory and a frequent visitor in high-level engineering interviews. While many candidates default to Breadth-First Search (BFS) or Depth-First Search (DFS) for connectivity problems, Union-Find offers a more specialized, often more efficient approach for dynamic connectivity—scenarios where the graph structure changes over time as new edges are added. In the world of real-time systems, this is the difference between re-scanning an entire social network to see if two people are connected and simply checking a pre-computed representative element.
In a production environment, Union-Find shines in cluster analysis, image segmentation, and network routing protocols. For example, when building a distributed system, you might use Union-Find to track which nodes belong to which partitions or to detect cycles in a spanning tree for high-availability network design. Mastering this structure isn't just about passing the interview; it's about understanding how to manage partitioned data sets with near-constant time complexity.
Theory
The Disjoint Set Union (DSU) data structure maintains a collection of non-overlapping sets. Each set is identified by a "representative" or "root" element. The two primary operations are:
- Find: Determine which set a particular element belongs to.
- Union: Merge two sets into a single set.
Without optimization, these operations can degrade to $O(N)$ time. However, two key optimizations make Union-Find incredibly powerful:
- Path Compression: During a
findoperation, we make every node on the path point directly to the root. This flattens the tree structure significantly. - Union by Rank/Size: We always attach the smaller tree under the root of the larger tree, preventing the creation of long, skewed chains.
With both optimizations, the amortized time complexity per operation becomes $O(\alpha(n))$, where $\alpha$ is the Inverse Ackermann function. For all practical values of $n$ (up to $10^{600}$), $\alpha(n)$ is less than 5, making it effectively $O(1)$.
Pattern Visualization
When solving a Union-Find problem, the logic typically follows a specific flow: initializing the sets, iterating through relationships, and performing union operations while checking for pre-existing connectivity.
Implementation
Below is a robust implementation in Python, demonstrating the transition from a naive approach to a fully optimized version.
# Optimized Union-Find with Path Compression and Union by Rank
class UnionFind:
def __init__(self, size):
# Space Complexity: O(N) to store parent and rank arrays
self.parent = [i for i in range(size)]
self.rank = [1] * size
self.num_components = size
def find(self, i):
# Path Compression: O(alpha(N)) amortized
if self.parent[i] == i:
return i
self.parent[i] = self.find(self.parent[i])
return self.parent[i]
def union(self, i, j):
# Union by Rank: O(alpha(N)) amortized
root_i = self.find(i)
root_j = self.find(j)
if root_i != root_j:
if self.rank[root_i] > self.rank[root_j]:
self.parent[root_j] = root_i
elif self.rank[root_i] < self.rank[root_j]:
self.parent[root_i] = root_j
else:
self.parent[root_i] = root_j
self.rank[root_j] += 1
self.num_components -= 1
return True
return False # Already in the same set
# Example Usage: Detecting a cycle in an undirected graph
# Time Complexity: O(E * alpha(V)) where E is edges and V is vertices
def has_cycle(n, edges):
uf = UnionFind(n)
for u, v in edges:
if not uf.union(u, v):
return True # Found an edge between two nodes already in the same set
return FalseComplexity Analysis Table
| Strategy | Find Complexity | Union Complexity | Best Use Case |
|---|---|---|---|
| Naive | $O(N)$ | $O(N)$ | Educational purposes only |
| Path Compression Only | $O(\log N)$ | $O(\log N)$ | Simple implementations |
| Union by Rank Only | $O(\log N)$ | $O(\log N)$ | When recursion depth is a concern |
| Optimized (Both) | $O(\alpha(N))$ | $O(\alpha(N))$ | Production systems and FAANG interviews |
Common Patterns
1. Dynamic Connectivity
Used when you need to answer "Are A and B connected?" while simultaneously adding connections.
# Check if two users are in the same social circle
if uf.find(user_a) == uf.find(user_b):
print("Connected")2. Kruskal’s Minimum Spanning Tree
Sorting edges by weight and using Union-Find to ensure no cycles are formed while connecting all nodes.
edges.sort(key=lambda x: x.weight)
for u, v, weight in edges:
if uf.union(u, v):
mst_weight += weight3. Counting Connected Components
The num_components variable in the UnionFind class tracks the number of isolated groups automatically.
Practice Problems
To master Union-Find, focus on these problems categorized by their occurrence in interviews and technical difficulty.
This Week's Interview Challenge
Problem: Dynamic Network Reliability
You are an engineer at a cloud provider. You are given n servers (labeled 0 to n-1) and a series of logs. Each log is a tuple (timestamp, server_a, server_b) representing a new high-speed cable connected between two servers at a specific time.
Your task: Find the earliest timestamp at which all servers are connected into a single network. If the servers are never fully connected, return -1.
Examples:
-
Input:
n = 4,logs = [[10, 0, 1], [20, 1, 2], [30, 2, 3]] -
Output:
30 -
Explanation: At
t=10, {0,1} are connected. Att=20, {0,1,2} are connected. Att=30, {0,1,2,3} are connected. -
Input:
n = 3,logs = [[5, 0, 1], [12, 0, 1]] -
Output:
-1
Constraints:
- $2 \le n \le 10^4$
- $1 \le logs.length \le 10^5$
- $0 \le timestamp \le 10^9$
- Logs are not necessarily sorted by timestamp.
Hints:
- How can you track the number of disconnected components efficiently?
- Does the order of logs matter for finding the earliest time?
- What happens to the component count when you successfully
uniontwo previously separate sets?
Template:
def earliest_connectivity_time(n: int, logs: list[list[int]]) -> int:
# Your code here
passSubmit your solution and check back next week for the detailed answer!
Conclusion
Union-Find is more than just a graph algorithm; it is an exercise in optimization. When an interviewer asks about connectivity, your first thought should be DFS/BFS, but your second thought should always be "Can I use Union-Find for better performance?" Remember to always implement Path Compression and Union by Rank together. In a high-pressure interview, forgetting these optimizations can turn an $O(\alpha(N))$ solution into a slow $O(N)$ one.
https://visualgo.net/en/ufds https://algs4.cs.princeton.edu/15uf/ https://en.wikipedia.org/wiki/Disjoint-set_data_structure