Disjoint Sets
The ADT of disjoint sets is like:
In this interface, p
and q
represent the p
-th and q
-th items, respectively.
Implementation
This section introduces some variants of implementations of disjoint sets.
The initial idea for implementing disjoint sets is to maintain a list of sets, e.g, List<Set<Integer>>
. For the connect(item_1, item_2)
operation, we have to iterate through up to sets to find item_1
and item_2
, the time complexity of which is . It is also to perform isConnected(item_1, item_2)
.
Quick Find
In this implementation, the disjoint sets are represented by an array id
, where id[x]
is the ID of the set to which item x
belongs.
Quick find means that we can quickly find out if two items are in the same set.
The following implementations manage the disjoint sets with an array of trees.
Quick Union
Assign each item the index of its parent. If an item has no parent, then it is a 'root' and we assign it a negative value.
To connect two items, we find the set that each item belongs to (the roots of their respective trees) with a helper method find
and make one the child of the other.
connect
and isConnect
are both in this case. It seems that QuickUnion is worse than QuickFind, where the isConnect
is . However, indicates an upper bound and the worst case is depicted below. When our trees are balanced, both connect
and isConnected
perform reasonably well.
Weighted Quick Union
New rule: whenever we call connect
, we always link the root of the smaller (in terms of the number of nodes) tree to the larger tree.
Following the above rule ensures that the maximum height of any tree is . is the number of elements in our Disjoint sets. By extension, the runtimes of connect
and isConnected
are bounded by .
We can store the size information in the root of the tree by replacing the -1
's with -(size of tree)
.
Imagine any element in tree . The depth of increases by 1 only when is placed below another tree . When that happens, the size of the resulting tree will be at least double the size of because . The tree with can double at most times until we've reached a total of items (). So we can double up to times and each time, our tree adds a level → maximum levels.
Weighted Quick Union with Path Compression
Observation: whenever we call find(x)
(to find the ancestor of x
) we have to traverse the path from x
to root. Along the way we can connect all the items we visit to their root at no extra asymptotic cost.
Now the complexity of connect
and isConnect
is and that is pretty close to constant.
is known as "Iterated logarithm".
Last updated