Disjoint Sets

The ADT of disjoint sets is like:

public interface DisjointSets {
    void connect(int p, int q);
    boolean isConnected(int p, int q); 
}

In this interface, p and q represent the p-th and q-th items, respectively.

Implementation

This section introduces some variants of implementations of disjoint sets.

The initial idea for implementing disjoint sets is to maintain a list of sets, e.g, List<Set<Integer>>. For the connect(item_1, item_2) operation, we have to iterate through up to NN sets to find item_1 and item_2, the time complexity of which is O(N)O(N). It is also O(N)O(N) to perform isConnected(item_1, item_2).

Quick Find

In this implementation, the disjoint sets are represented by an array id, where id[x] is the ID of the set to which item x belongs.

int[] id = {4, 4, 4, 5, 4, 5, 6};

/* Need to iterate through the array => Θ(N) */
public void connect(int p, int q) {
  int pid = id[p];
  int qid = id[q];
  for (int i = 0; i < id.length; i++){
    if (id[i] == pid){
      id[i] = qid;
    }
  }
}

Quick find means that we can quickly find out if two items are in the same set.

/* Θ(1) */
public boolean isConnected(int p, int q){
  return (id[p] == id[q]);
}

The following implementations manage the disjoint sets with an array of trees.

Quick Union

Assign each item the index of its parent. If an item has no parent, then it is a 'root' and we assign it a negative value.

To connect two items, we find the set that each item belongs to (the roots of their respective trees) with a helper method find and make one the child of the other.

connect and isConnect are both O(N)O(N) in this case. It seems that QuickUnion is worse than QuickFind, where the isConnect is O(1)O(1). However, O(N)O(N) indicates an upper bound and the worst case is depicted below. When our trees are balanced, both connect and isConnected perform reasonably well.

Weighted Quick Union

New rule: whenever we call connect, we always link the root of the smaller (in terms of the number of nodes) tree to the larger tree.

Following the above rule ensures that the maximum height of any tree is Θ(logN)\Theta (\log N). NN is the number of elements in our Disjoint sets. By extension, the runtimes of connect and isConnected are bounded by O(logN)O(\log N).

We can store the size information in the root of the tree by replacing the -1's with -(size of tree).

Imagine any element xx in tree T1T1. The depth of xx increases by 1 only when T1T1 is placed below another tree T2T2. When that happens, the size of the resulting tree will be at least double the size of T1T1 because size(T2)size(T1)\text{size}(T2)\geq\text{size}(T1). The tree with xx can double at most log2N\log_2{N} times until we've reached a total of NN items (2log2N2^{\log_2{N}}). So we can double up to log2N\log_2N times and each time, our tree adds a level → maximum log2N\log_2N levels.

Weighted Quick Union with Path Compression

Observation: whenever we call find(x) (to find the ancestor of x) we have to traverse the path from x to root. Along the way we can connect all the items we visit to their root at no extra asymptotic cost.

Now the complexity of connect and isConnect is O(logN)O(\log ^* N) and that is pretty close to constant.

log(n)\log* (n) is known as "Iterated logarithm".

log(n)=# of log s.t. log(log(log(log(n))))=0\log*(n) = \# \text{ of } \log \text{ s.t. } \log(\log(\log\dots(\log(n)))) = 0

Last updated