An abstract data type for online building and testing of an equivalence relation.
Union find supports 2 operations of a "world" of objects (their identity is unimportant). I use the connected component (graph theoretic) formulation here; it's easy to rephrase with equivalence relations
- Add an edge (undirected) to the graph between A and B.
- Find the connected component to which A belongs. That is, return an object which "represents" the component of A, in the sense that if B also belongs to the component of A then find(A)==find(B).
This data type is "online": you may freely mix the operations "union" and "find", and each "find" query
will return an answer correct at the time of issuing.
Union find answers questions of the form "do I have a path from A to B in my graph right now?" very quickly. Storing the graph normally, such a query takes time linear in the number of objects. With union find, it takes the time to perform 2 finds (plus the time to check equality, typically taken to be O(1)).
Tarjan's algorithm union find with path compression achieves nearly constant amortized time complexity. The time to perform any sequence of n "union" or "find" operations is O(nβ(n)), where β(n) is a (known) function growing slower than the inverse of Ack(k,.) for any value of k. This is very slow indeed (slower than log or log*); a value of n for which β(n) > 6β(1) is unimaginably large!
A less optimal algorithm takes time O(n log n) for the same n operations, which is not nearly as good. However, if coded correctly for an application which performs many "find" operations, it may be faster!