Bloom Filter - Everything2.com

An algorithm invented by Burton Bloom for querying whether an object is in a set when the full non-sparse index is too big to be quickly accessible (for example, it doesn't fit into working memory and has to be loaded from disk in parts).

You need a set of hashing functions H_n, and a vector of bits M[n]. The hashing function should return a number smaller than the size of the bit vector (i.e. it can be used as an index into it).

To add an object to the set, you generate H_n(Object) for all n, and for each outcome set the bit at that index to 1.

To query whether an object is in the set, you generate H_n(Object) for all n, and for each outcome check the bit at that index. If any of these are 0, the object is not in the set.

This is most used for caches, as it doesn't generate any false negatives (if you get a negative from the algorithm you can safely say you don't have the data) while the false positives it gives (which, with the appropriate number of hashing functions and appropriate size of M[n] will be quite small) will be detected when trying to retrieve the actual content.

Note that you don't usally need to actually hash the data (which is quite computationally expensive) for every hashing function, you can generate a much larger hash and split it into chunks, or generate just one hash and use it as a seed for for instance a linear congruential PRNG.

algorithm	set	Eratosthenes' Sieve	softlink
Linear congruential PRNG	cache	Orlando Bloom	Bloomin' Onion
data structure	vector	interpolation search	minimal criminal
paint ( Graphics G ) in Java	IRC Ban syntax	Working memory	hash function
Radix Sort	Sparse	syntax highlighting	100% Pure Java