Bloom filter

A Bloom filter is a data structure that allows computers to see if a given element occurs in a set.[1][2] Bloom filters use hash functions to do this. For each element that is added, a hash value is calculated. When a new element is added, its hash value is compared to that of the other elements in the set. A Bloom filter is a probabilistic data structure. It is possible to get a false positive, but not to get a false negative.[3] In other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed.[2] For each added element, the probability of getting a false positive grows.[2]

Edward Bloom proposed the Bloom filter in 1970. In the article, Bloom supposes there is an algorithm to hyphenate words at the end of a line. According to the example, most words have simple hyphenation patterns. But about 10% of the words require time-consuming lookups to fetch the correct rule. His case was that of hyphenating about 500,000 words. He saw that using the "normal" error-free hashing techniques, storing the hyphenation patterns, would require a lot of memory. He found that using his technique, he could eliminate most lookups. For example, a hash area only 15% of the size needed by an ideal error-free hash still eliminates 85% of the disk accesses.

More generally, fewer than 10 bits per element are required for a 1% false positive probability, independent of the size or number of elements in the set.[4]

Bloom Filter Media

References

  1. Lua error in Module:Citation/CS1/Identifiers at line 630: attempt to index field 'known_free_doi_registrants_t' (a nil value).
  2. 2.0 2.1 2.2 Intelligent Data Engineering and Automated Learning - IDEAL 2010: 11th international conference, Paisley, UK, September 1-3, 2010 : proceedings, eds. Colin Fyfe; et al (Berlin: SpringerLink, 2010), p. 162
  3. Rayan Chikhi; Guillaume Rizk, 'Space-efficient and exact de Bruijn graph representation based on a Bloom filter', Biomedical Research, Volume 36 (2014), p. 3 of 9
  4. Lua error in Module:Citation/CS1/Identifiers at line 630: attempt to index field 'known_free_doi_registrants_t' (a nil value).