Blog - Bloom filters

Added on Tuesday, 2010-03-30 18:00 UTC in category Programming
Lately I've been learning C and decided it'd be a good idea to combine this with some programming exercises, like on CodeKata. One of the exercises was about Bloom filters.

A Bloom filter is a probabilistic data structure.

Say what?

Wikipedia explains it best of course, but it comes down to this: testing if an item exists in a Bloom filter may give a false positive. (False negatives don't occur.) The huge advantage however, is that the false positive rate is adjustable, and with a 1% false positive rate, a Bloom filter needs only 9.6 bits per item! This is probably why Google uses it in BigTable.

Another use would be to create a spell checker, like in this CodeKata exercise. Try it out, it's fun to play around with :)