Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I maintain that if you want to think of how a bloom filter works, think of it as the front desk staff at an office building. You can ask a series of questions such as "Has anyone wearing a hat come in?" "Any exceptionally tall people?" etc... It won't help answer if Peter from accounting is there, but can help give a good indication if you know enough about him today.

Obviously, if this is not a correct way to think of it, I'm open for more correct ways.



Not really, it's more like a guest list with only the first two letters of guests last names.

If someone shows you might learn they are not a guest, but having the 'correct' "La" letters does not mean they are a guest.

The advantage is a doorman can have a tiny list to direct people inside, the downside is it's poor validation and does not scale very far.

PS: You can obviously add more letters, but the same problem happens if two people have the same names.


This is just one of the hashes, though. If you consider a function of "color of shirt someone is wearing" to "are they here". Then "color of shirt" is now part of the filter. Same for "is wearing a hat", "is exceptionally tall", etc.

So, if you know that Peter is wearing a green shirt today, then you can rule out his being there if the guard says nobody wearing a green shirt has arrived.

Is this a rigorous model? No. It is just an easy mental model for me to conceptualize this.


Except the hash format needs to be decided ahead of time. If you ask a guard if they had seen someone with a peg leg they may remember this even without any other instruction.

If you really want to use a range of questions then police codes is a good proxy. The guard may say we have people in the drunk tank, but not anything else.


Yes, this is why I typically stick with easy and likely to be considered questions.

Again, not a rigorous explanation. Just an easy one. For the advanced algo, I typically word it as imagine if the front desk had a notesheet that they would tick next to "person carrying a backpack" and/or "female" and/or "male" and then I could run my candidate through the questions related to them and see if they all have non-zero quantities of people.

But, again, just a mental way of thinking about this in analogy. If you aren't the kind that needs/wants analogies, this is worthless. On that I fully agree.


That's a beautifully simple analogy.


Imagine a security guard who can't list from memory everyone who is in the building, but if you show him a picture of Peter from accounting, he can with certainty tell you that he hasn't seen Peter yet.

Not sure a bloom filter needs an analogy but there you go.


In this example, you have dropped it down to a single function in lookup. Namely, "is the person in this picture in the building?"

Now, if you are saying the guard internalizes this with "has someone with that hair color entered? What about of that height? With that facial hair? etc. Then, yes, this is the same.

Analogies can almost always help, for the class of people that are helped by analogies. :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: