Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Next time you are caching a bit list with memcached and running out of space, you may replace that big list with a bloom filter.


Or, more likely, replace it with a bloom filter and a database, unless the cost of misidentifying false positives is very low for your application


Could you explain that a bit more?


This is exactly the scenario that a bloom filter is for.

You have an expensive lookup. You're caching information on success/fail so that you don't have to do the expensive lookup every time. But the caches are getting large.

What you do is replace the local caches with a bloomfilter. That data structure takes a bounded amount of memory. When it says, "No, I have not seen you before," you really haven't. And when it says, "I might recognize you," it is only sometimes right. However its mistakes will not really matter because you'll do the expensive lookup.

The tradeoff is that the more data you put into a bloom filter, the higher the odds are that it will think think you might have seen things before, and therefore the less useful it becomes. But in this caching situation, it saves you work even if the false positive rate is fairly high.


Probably.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: