
Does anybody have an example of an application where you need the counting/deletion properties of a counting bloom filter or cuckoo filter over a normal bloom filter? If you do, how do you deal with the filter rejecting your writes because a bucket is full?

It seems like for most applications, silently degrading (instead of rejecting the insertion) when the bloom filter is above capacity is a super useful property.
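For a concrete picture of that graceful degradation, here is a minimal Python sketch (the BloomFilter class, bit count, and hash count are all illustrative, not from any particular library): insert never fails, the false-positive rate just creeps up as the filter fills.

    import hashlib

    class BloomFilter:
        """Minimal bloom filter: inserts never fail, accuracy degrades instead."""

        def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
            self.num_bits = num_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(num_bits // 8 + 1)

        def _positions(self, item: str):
            # Derive num_hashes bit positions from one hash of the item.
            digest = hashlib.sha256(item.encode()).digest()
            for i in range(self.num_hashes):
                yield int.from_bytes(digest[i * 4:(i + 1) * 4], "little") % self.num_bits

        def insert(self, item: str) -> None:
            # Always succeeds; overloading only raises the false-positive
            # rate, it never rejects the write.
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def might_contain(self, item: str) -> bool:
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(item))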



The failure mode doesn't matter much in practice. For practical reasons you need to track how many inserts you've done with either structure, so in both cases you'll have a set cutoff at which you rebuild. Cuckoo filters are more likely to be used in place of counting bloom filters than vanilla ones.
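As a rough sketch of that cutoff-and-rebuild approach (the wrapper, the make_filter callable, and the capacity parameter are all illustrative, not part of any library):

    class RebuildingFilter:
        """Tracks inserts so a rebuild can happen before the filter saturates.

        Assumes the wrapped filter exposes insert() and might_contain().
        """

        def __init__(self, make_filter, capacity: int):
            self.make_filter = make_filter   # callable that builds an empty filter
            self.capacity = capacity         # cutoff chosen ahead of time
            self.filter = make_filter()
            self.inserted = 0

        def insert(self, item) -> None:
            if self.inserted >= self.capacity:
                # Hit the cutoff: start a fresh filter. In practice you would
                # repopulate it from the backing store rather than lose state.
                self.filter = self.make_filter()
                self.inserted = 0
            self.filter.insert(item)
            self.inserted += 1

        def might_contain(self, item) -> bool:
            return self.filter.might_contain(item)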

You could always avoid duplicate inserts in a cuckoo filter by checking contains before calling insert again. A modified insert-only-once routine would only carry a small performance penalty. You can't use counting or deletion while doing this, though, so it's a trade-off. The same trade-off exists with counting bloom filters, but they are much less space efficient.
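A sketch of that insert-only-once routine (assuming the filter exposes contains() and insert() returning booleans; the names are illustrative):

    def insert_once(cuckoo_filter, item) -> bool:
        """Insert item only if it doesn't already appear to be present.

        The extra lookup is cheap (a couple of bucket reads), but a false
        positive on contains() will skip a genuine insert, and once each
        item is stored at most once you can no longer count duplicates or
        safely delete shared items; hence the trade-off.
        """
        if cuckoo_filter.contains(item):
            return False
        return cuckoo_filter.insert(item)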

Practically, the use case for cuckoo filters over bloom filters probably lies in bidirectional communication: partner nodes can keep each other's state updated without exchanging much data. Think distributed caches. Two data nodes initially exchange cuckoo filters of each other's contents. As items fall out of a node's cache, it tells its partner to delete them from the partner's copy of the filter by sending only the bucket index and fingerprint, which is probably much smaller than the original piece of data. Since each data node independently knows what was in its own cache, there's no risk of false deletions. You can't really use bloom filters for this because you can't delete.
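A sketch of what that delete exchange could look like (locate() is a hypothetical helper that returns an item's bucket index and fingerprint; the message layout and names are assumptions, not a real protocol):

    from dataclasses import dataclass

    @dataclass
    class FilterDelete:
        # What a cache node sends its partner when it evicts an item:
        # just a bucket index and a short fingerprint, usually far
        # smaller than the evicted item itself.
        bucket_index: int
        fingerprint: int

    def on_eviction(item, my_cuckoo_filter, send_to_partner) -> None:
        # The evicting node knows the item really was in its cache, so the
        # partner can delete the matching fingerprint from its copy of the
        # filter without any risk of a false deletion.
        bucket, fp = my_cuckoo_filter.locate(item)  # hypothetical helper
        send_to_partner(FilterDelete(bucket_index=bucket, fingerprint=fp))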




