
Store a hash of each file you already have. If too many items from a single website collide, raise a warning or error, flag it for manual review, or fall back to some fuzzy method of identifying the clashing URLs.
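A minimal sketch of that idea, assuming content is hashed with SHA-256 and "website" means the URL's host. The `DedupStore` class, its `threshold` parameter, and the return labels are hypothetical names made up for illustration, not anything from a particular crawler:

```python
import hashlib
from collections import Counter
from urllib.parse import urlparse


class DedupStore:
    """Tracks content hashes and counts duplicate hits per host (illustrative only)."""

    def __init__(self, threshold=3):
        self.seen = {}               # content digest -> first URL that produced it
        self.collisions = Counter()  # host -> number of duplicate downloads so far
        self.threshold = threshold   # assumed cutoff for "too many" collisions

    def add(self, url, content):
        """Return "new", "duplicate", or "flag" for the given URL and bytes."""
        digest = hashlib.sha256(content).hexdigest()
        host = urlparse(url).netloc
        if digest in self.seen:
            self.collisions[host] += 1
            # Too many clashes from one site: hand off for review / fuzzy matching.
            if self.collisions[host] >= self.threshold:
                return "flag"
            return "duplicate"
        self.seen[digest] = url
        return "new"


store = DedupStore(threshold=2)
print(store.add("http://a.example/page?id=1", b"same body"))
print(store.add("http://a.example/page?id=2", b"same body"))
print(store.add("http://a.example/page?id=3", b"same body"))
```

What happens on "flag" is left open here; that is where the warning or the fuzzy URL matching would go.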

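If that's too slow or space-intensive, try a Bloom filter. It can report false positives but never false negatives, so an occasional new file may be mistaken for a duplicate.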
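For reference, a rough sketch of a Bloom filter over file contents. The bit-array size, the number of hash positions, and the double-hashing trick on a SHA-256 digest are all assumptions picked for illustration; a real deployment would size them from the expected number of files and the acceptable false-positive rate:

```python
import hashlib


class BloomFilter:
    """Small Bloom filter sketch: no false negatives, tunable false positives."""

    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from two 64-bit slices of one SHA-256 digest
        # (double hashing); h2 is forced odd so it is never zero.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
seen.add(b"contents of a file already downloaded")
print(b"contents of a file already downloaded" in seen)
```

With these defaults the filter takes 128 KiB regardless of how many files are added; the trade-off is that the false-positive rate climbs as it fills up.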


