TheLoneWolfling on Sept 30, 2013 | on: How to write a crawler
Store a hash of each file you have already fetched. If too many items from a single website collide (hash to the same value), raise a warning or error, flag it for human help, or use some fuzzy method to identify the clashing URLs.
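A rough sketch of that idea, in case it helps. The names `record`, `needs_attention`, and `COLLISION_LIMIT` are made up for illustration, the threshold is arbitrary, and SHA-256 is just one reasonable choice of hash:

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit

COLLISION_LIMIT = 3  # hypothetical threshold; tune it for your crawl

seen = {}                      # content digest -> first URL fetched with that content
collisions = defaultdict(int)  # host -> number of duplicate pages seen so far


def record(url, body):
    """Return True if body (bytes) is new, False if identical content was already stored."""
    digest = hashlib.sha256(body).hexdigest()
    if digest not in seen:
        seen[digest] = url
        return True
    # Same content under a different URL: count it against the host.
    collisions[urlsplit(url).netloc] += 1
    return False


def needs_attention(url):
    """True once the URL's host has produced more duplicates than COLLISION_LIMIT."""
    return collisions[urlsplit(url).netloc] > COLLISION_LIMIT
```

When `needs_attention` returns True, the crawler could log a warning, pause that host, or hand the clashing URLs to whatever fuzzy matcher you prefer.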
If that's too slow/space-intensive, try a bloom filter.
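For the bloom filter, here is a minimal sketch rather than a tuned implementation. The class name, the default sizes, and the double-hashing scheme built on SHA-256 are all assumptions for illustration. Keep in mind that a bloom filter never misses an item it has seen but can report a false positive, so an occasional new page may be skipped as a duplicate:

```python
import hashlib


class BloomFilter:
    """Fixed-size bloom filter over byte strings (illustrative defaults)."""

    def __init__(self, num_bits=1 << 20, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from two slices of one digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

Usage would look like `if body not in bf: bf.add(body)`, with `body` as bytes. The default of 2^20 bits takes 128 KiB no matter how many pages are added, which is the space saving over storing every hash.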