
This is so spot on.

For some years now I've been thinking that, since it no longer takes much effort to scan the entire internet (masscan, etc.), it might not be hard to find all the trackers and then crawl the whole DHT.

I'm not sure if this is feasible, but it might be an interesting start.
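To illustrate the "find trackers by scanning" half of that idea: a port scan only gives you hosts with an open port, so you'd still want to check whether each hit actually speaks the UDP tracker protocol (BEP 15). A minimal sketch of that check, assuming a plain connect handshake is enough evidence; the host, port, and timeout below are placeholders:

    import os
    import socket
    import struct

    # BEP 15 "connect" request: magic protocol id, action = 0, random transaction id.
    PROTOCOL_ID = 0x41727101980
    ACTION_CONNECT = 0

    def is_udp_tracker(host, port, timeout=3.0):
        transaction_id = int.from_bytes(os.urandom(4), "big")
        request = struct.pack("!QII", PROTOCOL_ID, ACTION_CONNECT, transaction_id)

        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(request, (host, port))
            try:
                response, _ = sock.recvfrom(1024)
            except socket.timeout:
                return False

        if len(response) < 16:
            return False
        # Reply should echo our transaction id with action = connect,
        # followed by a 64-bit connection id.
        action, tid, _connection_id = struct.unpack_from("!IIQ", response)
        return action == ACTION_CONNECT and tid == transaction_id

    if __name__ == "__main__":
        # Hypothetical host; in practice you'd feed in addresses from the scan.
        print(is_udp_tracker("tracker.example.org", 6969))

Anything that answers the handshake correctly can then go into your tracker list for scraping or for seeding the DHT crawl.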



I've played around a bit with DHT indexing recently, and a very simple Python program using libtorrent to send sample_infohashes queries (BEP 51) and download metadata (to get names/files) was enough to get me 1-2 .torrent files per second without any special effort or aggressive settings. The bottleneck (by about 10x) has been the embarrassingly parallel info-hash-to-.torrent step, so speeding things up shouldn't be very hard.
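For the info-hash-to-.torrent step, a rough sketch of what that can look like with libtorrent's Python bindings (assuming libtorrent 1.2 or newer; error handling, DHT bootstrap waiting, and parallelism are all omitted):

    import time
    import libtorrent as lt

    def fetch_torrent(info_hash_hex, save_path="."):
        # A session with the DHT enabled; default bootstrap nodes are used.
        ses = lt.session({
            "listen_interfaces": "0.0.0.0:6881",
            "enable_dht": True,
        })

        # Turn the bare info hash into a magnet link and add it.
        params = lt.parse_magnet_uri("magnet:?xt=urn:btih:" + info_hash_hex)
        params.save_path = save_path
        # We only want the metadata, not the payload.
        params.flags |= lt.torrent_flags.upload_mode
        handle = ses.add_torrent(params)

        # Wait until some peer has sent us the metadata (BEP 9).
        while not handle.status().has_metadata:
            time.sleep(1)

        # Serialize the metadata back out as a .torrent file.
        ti = handle.torrent_file()
        ct = lt.create_torrent(ti)
        with open(info_hash_hex + ".torrent", "wb") as f:
            f.write(lt.bencode(ct.generate()))
        return ti.name()

Since each info hash is independent, the easy speedup is just adding many such magnets to one session (or sharding the hashes across processes) and letting them resolve concurrently.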

After running it sporadically for a few months I ended up with 1.4M torrent names and 30M info hashes, but I never put any work into estimating the size of the DHT, so I don't know what sort of coverage that represents.



