
I am downloading this dataset. My guess is that it will take about 40 days.


How would one go about creating a torrent for it? Or uploading it to IPFS?


I emailed the creator a couple of weeks ago to request (or offer) a torrent, but haven't heard anything back.

The problem here is that both of your suggestions involve a 2-step process:

1. Download the file

2. Create a torrent from it, or upload it to IPFS

Since step 1 is already a 2TB download, getting to either version of step 2 is untenable. I agree with one of the other posters in this thread: the default for something like this should be a torrent, since you get both distribution and checksumming for free.
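The checksumming half of step 2 does not strictly need a second pass over the 2TB. A BitTorrent v1 torrent stores one SHA-1 digest per fixed-size piece, so those digests can be computed while the bytes arrive. Below is a minimal sketch that assumes you already have a readable byte stream (an HTTP response, for example). The name `iter_piece_hashes` and the 4 MiB default piece size are illustrative choices, not part of any existing tool.

```python
import hashlib
import io
from typing import BinaryIO, Iterator


def iter_piece_hashes(stream: BinaryIO, piece_length: int = 4 * 1024 * 1024) -> Iterator[bytes]:
    """Yield the SHA-1 digest of each piece_length-sized piece read from stream.

    Hypothetical helper: as in BitTorrent v1, the last piece may be shorter.
    Network streams can return short reads, so bytes are buffered until a
    full piece is available.
    """
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    buf = bytearray()
    while True:
        chunk = stream.read(64 * 1024)
        if not chunk:
            break
        buf.extend(chunk)
        # Emit every complete piece currently sitting in the buffer.
        while len(buf) >= piece_length:
            yield hashlib.sha1(bytes(buf[:piece_length])).digest()
            del buf[:piece_length]
    # Hash the trailing partial piece, if there is one.
    if buf:
        yield hashlib.sha1(bytes(buf)).digest()


if __name__ == "__main__":
    demo = io.BytesIO(b"a" * 10)
    for digest in iter_piece_hashes(demo, piece_length=4):
        print(digest.hex())
```

In a real download you would also write each chunk to disk and then wrap the concatenated digests in a bencoded `info` dictionary, as described in BEP 3. That part is not shown here.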

It would also be nice if it wasn't a 2TB zip file, which then has to be unzipped onto another 2TB of storage for practical use.
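One partial workaround for the second 2TB is to read members directly out of the archive instead of extracting them. Python's standard `zipfile` module decompresses on the fly through `ZipFile.open()`. This still requires the complete archive, because a zip file's central directory sits at the end of the file. The sketch below uses a hypothetical helper name and assumes the member is text.

```python
import io
import zipfile
from typing import BinaryIO, Iterator, Union


def iter_member_lines(archive: Union[str, BinaryIO], member: str,
                      encoding: str = "utf-8") -> Iterator[str]:
    """Stream the text lines of one archive member without extracting it to disk.

    Hypothetical helper: `archive` can be a path or an open binary file.
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as raw:
            # zf.open() returns a file-like object that decompresses lazily;
            # TextIOWrapper decodes it into text one line at a time.
            for line in io.TextIOWrapper(raw, encoding=encoding):
                yield line.rstrip("\n")


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data/part-0.txt", "first\nsecond\n")
    for line in iter_member_lines(buf, "data/part-0.txt"):
        print(line)
```

Memory use stays bounded by the decompression buffers, not by the size of the member.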


Subject to licensing, we intend to make the dataset available (along with many other large ML datasets) using a BitTorrent-like program called Dela, built for the Hops Hadoop platform. We hope to release all of it, including this dataset, in roughly 3 weeks. Dela integrates with HDFS, S3, and GCS backends, supports NAT traversal, and uses delay-based congestion control over UDP, which works well on high-bandwidth, high-latency networks. See http://www.hops.io and our paper: https://ieeexplore.ieee.org/document/7980225/
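For readers unfamiliar with delay-based congestion control, here is a simplified sketch of the general idea. It is not Dela's algorithm. It is a reduced version of LEDBAT (RFC 6817), the standard delay-based scheme behind uTP in BitTorrent clients. The sender keeps the minimum one-way delay it has seen as a "base" delay. It treats anything above that base as queuing delay, then grows the window while queuing delay is below a target and shrinks it when queuing delay exceeds the target. The simplifications are assumptions: the base delay is a global minimum rather than RFC 6817's rolling history, and there are no delay filters and no cap on the allowed window.

```python
from dataclasses import dataclass, field


@dataclass
class LedbatWindow:
    """Simplified LEDBAT-style congestion window, measured in bytes.

    Illustrative only: base delay is a running minimum (RFC 6817 uses a
    minutes-long history), and delay filtering / allowed_cwnd are omitted.
    """
    mss: int = 1200            # assumed segment size in bytes
    target_s: float = 0.100    # RFC 6817 requires TARGET <= 100 ms
    gain: float = 1.0          # RFC 6817 requires GAIN <= 1
    min_cwnd_pkts: int = 2     # MIN_CWND from RFC 6817
    cwnd: float = field(init=False, default=0.0)
    base_delay_s: float = field(init=False, default=float("inf"))

    def __post_init__(self) -> None:
        self.cwnd = float(self.min_cwnd_pkts * self.mss)

    def on_ack(self, one_way_delay_s: float, bytes_acked: int) -> float:
        """Update the window after an ACK carrying a one-way delay sample."""
        self.base_delay_s = min(self.base_delay_s, one_way_delay_s)
        queuing_delay = one_way_delay_s - self.base_delay_s
        # Positive when below target (grow), negative when above (shrink).
        off_target = (self.target_s - queuing_delay) / self.target_s
        self.cwnd += self.gain * off_target * bytes_acked * self.mss / self.cwnd
        self.cwnd = max(self.cwnd, float(self.min_cwnd_pkts * self.mss))
        return self.cwnd

    def on_loss(self) -> float:
        """Halve the window on packet loss, never dropping below the minimum."""
        self.cwnd = min(self.cwnd, max(self.cwnd / 2, float(self.min_cwnd_pkts * self.mss)))
        return self.cwnd


if __name__ == "__main__":
    w = LedbatWindow()
    print(w.on_ack(0.050, 1200))   # no queuing yet: window grows
    print(w.on_ack(0.250, 1200))   # 200 ms of queuing, above target: window shrinks
```

The key design point is that the sender backs off as soon as queues start to build, before any packets are lost. That is why this kind of scheme can push bulk data across long, fat links without starving other traffic.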



