
Except that Dropbox dedupes _everything_.

So I suspect what happens is that everybody's bittorrented DVD rip of Avatar on Dropbox is deduped and stored once on S3. It is encrypted, admittedly, but every copy uses Dropbox's encryption key and has the same hash, pointing at the same single encrypted instance of the file.




I believe Dropbox uses a method analogous to block-level dedupe. That is, files are split into smallish chunks, and the chunks are what get "deduplicated". A "file" is basically a list of pointers to chunks.
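
To make the idea concrete, here is a minimal sketch of block-level dedupe. It is not Dropbox's actual implementation: the `ChunkStore` class, the tiny `CHUNK_SIZE`, fixed-size chunking, and SHA-256 as the chunk identifier are all assumptions made for illustration.

```python
import hashlib

# Hypothetical chunk size, kept tiny so the demo is easy to follow.
CHUNK_SIZE = 4


class ChunkStore:
    """Toy content-addressed store: each distinct chunk is kept once."""

    def __init__(self):
        self.chunks = {}  # hex digest -> chunk bytes

    def put_file(self, data):
        """Split data into fixed-size chunks and return the pointer list."""
        pointers = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk that is already stored is not stored again.
            self.chunks.setdefault(digest, chunk)
            pointers.append(digest)
        return pointers

    def get_file(self, pointers):
        """Reassemble a file by following its pointers."""
        return b"".join(self.chunks[d] for d in pointers)
```

Uploading `b"AAAABBBBCCCC"` and then `b"AAAABBBBDDDD"` to one store yields two three-pointer lists but only four stored chunks, because the first two chunks are identical in both files.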

This makes things extra problematic because completely unrelated files might share chunks. Standard file formats may lead to duplicate headers. Or consider a political science textbook that contains a complete copy of the US Constitution, and a file that contains just the US Constitution. The Constitution alone is perfectly legal to distribute freely; the textbook may not be. Yet the two might share some common blocks, and a federal judge with a shoot-first mentality might craft an order requiring the deletion of those common blocks.
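
The textbook scenario can be sketched in a few lines. Everything here is hypothetical: the 8-byte chunk size, the helper names, and the stand-in byte strings. It also assumes fixed-size chunks with the shared text landing on a chunk boundary, which is the only case where such a simple scheme would find the overlap.

```python
import hashlib

CHUNK = 8  # hypothetical chunk size in bytes


def upload(store, data):
    """Store data chunk by chunk; return the file's list of chunk digests."""
    pointers = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # identical chunks are stored once
        pointers.append(digest)
    return pointers


def readable(store, pointers):
    """A file survives only if every chunk it points to still exists."""
    return all(d in store for d in pointers)


store = {}
# Stand-in contents; each piece is a multiple of CHUNK bytes long.
constitution = b"We the People of the US."
textbook = b"PREFACE." + constitution + b"CHAPTER1"

free_file = upload(store, constitution)
book_file = upload(store, textbook)

# Chunks referenced by both files.
shared = set(free_file) & set(book_file)

# A takedown aimed at the textbook that deletes the shared chunks...
for digest in shared:
    del store[digest]

# ...also destroys the freely distributable file.
print(readable(store, book_file), readable(store, free_file))
```

In this toy setup the standalone Constitution consists entirely of shared chunks, so deleting them makes it unrecoverable even though nobody objected to that file.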



