This analysis leaves out a critical factor: once nodes are declared lost, the distributed system starts re-replicating each affected partition. If re-replication completes before any further nodes fail, no data is lost. That's the reason you want more partitions, not fewer as this essay recommends: with more, smaller partitions, you can re-replicate more quickly.
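A back-of-the-envelope sketch of that effect (the node count, data size, and bandwidth below are assumptions, not from the essay): when a node dies, the surviving replicas of its partitions are spread across roughly as many source nodes as there are partitions per node, so re-replication runs in parallel and finishes sooner.

    # Sketch only; all numbers are assumed for illustration.
    DATA_PER_NODE_GB = 1000   # assumed data per node
    NIC_GBPS = 10             # assumed per-node network bandwidth (gigabits/s)
    NODES = 100               # assumed cluster size

    def recovery_seconds(partitions_per_node: int) -> float:
        """Time to re-replicate one dead node's data, assuming its partitions'
        surviving replicas sit on min(partitions_per_node, NODES - 1) distinct
        source nodes, each streaming at full NIC speed."""
        sources = min(partitions_per_node, NODES - 1)
        gb_per_source = DATA_PER_NODE_GB / sources
        return gb_per_source * 8 / NIC_GBPS  # GB -> gigabits, then divide by rate

    for p in (1, 10, 100):
        print(f"{p:3d} partitions/node -> ~{recovery_seconds(p):,.0f}s to re-replicate")

Under these assumptions, going from 1 to 100 partitions per node shrinks the re-replication window from ~800s to ~8s, which is that much less time for a second failure to hit.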

In practice, though, it's not uncommon for multiple machines to fail at once: maybe you lose power to a rack, or a network switch dies. In this case your partitioning scheme really does matter if you want to minimize the odds of data loss. Using a smaller number of larger partitions can decrease the odds of data loss (though at the expense of losing more data if and when you do lose all N replicas of a partition).

Using many smaller partitions can indeed speed recovery, and sometimes that's the right tradeoff. But suppose any amount of data loss means you have to take your cluster offline and rebuild the entire dataset from scratch: in that case, you may be better off with a small number of partitions per node.
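A minimal Monte Carlo sketch of that tradeoff (node count, replica count, failure size, and trial count are all assumed): kill K random nodes at once and check whether any partition lost all R of its replicas. The expected fraction of data lost is the same regardless of partition count, but the probability that at least one partition is lost grows with it.

    # Sketch only; parameters are assumptions, not from the thread.
    import random

    NODES, REPLICAS, KILLED, TRIALS = 100, 3, 5, 10_000

    def simulate(num_partitions: int) -> tuple[float, float]:
        """Return (P(any partition lost), expected fraction of data lost)
        when KILLED random nodes fail simultaneously. Replica placement is
        uniform random over distinct nodes; more trials sharpen the estimate."""
        any_loss = 0
        frac_lost = 0.0
        for _ in range(TRIALS):
            dead = set(random.sample(range(NODES), KILLED))
            lost = 0
            for _ in range(num_partitions):
                replicas = random.sample(range(NODES), REPLICAS)
                if all(r in dead for r in replicas):
                    lost += 1
            any_loss += lost > 0
            frac_lost += lost / num_partitions
        return any_loss / TRIALS, frac_lost / TRIALS

    for p in (10, 100, 1000):
        p_any, avg_frac = simulate(p)
        print(f"{p:5d} partitions: P(any data lost)={p_any:.4f}, "
              f"expected fraction lost={avg_frac:.6f}")

With these numbers the expected fraction lost stays near C(K,R)/C(N,R) for every partition count, while P(any loss) climbs roughly linearly with the number of partitions, which is exactly the "lower odds, bigger blast radius" tradeoff described above.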



That's effectively a network partition, not data loss. It's still good to be rack-aware when placing data, though, so replicas remain available if a whole rack is lost.



