Hacker News

I hate to be the one to point out the obvious, but replication isn't a backup. It's for resiliency, just like RAID; the two aren't the same.


Replication to another machine that has a COW file system with snapshots is backup though :-)

We back up the data storage for an entire HPC cluster, about 2 PiB of it, to a single machine with 4 disk shelves running ZFS with snapshots. It works very well. A simple rsync every night, then a snapshot.
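A nightly cycle of that shape can be sketched roughly like this. The host, pool, and dataset names are placeholders (not the actual setup described above), and the rsync/zfs steps only run where those tools exist:

```shell
#!/bin/sh
# Hypothetical sketch of a nightly rsync-then-snapshot backup.
SRC="primary:/export/data/"          # the cluster's primary storage (placeholder)
DEST="/tank/backup/data/"            # dataset on the ZFS backup box (placeholder)
SNAP="tank/backup@nightly-$(date +%Y-%m-%d)"

if command -v rsync >/dev/null && command -v zfs >/dev/null; then
    # Mirror the primary into the backup dataset...
    rsync -aH --delete "$SRC" "$DEST"
    # ...then freeze the result; each snapshot is one "Time Machine" entry.
    zfs snapshot "$SNAP"
fi
```

Because ZFS snapshots are copy-on-write, each nightly snapshot only costs the space of what changed since the previous one.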

We use the backup as a sort of Time Machine should we need data from the past that we deleted on the primary. Plus, we don't need to wait for tapes to load or anything; it is pretty fast and intuitive.


The person you're replying to said "Syncthing ... and ZFS + znapzend + rsync.net" though. You're ignoring the rsync.net part.

I have something similar; it's Nextcloud + restic to AWS S3, but it's the same principle. You can give people the convenience and human-comprehensibility of sync-based sharing, but also back that up too, for the best of both worlds. In my case the odds of needing "previous versions" of anything approach zero, and a full sync is fairly close to a backup, but even so I do have a full solution here.
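That setup can be sketched with restic's standard CLI. The bucket name and data path below are placeholders, credentials are assumed to come from `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `RESTIC_PASSWORD` in the environment, and the commands only run where restic is installed:

```shell
#!/bin/sh
# Hypothetical restic-to-S3 sketch; repository and paths are made up.
export RESTIC_REPOSITORY="s3:s3.amazonaws.com/my-backup-bucket"

if command -v restic >/dev/null; then
    restic init                                  # one-time: create the repository
    restic backup /srv/nextcloud/data            # each run is a deduplicated snapshot
    restic snapshots                             # list the "previous versions"
    restic restore latest --target /tmp/restore  # pull an old version back out
fi
```

Restic snapshots are immutable and deduplicated, which is what makes this a backup rather than just another replica.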


Syncthing has file versioning but I don't know for sure if it's suitable for backup. https://docs.syncthing.net/users/versioning.html


When I mentioned de-duping and append-only logs, I had this in mind. It's hard to imagine implementing a backup system with those two properties that doesn't include snapshotting, nearly by design necessity.

(Beyond even the fact that ~/code is also on a ZFS volume that is snapshotted and replicated off-site, which I argue can be used in all of the same important ways any other "backup" is used.)

Hence the comment! After all this blockchain hoopla and everyone's understanding of how "cool" Git is, we really, really deserve better in our backup tools.


But, it makes things easy. I have e.g. a home computer, a server-in-the-closet thing, a laptop, and a work computer, all with a shared Syncthing folder.

So to bolster that, I just have a simple bash script that reminds me every 7 days to make a copy of that folder somewhere else on that machine. It's not precise, because I often don't know which machine I'll be using, but that creates a natural staggering that I figure should be sufficient if something goes weird and I lose something; I'm likely to have an old copy somewhere.
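A hypothetical version of that reminder script: a stamp file records when the last copy was made, and if it is missing or more than 7 days old, a fresh dated copy of the Syncthing folder is made. All paths are placeholders:

```shell
#!/bin/sh
# Sketch of a "nag me every 7 days" copy script; paths are made up.
STAMP="${STAMP:-$HOME/.last-sync-copy}"
SYNC_DIR="${SYNC_DIR:-$HOME/Sync}"
COPY_DIR="${COPY_DIR:-$HOME/backups}/sync-$(date +%Y-%m-%d)"

if [ -d "$SYNC_DIR" ]; then
    # find -mtime +7 prints the stamp only if it is older than 7 days
    if [ ! -e "$STAMP" ] || [ -n "$(find "$STAMP" -mtime +7)" ]; then
        echo "Over 7 days since the last copy; copying $SYNC_DIR to $COPY_DIR"
        mkdir -p "$COPY_DIR" && cp -a "$SYNC_DIR/." "$COPY_DIR/" && touch "$STAMP"
    fi
fi
```

Dropping this in cron (or just a login shell profile) on each machine gives exactly the staggered, imprecise copies described above.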


What is the actual difference between a backup and replication? If the 1s and 0s are replicated to a different host, is that any different from "backing up" (replicating them) to a piece of external media?


> What is the actual difference between a backup and replication?

Simplest way to think about it is that a backup must be an immutable snapshot in time. Any changes and deletions which happen after that point in time will never be reflected in the backup.

That way, any files you accidentally delete or corrupt (or other unwanted changes, like ransomware encrypting them for you) can be recovered by going back to the backup.

Replication is very different: you intentionally want all ongoing changes to propagate to the multiple copies, for availability. But that means unwanted changes or data corruption happily replicate to all the copies, so now all of them are corrupt. That's when you reach for the most recent backup.

That's why you always need to back up, and you'll usually want to replicate as well.
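On a ZFS box that recovery step looks roughly like this. The dataset and snapshot names are made up, and the zfs commands only run where the tool exists:

```shell
#!/bin/sh
# Hypothetical recovery after a bad change (deletion, ransomware) has
# replicated to every live copy; only the pre-change snapshot is immutable.
GOOD="2024-05-01"                                 # last known-good snapshot (made up)
FILE="/tank/data/.zfs/snapshot/$GOOD/report.txt"  # read-only path ZFS exposes

if command -v zfs >/dev/null; then
    zfs list -t snapshot tank/data     # find the last good point in time
    zfs rollback "tank/data@$GOOD"     # discard everything after that snapshot
fi

# Or recover a single file without rolling back, by copying it out of the
# hidden .zfs/snapshot directory:
# cp "$FILE" /tank/data/report.txt
```

The key point: the replicas all agree on the corrupted state, so only the snapshot, which nothing replicates into, lets you go back.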


When those 1s and 0s are deleted and that delete is replicated (or some other catastrophic change propagates, such as ransomware), you presumably don't have the ability to restore if all you're doing is replication. A strategy that layers replication with backup/versioning is the goal.


I'll add that _usually_ a backup strategy includes generational backups of some kind. That is: daily, weekly, monthly, etc., to hedge against individually impacted files, as mentioned.

Ideally there is also an offsite and inaccessible from the source component to this strategy. Usually this level of robustness isn't present in a "replication" setup.


Put more simply, backups account for and mitigate the common risks to data during storage while minimizing costs; ransomware is one of those common risks. It's organization-dependent, based on costs and available budget, so it varies.

Long-term storage usually has some form of Forward Error Correction (FEC) to protect against bitrot, and backups are often segmented, possibly a mix of full, incremental, or delta backups (to mitigate cost), with corresponding offline components (for ransomware resiliency). But that too is very dependent on the environment, as well as on the strategy being used for data minimization.

> Usually this level of robustness isn't present in a "replication" setup.

Exactly, and thinking about replication as a backup often also gives those using it a false sense of security in any BC/DR situations.



