I find it very, very hard to go wrong with Syncthing (for stuff I truly need replicated, code/photos/text-records) and ZFS + znapzend + rsync.net (automatic snapshots of `/home` and `/var/lib` on servers).
The only thing missing: I'd like to stop syncing code with Syncthing and instead build some smarter daemon. The daemon would take a manifest of repositories, each with a mapping of worktrees->branches to be actualized and fsmonitored. The daemon would auto-commit changes on those worktrees into a shadow branch and push/pull it. Ideally this could leverage (the very amazing, you must try it) `jj` for continuous committing of the working copy and (in the future, with the native jj format) even handle the likely-never-to-happen conflict scenario. (I'd happily collaborate on a Rust impl and/or donate funds to one.)
Given the number of worktrees I have of some huge repos (nixpkgs, linux, etc.), it would likely mean a significant reduction in CPU/disk usage compared to what Syncthing has to do now to monitor/rescan as much as I'm asking it to (it has to dumb-sync .git, syncs gitignored content, etc., etc.).
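To make that concrete, a purely hypothetical sketch of what such a manifest could look like (nothing here exists; the format, field names, and paths are all made up):

```toml
# hypothetical manifest for the imagined sync daemon -- nothing here is real
[[repo]]
url = "https://github.com/NixOS/nixpkgs"

# worktree path -> branch to keep checked out and fsmonitored
[repo.worktrees]
"~/code/nixpkgs" = "master"
"~/code/nixpkgs-staging" = "staging"

[[repo]]
url = "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git"
[repo.worktrees]
"~/code/linux" = "master"
```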
> Given the number of worktrees I have of some huge repos (nixpkgs, linux, etc.), it would likely mean a significant reduction in CPU/disk usage compared to what Syncthing has to do now to monitor/rescan as much as I'm asking it to (it has to dumb-sync .git, syncs gitignored content, etc., etc.).
Are you really hitting that much of a resource utilization issue with syncthing though? I use it on lots of small files and git repos and since it uses inotify there's not really much of a problem. I guess the worst case is switching to very different branches frequently, or committing very large (binary?) files where it may need to transfer them twice, but this hasn't been a problem in my own experience.
I'm not sure you could really do a whole lot better than syncthing by being clever, and it strikes me as a lot of effort to optimize for a specific workflow.
Edit: actually, I wonder if you could just exclude the working copies with a clever exclude list in syncthing, such that you'd ONLY grab .git so you wouldn't even need the double transfer/storage. You risk losing uncommitted work I suppose.
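For example, something along these lines as the folder's `.stignore` might do it, though I haven't checked how the scanner behaves with negated patterns at scale (a sketch only):

```
// .stignore sketch: keep only the .git directories, ignore everything else
!**/.git
*
```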
Replication to another machine that has a COW file system with snapshots is backup though :-)
We back up our data storage for an entire HPC cluster, about 2 PiB of it, to a single machine with 4 disk shelves running ZFS with snapshots. It works very well. Simple rsync every night, and snapshotted.
We use the backup as a sort of Time Machine should we need data from the past that we deleted in the primary. Plus, we don't need to wait for the tapes to load or anything; it is pretty fast and intuitive.
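Conceptually the nightly job is just something like the following; hostnames, pool, and dataset names are placeholders rather than any real setup:

```sh
# pull the primary storage onto the ZFS-backed machine, then snapshot it
rsync -aH --delete primary:/export/data/ /tank/backup/data/
zfs snapshot tank/backup@nightly-$(date +%Y-%m-%d)
```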
The person you're replying to said "Syncthing ... and ZFS + znapzend + rsync.net" though. You're ignoring the rsync.net part.
I have something similar; it's Nextcloud + restic to AWS S3, but it's the same principle. You can give people the convenience and human-comprehensibility of sync-based sharing, but also back that up, for the best of both worlds. In my case the odds of needing "previous versions" of things approach zero and a full sync is fairly close to a backup, but even so, I do have a full solution here.
When I mentioned de-duping and append-only logs, I had this in mind. It's hard to imagine implementing a backup system with those two properties that doesn't include snapshotting almost by design necessity.
(Beyond even the fact that ~/code is also on a ZFS volume that is snapshotted and replicated off-site, which I argue can be used in all of the same important ways any other "backup" is used.)
Hence the comment! After all this blockchain hoopla and everyone's understanding of how "cool" Git is, we really, really deserve better in our backup tools.
But, it makes things easy. I have e.g. a home computer, a server-in-the-closet thing, a laptop and a work computer, all with a shared Syncthing folder.
So to bolster that, I just have a simple bash script that reminds me every 7 days to make a copy of that folder somewhere else on that machine. It's not precise because I often don't know what machine I'll be using, but that creates a natural staggering that I figure should be sufficient if something goes weird and I lose something; I'm likely to have an old copy somewhere.
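The general shape of such a reminder script, with the paths and the 7-day window purely as an illustration rather than the actual script:

```sh
#!/usr/bin/env bash
# sketch: nag me if it's been more than 7 days since the last manual copy
STAMP="$HOME/.last-sync-copy"

if [ ! -e "$STAMP" ] || [ -n "$(find "$STAMP" -mtime +7)" ]; then
  echo "Reminder: copy ~/Sync somewhere else on this machine, then run: touch $STAMP"
fi
```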
What is the actual difference between a backup and replication? If the 1’s and 0’s are replicated to a different host, is that any different than “backing up” (replicating them) to a piece of external media?
> What is the actual difference between a backup and replication?
Simplest way to think about it is that a backup must be an immutable snapshot in time. Any changes and deletions which happen after that point in time will never reflect back onto the backup.
That way, any files you accidentally delete or corrupt (or other unwanted changes, like ransomware encrypting them for you) can be recovered by going back to the backup.
Replication is very different: you intentionally want all ongoing changes to replicate to the multiple copies for availability. But it means that unwanted changes or data corruption happily replicate to all the copies, so now all of them are corrupt. That's when you reach for the most recent backup.
That's why you always need to back up, and you'll usually want to replicate as well.
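A concrete illustration of the difference, using ZFS since it comes up elsewhere in the thread (pool and dataset names are placeholders): a snapshot taken before the damage is read-only, so a later deletion or ransomware run can't touch it.

```sh
# take an immutable point-in-time snapshot of the dataset
zfs snapshot tank/home@before-damage

# later, after an accidental delete (or ransomware), pull files back out
cp /tank/home/.zfs/snapshot/before-damage/important.txt /tank/home/

# or roll the whole dataset back to the snapshot
zfs rollback tank/home@before-damage
```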
When those 1s and 0s are deleted and that delete is replicated (or other catastrophic change, such as ransomware) you presumably don't have the ability to restore if all you're doing is replication. A strategy that layers replication + backup/versioning is the goal.
I'll add that _usually_ a backup strategy includes generational backups of some kind. That is, daily, weekly, monthly, etc., to hedge against individually impacted files, as mentioned.
Ideally there is also an offsite and inaccessible from the source component to this strategy. Usually this level of robustness isn't present in a "replication" setup.
Put more simply, backups account for and mitigate the common risks to data during storage while minimizing costs; ransomware is one of those common risks. It's organization-dependent, based on costs and available budget, so it varies.
Long-term storage usually has some form of Forward Error Correction (FEC) protection scheme (for bitrot), and backups are often segmented, which may be a mix of full and incremental or delta backups (to mitigate cost), with corresponding offline components (for ransomware resiliency), but that too is very dependent on the environment as well as the strategy being used for data minimization.
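For the bitrot/FEC piece specifically, one common approach is to generate parity data alongside the backup segments, e.g. with par2; the redundancy level and filenames below are arbitrary:

```sh
# create ~10% recovery data for a backup segment
par2 create -r10 backup-2024-06.tar.par2 backup-2024-06.tar

# later: verify, and repair from the parity files if blocks have rotted
par2 verify backup-2024-06.tar.par2
par2 repair backup-2024-06.tar.par2
```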
> Usually this level of robustness isn't present in a "replication" setup.
Exactly, and thinking about replication as a backup often also gives those using it a false sense of security in any BC/DR situations.
I use Syncthing between Mac, Windows (I've had Linux in the mix at one point too), and my Synology NAS. Syncthing is more for my short-term backup though. I will either commit it to a repo, save it to a Synology share, or delete it.
*edit* my gitea server saves its backups to synology
Yes. I just let Syncthing sync among devices, using it to create copies of the backup. The daily backup scripts do their thing and create one backup snapshot, then Syncthing picks up the new backup files and propagates them to multiple devices.
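The moving parts in a setup like that are roughly a scheduled job writing one dated archive into a Syncthing-shared directory; paths and schedule below are illustrative only:

```sh
# crontab entry: run the backup script at 02:00 every day
# 0 2 * * * /home/me/bin/daily-backup.sh

# daily-backup.sh (sketch): write one dated archive into the shared folder;
# Syncthing then replicates it to the other devices on its own
tar czf "$HOME/Sync/backups/home-$(date +%F).tar.gz" -C "$HOME" Documents Projects
```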
Sparkleshare does something kind of similar. It uses git as the backend to automatically sync directories across a few computers. https://www.sparkleshare.org/
My two 80%-full 1 TB laptops and a 1 TB desktop back up to around 300-400 GB after dedupe and compression. Currently have around 12 TB of backups stored in that 300 GB.
Incremental backups run in about 5 mins, even against the spinning disks they're stored on.
Python programmer here, but I actually prefer Restic [0]. While more or less the same experience, the huge selling point to me is that the backup program is a single executable that can be easily stored alongside the backups. I do not want any dependency/environment issues to assert themselves when restoration is required (which is most likely on a virgin, unconfigured system).
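To illustrate the self-contained point: the whole restore path is one static binary plus the repository, so a copy of the binary can live next to the repo. The repo path and restore target below are just an example:

```sh
# initialize a repository and keep the restic binary alongside it
restic init --repo /mnt/backup/restic-repo
cp "$(command -v restic)" /mnt/backup/restic

# back up a home directory
restic --repo /mnt/backup/restic-repo backup ~/Documents

# on a fresh machine: list snapshots and restore with the stored binary
/mnt/backup/restic --repo /mnt/backup/restic-repo snapshots
/mnt/backup/restic --repo /mnt/backup/restic-repo restore latest --target /restore
```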
I've been using Borg, Restic and Kopia for a long time and Kopia is my personal favorite - very fast, very efficient, runs in the background automatically without having to schedule a cron job or anything like that.
Only downside is that the backups are made of a HUGE number of files, so when synchronizing it can sometimes take a bit of time to check the ~5k files.
No, I distinctly don't want borg. It doesn't help or solve anything that Syncthing doesn't do. The obsession with borg and bup is pretty baffling to me. We deserve better in this space. (see: Asuran and another whose name I forget...)
Critically, I'm specifically referring to code sync that needs to operate at a git-level to get the huge efficiencies I'm thinking of.
Syncthing, or borg, scanning 8 copies of the Linux kernel is pretty horrific compared to something doing a "git commit && git push" and "git pull --rebase" in the background (over-simplifying the shadow-branch process here for brevity.)
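Spelled out, that over-simplified background loop per worktree would be roughly the following; branch naming and conflict handling are hand-waved, this is the brevity version rather than a real design:

```sh
# on machine A, inside the worktree, triggered by fsmonitor or a timer
git add -A
git commit -m "autosync $(date -Is)" || true      # no-op if nothing changed
git push origin HEAD:refs/heads/shadow/machine-a

# on machine B, picking up A's shadow branch
git fetch origin
git rebase origin/shadow/machine-a
```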
re: 'we deserve better' -- case in point, see Asuran - there's no real reason that sync and backup have to be distinctly different tools. Given chunking and dedupe and append-only logs, we really, really deserve better in this tooling space.
I don't think GP was talking about backups (which is what Borg is good for) but about synchronization between machines which is another issue entirely.
They work together. I use syncthing to keep things synchronized across devices, including to an always-on "master" device that has more storage. Then borg runs on the master device to create backups.
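As a sketch of the borg half of that arrangement (repo path, archive naming, and retention are just examples):

```sh
# one-time setup on the always-on "master" device
borg init --encryption=repokey /srv/borg/sync-backup

# nightly: snapshot the Syncthing folder into a new archive, then prune old ones
borg create --stats /srv/borg/sync-backup::'{hostname}-{now:%Y-%m-%d}' ~/Sync
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 /srv/borg/sync-backup
```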