Backup Strategy - advice

Hi, apologies, this is a rather basic/generic question, but I realized I may not be following best backup practice.

I have a home server on which I plan to install Duplicacy, which will back files up to cloud storage.

I have a PC and laptops, all of which use Syncthing to keep files in sync between machines. As Syncthing also runs on the always-on home server, it acts as a hub, and the latest versions of files are always available/saved on the home server.

As indicated, my intention is to use Duplicacy to back up the latest files on the home server. Is this good practice, though?

Should there be an intermediate step (backups) between Syncthing and Duplicacy? Otherwise Duplicacy is just copying the latest Syncthing files on the home server. (What if Syncthing propagates errors? Though with the correct retention settings in Duplicacy, that wouldn't be an issue?)

Note: I want to avoid running Duplicacy on my PC and other laptops.

Any thoughts or suggestions will be appreciated.

Thanks

Yes, this is an excellent approach, with an asterisk.

Duplicacy is a backup program, not a sync tool. Thus it uploads a snapshot of your data in the state it was at the point the backup was made. So technically, yes.

Being a backup program, Duplicacy creates a version history. You can restore data from any snapshot from the past.

So if you back up a file that got corrupted, you will be able to restore a previous, uncorrupted version. That's the whole point of a backup: to revert data loss.
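
For example, pulling an older copy of a single damaged file back out could look roughly like this (a minimal sketch; the revision number and file path are made up):

```
# list the revisions duplicacy has stored for this repository
duplicacy list

# restore one file from an older revision, e.g. revision 42
duplicacy restore -r 42 -overwrite project/report.odt
```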

Asterisk: there is, however, one unobvious subtlety.

Imagine you have a multi-file project, and it's important that all of its files stay mutually consistent. When Duplicacy runs on a host, you can (and should) let it use filesystem snapshots: before making a backup, a snapshot is created that freezes the state of the entire filesystem in time, and the data is then backed up from that snapshot. Subsequent file changes on disk don't affect what's being backed up; every file in the backup ends up in the state it was in at the moment the backup started and the filesystem snapshot was taken, regardless of how long the actual upload takes.
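
If I remember the CLI correctly, this is the -vss flag on Windows and macOS (a sketch; the repository path is made up, and on Linux you would typically script an LVM/btrfs/ZFS snapshot yourself instead):

```
# run from the root of an initialized repository; -vss asks duplicacy to
# take a filesystem snapshot (Volume Shadow Copy / APFS snapshot) first,
# so every file is read from the same frozen point in time
cd /path/to/repo
duplicacy backup -stats -vss
```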

This atomicity is broken if you use a sync solution in between. Imagine a scenario where you saved a project, which resulted in updated files A and B. Sync started, and the backup started at approximately the same time; sync managed to update file A but not yet file B, yet Duplicacy picked up both file A and file B. Now in your backup A and B will be from different states: B from the previous save and A from the recent one. If you then restore that revision, your project may be inconsistent or plainly refuse to open, depending on the app.

If all your files are single, independent files, like music, videos, or PDFs, then it's not a problem. But if you back up, say, a git repository or some complex CAD project, that may bite.

Let's consider a different angle.

Why don't you want to run Duplicacy on the source machines? If it's the bursty CPU usage, there are ways to throttle it. You can use the Duplicacy CLI, which is free for personal use, and schedule it with your OS's scheduler (Task Scheduler on Windows, launchd on macOS, systemd on Linux, etc.).
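
For instance, a nightly cron entry (or an equivalent Task Scheduler / launchd / systemd timer job) could look something like this; the path, schedule, and throttle values are made up, and -threads / -limit-rate are only there to keep resource usage down:

```
# crontab -e: run the backup at 02:30 every night from the repository root
30 2 * * * cd /home/me/documents && duplicacy backup -stats -threads 1 -limit-rate 2000 >> "$HOME/duplicacy.log" 2>&1
```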

If you are worried about security (not letting some users see the backups of other users), you can use RSA encryption. Only the public key is required to back up, but to restore you need the private key. Duplicacy supports that too.
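
Roughly like this, assuming I remember the option names right (do double-check the docs; the snapshot id and bucket are made up):

```
# generate a key pair; the passphrase protects only the private key
openssl genrsa -aes256 -out private.pem 2048
openssl rsa -in private.pem -pubout -out public.pem

# initialize the storage with the public key only; machines that back up
# never need to hold the private key
duplicacy init -e -key public.pem laptop-docs b2://my-backup-bucket

# restoring is the operation that requires the private key
duplicacy restore -key private.pem -r 1
```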

If you can run Duplicacy on each host after all, then they can still all back up to a single storage on your NAS, and another instance of Duplicacy on the NAS can then copy (duplicacy copy) the already deduplicated backups to the cloud.
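
Something along these lines, with made-up storage names, snapshot id, and bucket (the -copy option when adding the second storage makes it copy-compatible with the first):

```
# on the NAS, in a repository attached to the local storage ("default"):
# add the cloud destination as a second, copy-compatible storage
duplicacy add -e -copy default offsite nas-docs b2://my-backup-bucket

# replicate the already deduplicated snapshots from local to cloud
duplicacy copy -from default -to offsite
```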
