Read the wiki, and am definitely moving here. Have 3 questions

I’m moving from Restic, and having studied Borg pretty thoroughly, I realized that this is where I want to be. I’ve read the docs and am left with three burning questions:

  1. What happens when a backup is interrupted? With Restic, I have an atomic rsync setup that copies my local backup to a remote. If I’m going to use duplicacy for remote backups on a laptop that is powered up and down regularly, with network outages, I need to know how it deals with these interruptions.

  2. I’m confused about why duplicacy backup has to be run from within the repo. Why not flags, perhaps something like -backup-dir and -snapshot-id, to indicate on the command line itself which backup we want?

  3. Is it possible to use duplicacy to make a backup that spans multiple backends, e.g., backing up 220GB across three backends of 100GB each? Is there some way to do this? In rclone it’s called a “union remote.”

Nothing noteworthy. Chunks are uploaded first, the snapshot file last. If you interrupt, then on the next backup some chunks will already be in the storage and will skip upload, effectively resuming from where it left off. There is also a possibility that, due to changes in the dataset, some chunks uploaded during the interrupted run are no longer needed by the next backup; those become orphaned and can be cleaned up with an occasional prune -exhaustive.
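A minimal sketch of what that looks like in practice (repository already initialized; see the prune doc for details):

    # Interrupted mid-upload? Just run the backup again; chunks already in the
    # storage are skipped, so it effectively resumes:
    duplicacy backup
    # Occasionally clean up chunks orphaned by interrupted runs:
    duplicacy prune -exhaustive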

You can definitely do that. Look at how duplicacy-web invokes the duplicacy CLI. The path to the repository to back up can be specified in the .duplicacy/preferences file. In a nutshell, you initialize a temporary duplicacy “working directory” that can have nothing to do with the location of your data and/or preferences, and specify paths to those in the init command. See the doc for init.
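For example, something along these lines (the snapshot id, paths, and storage URL are placeholders; check the init doc for the exact options):

    # Initialize a "working directory" that has nothing to do with the data location:
    mkdir -p ~/duplicacy-work && cd ~/duplicacy-work
    duplicacy init -repository /path/to/data -pref-dir ~/.duplicacy-prefs laptop-docs b2://my-bucket
    # Later backups are still run from this working directory:
    cd ~/duplicacy-work && duplicacy backup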

You can also initialize duplicacy in a folder with symlinks to the other stuff you want backed up. It follows first-level symlinks, and it is a rather useful way to manage what to back up.
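A hypothetical layout for that symlink approach (all names and paths are placeholders):

    # Build a backup root out of first-level symlinks to the things you want included:
    mkdir -p ~/backup-root && cd ~/backup-root
    ln -s ~/Documents Documents
    ln -s ~/Projects Projects
    duplicacy init laptop-docs /mnt/backup-storage   # a local path can serve as the storage URL
    duplicacy backup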

Three separate jobs. Either backup jobs to all three remotes, or a backup job to one remote and two replication jobs (duplicacy copy) from there to the two others. The latter copy jobs can run from another offsite server if needed, saving upstream bandwidth.
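A hedged sketch of the second variant (storage names, snapshot id, and URLs are placeholders): duplicacy add attaches the extra storages as copy-compatible, and duplicacy copy replicates the snapshots.

    # One backup job to the primary storage:
    duplicacy backup
    # Two additional storages, made copy-compatible with the default one:
    duplicacy add -copy default remote2 laptop-docs b2://bucket-two
    duplicacy add -copy default remote3 laptop-docs sftp://user@host/backups
    # Replication jobs; these can run from another (offsite) machine with access to the storages:
    duplicacy copy -from default -to remote2
    duplicacy copy -from default -to remote3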

I don’t think this is what OP is asking; it’s not triple duplication. You can simulate union storage by splitting your source set into three different subsets with three different backups. Alternatively, you can run :d: on top of rclone mounts; it works, though I can’t really recommend that.


Ah, you are right of course. It’s like mergefs.

You can back up to an rclone mount of course, or to rclone serve. The only problem is rclone’s local caching, if enabled: duplicacy completing a backup does not mean that the data has been uploaded to the target. But since duplicacy does not seek in files, you don’t have to enable VFS caching in rclone, which reduces complexity and improves reliability.
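For illustration (remote name, mount point, and snapshot id are placeholders), an rclone mount with VFS caching left off, used as plain local-disk storage:

    # Mount the cloud remote without VFS caching:
    rclone mount myremote:backups /mnt/backup-target --vfs-cache-mode off &
    # Point duplicacy at the mount point as if it were a local disk:
    duplicacy init laptop-docs /mnt/backup-target
    duplicacy backup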

Along the same lines, I would not recommend messing with unionfs/mergefs or trying to otherwise tie multiple storages together. Pick one storage that works and stick to it. Otherwise you get weakest-link behavior. The perceived cost savings (of, say, trying to abuse an Office 365 family subscription) evaporate the minute something goes wrong and you have to triage and recover.

@Diagon, what are you actually trying to accomplish that requires mergefs? Maybe other solutions, like Minio, are more appropriate here?

Thanks for the details on interrupted backups. That helps. As for having to run backups from the repository …

If the path to the repo is in the ‘.duplicacy/preferences’ file, and if I put that in ‘$HOME’, does that mean I have to run all backups while I’m located in ‘$HOME’? I’m confused about needing to ‘cd’ to the directory before doing the backup, and wondering if there’s another way.

Edit: Ok, I believe you’re describing the switch -pref-dir.

Yes, I did see that, thanks.

Ya, that was the point! :slight_smile:

This was really a more theoretical question about whether I can find some way to combine the roughly 2TB of free cloud storage space that I have access to over multiple accounts into something actually useful. I probably won’t end up using it, but I was curious. So I’m looking at MinIO, though the wiki page doesn’t make it totally clear to me what it actually is.


Yes, duplicacy expects to find the information about what to back up, how, and where to, in the $PWD/.duplicacy folder.

Another option would have been to pass the path to the configuration folder to the backup command as an argument, but that’s not supported.

Ultimately, there is little difference between

    cd "$HOME" && duplicacy backup

and

    duplicacy backup -would-be-option-path-to-settings "$HOME"

Well, this is a minor oddity of duplicacy, but since we’re talking about it: these are different, because with the first I have to cd back to my previous working directory afterwards. It seems to me this is rather unusual for a backup program, which would usually let you indicate what is to be backed up on the command line, either directly or by pointing to some config, or default to a config in ‘$HOME’ or ‘$HOME/.conf’ or some such. Here we have to change directory for the config to be recognized.

As I say, I can live with it - and definitely will - but it seems a bit … well, just odd.

Edit: Actually, I can write my own script which will parse a couple of extra flags. That’ll make me 100% happy instead of just 98%. :slight_smile:
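For instance, a minimal wrapper sketch (the script name is made up; nothing here is a built-in duplicacy flag):

    #!/bin/sh
    # backup-repo.sh: run "duplicacy backup" for the repository given on the
    # command line, so the caller never has to cd there manually.
    # Usage: backup-repo.sh /path/to/repo [extra duplicacy backup options...]
    repo="$1"; shift
    cd "$repo" || exit 1
    exec duplicacy backup "$@"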


Ok, cd "$HOME" && duplicacy backup; cd - then :slight_smile: But I do get what you are saying; it approaches some things differently and makes different tradeoffs. It does make sense for it to use ~/.config/duplicacy unless .duplicacy exists in the current folder – this would be a good feature request. Along with other good-OS-citizen changes, such as not polluting $HOME with transient data and logs on OSes that have dedicated locations for caches, logs, temp files, etc.

Ultimately, therefore, that (writing your own scripting) is the way to go: use duplicacy as a building block for your custom backup solution. Many here, including myself, ended up writing some amount of scripting around it; ultimately, Duplicacy Web is just another shell around the duplicacy CLI.

It’s an object storage server with an S3-compatible API. The underlying storage can reside on multiple unreliable volumes, which provides both a performance boost and error correction. If you had a few isolated HDDs without redundancy, minio could fit the bill of aggregating that unreliable storage into a usable, S3-accessible service.
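A hedged sketch of that idea (disk paths, host, bucket, and snapshot id are placeholders; double-check duplicacy’s storage-backend docs for the exact minio:// URL syntax):

    # Aggregate several plain disks behind one S3-compatible endpoint:
    minio server /mnt/disk1 /mnt/disk2 /mnt/disk3 /mnt/disk4
    # Then back up to it over duplicacy's minio backend:
    duplicacy init laptop-docs minio://us-east-1@nas.local:9000/duplicacy-bucket/backups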

Excellent idea. Maybe at some point I’ll hop over to GitHub and offer it.

Yes, I see now. It’s nice, but it uses a server/client model that would necessitate my having my own cloud server. I take @sevimo’s admonishment that rclone would likely not be sufficiently stable, but it does nicely fit duplicacy's “only file operations needed” approach.

I think I’m ready to give it a whirl. I appreciate all your input.

It’s getting pretty far away from the original topic, but I personally wouldn’t use minio for aggregating local drives. Minio only gives you an S3 interface, which limits its usability unless you want to completely dedicate this storage to S3-aware apps such as :d:. I’d just use regular RAID for that purpose; it is simple, and you get block storage that can be used for just about anything. Or, if I wanted to be really fancy, I’d run Ceph, which can provide both block-level and S3 storage, but that’s an entirely different level of complexity.


Sure. There are pros and cons to everything. Regular RAID does not protect against bit rot, so you would need to step up to BTRFS or ZFS, and that drastically increases the hardware requirements for your storage system to get good performance. With minio, member “disks” don’t need to be local and don’t need to be reliable; they can be mount points from different servers, you can configure erasure coding with a much higher degree of flexibility, etc.

To be clear, I’m not using it myself either; my storage server has a single purpose-built ZFS pool with a bunch of RaidZ1 virtual devices, and other goodness, running on hardware appropriately sized for all other tasks, for the reasons you describe – S3 is not its only job. However, if I were building a server to serve duplicacy (or any other backup solution, for that matter) backups out of leftover hardware and a zoo of old disks, Minio would be my first choice: both for its portability and for its performance (such as server-side hashing, which, yes, you can get with sftp as well)… Indeed, it gets off-topic quickly :slight_smile:

Ended up posting a request on GitHub, here.