Am I doing this right? (2 Storage Locations)

I have 2 storage locations set up in Duplicacy web edition… Google and my local NAS.

Each location backs up the same data, from 10 different mount points on my linux server.

The way I have it set up now, I have to have 20 backup jobs set up… 10 for the NAS and 10 for Google… then allocate each one to the correct schedule…

Is that right? I feel like I understand why it's this way (so each backup job can be run individually)… but I also feel like I shouldn't need 2 jobs and should be able to select the storage in the schedule (I know I'm wrong… but I'm just wanting someone to tell me why :slight_smile: )

Create a folder. Put symlinks to your 10 folders there. Create two jobs to back up that folder to the two destinations.

Duplicacy follows first level symlinks.
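
For example, a minimal sketch with made-up paths (your actual mount points will differ):

    # One parent folder whose first-level symlinks point at the real mount points
    mkdir -p /backup-root
    ln -s /mnt/photos    /backup-root/photos
    ln -s /mnt/documents /backup-root/documents
    ln -s /mnt/media     /backup-root/media
    # ...repeat for the remaining mount points, then back up /backup-root

Then /backup-root is the single repository, and each backup job (one per storage) covers all ten folders.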

Would a copy job (from one storage to the other) be appropriate?

That’s what I use with a similar setup (local + remote storage) - individual backup for each data source to local storage, then a copy from local to remote.

I have them ‘copy compatible’ so if either storage fails I can recreate it from the other.

Sure, you can run a copy as well

I guess what I'm missing is what a copy job actually does… so I back up to storageA, and then instead of another backup job running to storageB, I run a copy job instead, which would download and re-upload the chunks to storageB?

If that's the case, then I deliberately run the 2 jobs at different times so I have multiple restore points… more locally than in the cloud (cost based)…

For some reason, and maybe I'm thinking about this wrongly… I want to keep each of my folders in separate backups…

However, maybe you're right… I'm not sure why I'm doing it this way, seeing as they all run on the same schedule (per storage point)…

It would, however, be a chore to change things now… I assume I'd have to re-backup everything…?

First, some duplicacy terminology:

  • I presume each of your “folders” is what Duplicacy calls a repository.
  • Each of your 10 repositories is backed up to a different snapshot id (aka backup id) in your local NAS storage.
  • You want to maintain a copy of your local storage in a storage on Google.

For maximum flexibility, stick with separate repositories/snapshot ids, because duplicacy commands operate on either individual snapshot ids or all snapshot ids in a storage. If you combine all your folders into a single repository/snapshot id and you later want to delete one of those folders, apply different prune retention policies to various folders, or omit some folders from your Google storage to save space, you'll be out of luck.

If you’re using the same prune retention policy for all folders and you want to copy all of them to your Google cloud storage, you can:

  1. Perform 10 individual backups to local-storage
  2. prune -storage local-storage -all -keep your retention policy …
  3. prune -storage cloud-storage -all -keep your retention policy …
  4. copy -from local-storage -to cloud-storage
  5. check local-storage -chunks
  6. check cloud-storage

The two check commands can be run in parallel, and -chunks is optional.
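
If you were driving the same sequence from the CLI rather than the Web UI, it would look roughly like this (repository paths, storage names and retention values are placeholders, and prune/copy/check assume a repository that has both storages configured):

    # 1. Back up each repository to the local storage
    for repo in /mnt/data1 /mnt/data2 /mnt/data3; do    # ...and so on for all 10
        (cd "$repo" && duplicacy backup -storage local-storage)
    done

    # 2-3. Prune both storages with the same retention policy (example values)
    duplicacy prune -storage local-storage -all -keep 0:365 -keep 7:90 -keep 1:7
    duplicacy prune -storage cloud-storage -all -keep 0:365 -keep 7:90 -keep 1:7

    # 4. Copy new revisions from local to cloud
    duplicacy copy -from local-storage -to cloud-storage

    # 5-6. Verify both storages (-chunks is the optional deeper check)
    duplicacy check -storage local-storage -chunks
    duplicacy check -storage cloud-storage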

You probably don’t need to prune every day, so that means setting up two schedules: e.g. a schedule without the prune steps to run MTWTFS, and a schedule with the prune steps to run on Sunday.

Yes, it's a lot of jobs to set up in the Web UI. I also have 10+ repositories (e.g. one per home share, a legal-financial share, a multimedia share, and a repository for system images on each PC). These have different retention policies (e.g. -keep 30:365 -keep 7:90 -keep 1:7 for home and legal-financial, -keep 0:365 -keep 7:90 -keep 1:7 for multimedia, -keep 0:365 for system images), and I don't want to copy system images to cloud storage. The result is many jobs in the Web UI.
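
For the CLI-minded, per-repository retention just means prune with -id instead of -all; the snapshot ids here are only examples:

    duplicacy prune -storage local-storage -id home       -keep 30:365 -keep 7:90 -keep 1:7
    duplicacy prune -storage local-storage -id multimedia -keep 0:365  -keep 7:90 -keep 1:7
    duplicacy prune -storage local-storage -id pc1-image  -keep 0:365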

Also note that storages can be initialized with different -encrypt and -erasure-coding options and still be copy compatible. E.g. my cloud storage has -encrypt, I have a normally disconnected cold storage copy on a USB drive with -erasure-coding, and my local storage has neither option.
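
For reference, that kind of setup is created with add -copy at storage-creation time; the storage names, snapshot id and URLs below are placeholders:

    # "local" already exists with neither -encrypt nor -erasure-coding
    # Encrypted cloud storage, copy compatible with local:
    duplicacy add -e -copy local cloud my-snapshot-id gcd://duplicacy-backups
    # Cold-storage copy on a USB drive with erasure coding, also copy compatible:
    duplicacy add -copy local -erasure-coding 5:2 cold my-snapshot-id /mnt/usb/duplicacy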

I’m facing the same question:

I got two remote locations where I want to make backups, no local one.

So I'm currently running one job with two parallel backups to the remote locations, which is kind of silly IMHO, since Duplicacy needs to calculate the delta twice, GPG-encode the data twice, etc.

Isn’t there a way to select more than one (compatible) location as destination for a backup, without having to send it first to one location, download it again and send it to the other one?

I can’t imagine there would be, but is there a resiliency advantage to having one repository back up files with unique jobs to non copy-compatible storages? Or would it be better to just use copy-compatible storages?

Right, I think even if you’re going to use multiple independent backup jobs you should always make storages copy-compatible, so that when there is a corrupt chunk you might be able to just recreate it by copying from another storage.

Great point. That’s a perfect reason to use copy-compatible storage.

I hadn’t realised this thread was still going… but it’s great it is!

I have a stupid question to add to the mix… is there a way to make storage copy compatible after you’ve already backed up to it?

If not… seeing as I have a local and a cloud… I guess the solution is to delete my local storage and then recreate it as copy compatible with my cloud storage… (and obviously start backing up to it from scratch)

No, only at storage creation.

There is another option, if you have enough space: create a new local storage that is copy compatible with your cloud storage and copy from your current local storage to this new one, and only then delete the old local one. It will be much faster.

But this would require me to upload the backup to one storage first, download it again from there, and send it to the next one via the “duplicacy copy” operation, right?

Would be nice if we could make a backup to multiple locations at the same time.

You can if they have different backup job names. You could have Job1-Storage1 and Job1-Storage2

I used to do that, but just settled on backing up to storage1 (local) and setting up scheduled copy jobs to storage2 (remote).

This will only be the case if the storage is created as bit-identical. Copy compatible is not the same thing (in fact, I'm not sure that distinction is meaningful at all – what makes two random storages not copy compatible, given that during a copy the chunks need to be unpacked and repacked anyway?)

Yup, I need to edit that post.

I was thinking from the perspective of a copy operation within Duplicacy (the files pack the same), not from the perspective of a file-system copy operation.

I mean, ideally I just want to send it out once and have a CLI job on my SFTP storage copy the backups to the next location. But this is currently not possible without storing the GPG key plus password on the SFTP storage – which is a bummer.

So if I want to be able to recover from a partial file loss on two storages, I need to create them bit-identical?

I guess I could use some other sync script in this case to push the changes to the second storage location from my SFTP server… which is Google Drive btw.

I think this is a dead-end approach: if the chosen storage allows corruption to occur, it's game over right there; trying to build crutches to mitigate data loss on a bad storage to various degrees is counterproductive.

Instead, use storage that guarantees data integrity.

Google Drive won't corrupt data. Your SFTP server — that depends on what hardware the storage is kept on. For example, a hard drive with ext4 — don't even bother, scrap it today. An array with zfs or btrfs with checksumming and monthly scrubs — totally different story; integrity is _guaranteed_ by that design.
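
(For what it's worth, a monthly scrub is just a cron entry along these lines; pool and mount-point names are placeholders:)

    # ZFS pool, 03:00 on the 1st of each month
    0 3 1 * *  /sbin/zpool scrub tank
    # or a btrfs filesystem
    0 3 1 * *  /usr/bin/btrfs scrub start /mnt/storage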

Therefore the purpose of a second backup should not be to mitigate deficiencies in a specific backup solution, but disaster recovery and failover — when one solution is not accessible you can use another. Not to try to improve the reliability of flaky storage.
