New User, migrating 4TB from Duplicati to Duplicacy - help and advice welcome

herbert · 10 March 2022 08:10

Hi,

After spending the last few weeks reading about alternatives to my longtime Duplicati backup system, I decided to migrate to Duplicacy.
I had a look at a few other solutions:

borgbackup - only supports SSH backends
restic - I like it, but there is / was no web UI, and autorestic did not work as a good CLI alternative to a GUI
relica - which is a fork of restic with a very nice GUI, but the author wants to sell it, which does not sound very promising
kopia - very promising, but the WebUI just “sucks”; it could get better, but who knows when

So I am here now, and after testing everything I will use and pay for the GUI version. I like the idea of always having the open-source CLI version on hand to restore or even back up.

Anyhow, I am moving away from Duplicati because of the same issue I have already had multiple times. It has led me to rethink my backup jobs a few times already. I have been running Duplicati for almost 5 years now, quite a long time; at the beginning I used it simply for PC backups.
The problem was always the same: for whatever reason, mostly a faulty connection, a reboot, or a service crash, the backup was interrupted while running. This breaks the database; sometimes repairing is possible, but sometimes it simply does not work. I face the same problem quite often, now again with a 2TB backup set (out of my 4TB), and restoring the DB from the chunks on the destination takes forever. That is 2TB with bi-daily backups for more than a year; the DB is around 7GB and the Internet connection is not the fastest.

Anyhow, are these things avoidable with Duplicacy? That is, if I or something else stops the backup, can that break it? I have read a lot in the forum over the last few days, and there are not many threads (compared to Duplicati at least) about broken backups. I also found one How-To on fixing a broken backup, either by redoing the backup with a different ID so the chunk is back, or by deleting chunks manually. This sounds quite promising, or at least better than what I experienced with Duplicati.

Secondly, I have split up my backup sets into multiple parts, for several reasons:

I back up my nextcloud, and because of multiple users I try to isolate the backup of my friends from the backup of my family members.
If a smaller backup set breaks, it does not break the others (Duplicati style)
I back up data to a friend's home and to B2
I can be more flexible in timing, because Duplicati has no parallel backup possibility (which I have with Duplicacy)

Anyhow, we are talking about 15 different users, which leads to 30 backups when using Duplicacy. Every user gets backed up to an SFTP destination and to a B2 destination.
This is going to create a loooooong list of backups. Is this a problem?

Anyhow, this is some background. I would be happy about any advice, to be honest: how to organize the backups, the prunes, the checks, etc.

gchen · 10 March 2022 19:07

I would suggest backing up to one storage and copying to another: Back up to multiple storages

The main advantage is that when a chunk gets corrupt in one storage for whatever reason, you can just copy over the same chunk from the other storage.
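As a rough sketch of that suggestion on the command line: the block below only builds and prints the two CLI invocations (a backup to one storage, then a copy to the other). The storage names `primary` and `secondary` are hypothetical, and it assumes both storages were already added to the repository; nothing is executed.

```python
import shlex

def backup_then_copy(primary: str, secondary: str) -> list:
    """Return two CLI invocations: back up to `primary`, then copy it to `secondary`."""
    return [
        # Back up the repository to the first storage.
        ["duplicacy", "backup", "-storage", primary],
        # Copy snapshots and chunks from the first storage to the second.
        ["duplicacy", "copy", "-from", primary, "-to", secondary],
    ]

# Storage names are placeholders; the commands are printed, not run.
for argv in backup_then_copy("primary", "secondary"):
    print(shlex.join(argv))
```

To actually run them, each list could be passed to `subprocess.run(argv, check=True)` from inside the repository directory.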

herbert · 11 March 2022 09:00

I thought about this too.
But if I have understood all of this right - and only if

The data of my nextcloud is located at my home location - this is also the data I back up and where duplicacy is running.

Backup #1
I back up to Backblaze using the Internet at my home location, which has good bandwidth.
I only pay for the data stored on Backblaze and the transactions created by the backup tool, as well as for necessary egress data from Backblaze (checks and restores), which is within the free limits they offer for my daily backups.

Backup #2
This is from my home location to a friend's house (using SFTP), who does not have a very fast internet connection. The initial backup will take some time, but all good. The incremental backups are small and therefore this works just fine (and has worked for years now with Duplicati as the backup tool).

If I did a copy, would the copy task copy the data from my friend's house to Backblaze, or from Backblaze to my friend's house?
Situation #1
Backblaze as the source would simply get expensive because of the egress traffic to my friend's house.

Situation #2
Data is “copied” from my friend's place to Backblaze. This would take forever because the upload at my friend's place is <= 1 Mbit/s, whereas the download is much higher (which allows me to push my backup to him fast enough).

So copy does not seem to be a good solution for my situation. Or did I get it wrong, and copy actually copies the data from my home location (cache) to Backblaze and my friend's place?
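To put a rough number on Situation #2, here is a back-of-the-envelope calculation. It assumes decimal units (1 TB = 10^12 bytes), a fully saturated 1 Mbit/s link with no protocol overhead, and that the whole data set would have to travel over that link; the 2 TB and 4 TB figures are the sizes mentioned earlier in the thread.

```python
def transfer_days(size_tb: float, link_mbit_per_s: float) -> float:
    """Days needed to move `size_tb` terabytes over a link of `link_mbit_per_s`."""
    bits = size_tb * 1e12 * 8                   # decimal terabytes to bits
    seconds = bits / (link_mbit_per_s * 1e6)    # Mbit/s to bit/s
    return seconds / 86400                      # seconds to days

for size_tb in (2, 4):
    print(f"{size_tb} TB at 1 Mbit/s: about {transfer_days(size_tb, 1):.0f} days")
# prints:
# 2 TB at 1 Mbit/s: about 185 days
# 4 TB at 1 Mbit/s: about 370 days
```

Real transfers would be slower still, since the link is rarely fully available.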

Perhaps this would be a feature request - having a backup job running on my home location but defining two / multiple destinations. So every chunk / snapshot etc will be copied to both locations more or less simultaneously (depending on the internet speed of the destination a cache needs to be used).

Thanks

150041aae8ec90bee227 · 11 March 2022 11:18

You can do this now by running two scheduled backup jobs in parallel, i.e., each backup will use different storage. Personally, however, I’d ditch using your friend’s storage location, since time to recover is important with data loss, especially when recovering a handful of files. So, use a local backup for those times you need to recover files and directories, and cloud storage as your safety net. It’s about balancing risk and probability.

herbert · 11 March 2022 11:49

I do run 2 backup jobs for the 2 locations in parallel - this is not the problem, but it would not be the “copy” @gchen was mentioning.

I have already thought about ditching the backup at my friend's place, but I am still keeping it there for two reasons. First, if I needed to restore something large, I could theoretically grab the hard disks there quite easily. Secondly, if something happens to me, my friend could simply take the disks and restore the data from there without needing any other access. The password for the backups is already stored at his place in his safe.

But yes, perhaps this should be reconsidered anyhow. The backup station at my friend's place was the first offsite backup I had, years ago when moving my family's and friends' lives into nextcloud. Backblaze got added less than a year ago as another offsite backup solution. So perhaps you are right: I could move that setup to my home location, and then a copy would be possible.

Anyhow, the option of uploading a chunk to multiple places at once would be a great addition to the existing options.

150041aae8ec90bee227 · 11 March 2022 13:47

When creating the second backup, check (tick) the copy option. I also suggest you create some local storage, backups and schedules to familiarise yourself with Duplicacy.

Droolio · 11 March 2022 17:33

Ideally, your first backup ‘copy’ would be local, so that you can subsequently copy (perhaps at a different time, and maybe less frequently) to a remote destination and have identical snapshot-in-time revisions in multiple places.

Alternatively, running a copy job directly on your friend’s storage would be the next best, but that would expose B2 keys and whatnot, so not ideal.

However, you can still sort of accomplish this with 2 remote destinations, while reducing bandwidth use - so long as both storages are copy-compatible with each other. Copy-compatibility basically ensures chunks can be deterministic - even when running separate backups, with separate job IDs etc… If the source data is identical (duplicated), it should result in the same chunk data (albeit re-encrypted differently between two storages, unless using the -bit-identical flag).

So you could continue to back up to each storage destination separately - using different IDs - and then, if you wanted identical snapshot revisions on both ends, copy all the snapshots between storages, to each other, as a final step. i.e. Copy storage A (friend) > storage B (B2) and then B to A. You’d have two sets of IDs for each repository, but the chunk data would exist only once. This process should result in very minimal egress and is basically only copying metadata at this point, because your earlier backups already pre-populated the chunk data, which is duplicated between them.

In a sense, the deterministic nature of chunk data is already a cache (since it can be regenerated from the source data), but without occupying any disk space locally.
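A toy model may help picture why the cross-copy moves so little data. This is not Duplicacy's actual chunking (which uses variable-size chunks and storage-specific settings, hence the copy-compatibility requirement); it assumes fixed-size chunks named by a plain SHA-256 hash, and the snapshot IDs and storage dictionaries are made up for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed size for illustration only

def chunk_ids(data: bytes) -> list:
    """Split data into fixed-size chunks and name each chunk by its SHA-256 hash."""
    return [
        hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    ]

def backup(storage: dict, snapshot_id: str, data: bytes) -> None:
    """Record the chunk IDs in the storage plus a snapshot that lists them."""
    ids = chunk_ids(data)
    storage.setdefault("chunks", set()).update(ids)
    storage.setdefault("snapshots", {})[snapshot_id] = ids

def copy(src: dict, dst: dict) -> int:
    """Copy all snapshots from src to dst; return how many chunks had to move."""
    missing = src["chunks"] - dst.get("chunks", set())
    dst.setdefault("chunks", set()).update(missing)
    dst.setdefault("snapshots", {}).update(src["snapshots"])
    return len(missing)

source = b"the same nextcloud data"
friend, b2 = {}, {}
backup(friend, "alice-sftp", source)   # separate backup job, separate ID
backup(b2, "alice-b2", source)

transferred = copy(friend, b2) + copy(b2, friend)
print(transferred)               # 0 chunks moved, only snapshot metadata
print(sorted(b2["snapshots"]))   # ['alice-b2', 'alice-sftp']
```

Because both backups derived the same chunk names from the same source data, the two copy passes find nothing missing and only the snapshot lists are exchanged.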

If your 2 storages aren’t already compatible, you’d have to recreate one of them from scratch. I guess you could speed this up by grabbing the HDD at your friend’s place, wiping it, then creating a copy-compatible storage based on the B2 storage. Run a backup to pre-populate chunks and do a one-time copy from B2 to grab metadata and previous revisions.

towerbr · 12 March 2022 12:37

A point to evaluate is the reliability of this backup at your friend’s house, considering the storage media. Unless it’s a high-end RAID, this backup is as reliable as a $50 USB external hard drive in your house.

Years ago I considered my local physical backup as my primary backup, and the cloud backup as the “off-site copy”.

Today I consider my cloud backup as the main one, as there are huge companies with huge structures and thousands of engineers to guarantee the reliability of my data. And my local backup is just a copy with faster access to use in case of a full restore (because when I need just a few files I restore them from the cloud).

herbert · 14 March 2022 12:50

Wow this was a lot of input - amazing.
Thanks for all of the considerations and information.

I will move the backup setup from my friend's place to my local place. This is for sure. There is no need to have two off-site backups, so the “old” offsite backup will become a local one for fast recovery if necessary and B2 stays in sync with the off-site backup.

How I do the sync is still not clear to me, because there is one thing which bothers me. What if I make a local backup and run a copy job to B2 (instead of a backup), and a chunk or anything else breaks on the “local” backup? Wouldn’t this be synced to the B2 storage too, using the copy feature?
If so, I would have 2 broken backup destinations?!

The idea of having 2 separate backups, which back up (instead of copy) to local and B2, sounds more interesting. If both of them are copy-compatible, and if I understood right, one could “copy” the data to the other if it is broken?! Is that right?
So I run 2 backup jobs, and if one backup breaks for whatever reason, I copy from the good one to the broken one, which should fix the “data” because they are copy-compatible?
Is that right?

THANKS

Droolio · 15 March 2022 02:42

This cannot happen. In fact, it’s one of the main benefits of doing a copy with Duplicacy itself - the process of copying will cause Duplicacy to download chunks from the source, decrypt them, then re-encrypt them for the destination. Any bad chunk will get caught as an error and stop the copy.

In effect, this validates the source while making a secondary copy elsewhere.
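The idea can be pictured with a small toy model. It is only a sketch: a SHA-256 check stands in for the decrypt step described above, and the storages are plain dictionaries mapping a chunk ID to its content. None of the names below come from Duplicacy itself.

```python
import hashlib

class CorruptChunkError(Exception):
    """Raised when a chunk's content no longer matches its ID."""

def chunk_id(content: bytes) -> str:
    """Name a chunk by the SHA-256 hash of its content."""
    return hashlib.sha256(content).hexdigest()

def copy_chunks(src: dict, dst: dict) -> None:
    """Copy chunks from src to dst, verifying each one before writing it."""
    for cid, content in src.items():
        if chunk_id(content) != cid:
            # A damaged chunk is never written to the destination.
            raise CorruptChunkError(f"chunk {cid[:8]} failed verification")
        dst[cid] = content

good = b"holiday photos"
local = {chunk_id(good): good}
b2 = {}
copy_chunks(local, b2)                      # healthy source: copy succeeds

local[chunk_id(good)] = b"holiday ph0tos"   # simulate bit rot on the local disk
fresh_b2 = {}
try:
    copy_chunks(local, fresh_b2)
except CorruptChunkError as err:
    print("copy stopped:", err)
print(len(fresh_b2))                        # 0, nothing bad was propagated
```

In this model the damaged local chunk makes the copy fail loudly instead of landing on the second storage.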

herbert · 15 March 2022 10:28

I have not found this information in the docs, which does not mean it is not there, just that I have not read it for whatever reason.
BUT this makes the whole copy feature really awesome, and as soon as my hard disks are at my location I will change my backup strategy accordingly.

GREAT - thanks