Backup multiple locations - how would you do it?

Hi,
I’ve been struggling for a few days now to find a solid solution for my setup.
I want to back up multiple locations (workstation files, server files, device configuration files, images/videos, …) using the 3-2-1 method: on my NAS, on an attached HDD, and in B2 storage.

I don’t have access to all three from every system, so my plan was to back up everything to the NAS and copy from there to the other storages, but it seems that I can only copy from within a repository. So should I just rsync the local storage?

Also, sometimes I don’t have a single static file but a file with the date in its name (backup_2023-08.tgz, for example).
How would you handle that? Rename them first or just trust the deduplication?

Thx

Not sure what you mean by this. Duplicacy has a copy job (under schedule in the Web UI) that can copy all or individual snapshot IDs from one storage to another. You just have to add a second storage.

Or, if you prefer your NAS to take care of the copying (this is how I do it too), you can install the Web UI on the NAS, set up the secondary B2 storage there (B), add the local storage on the NAS (A), and do a copy (A => B).

This won’t de-duplicate well, and file naming is mostly irrelevant, as Duplicacy separates the metadata from the file content. The problem with .tgz files is that they’re compressed, and the byte stream is practically randomised when you make even very small changes to what’s being compressed.

If you really care about deduplicating that type of data, it’s best left uncompressed, so a .tar will de-duplicate fairly well (though the original unpacked data even better). Perhaps you can exclude the compressed version and just back up the source from whence it came?
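If you go that route, Duplicacy’s filters file can exclude the archives so only the uncompressed source gets backed up - something like this (the regex rule is just an example):

# .duplicacy/filters
# exclude any .tgz file anywhere in the repository (regex exclude rule)
e:\.tgz$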

Hi Droolio,

sorry, forgot to mention that I’m using the CLI version.
What I mean is that I can’t run copy from just anywhere; I have to be inside a repo with a .duplicacy directory.
Otherwise I get the error “Repository has not been initialized”. So maybe I’m misunderstanding it, but if I use Duplicacy to back up 3 repos to one default storage, I have to execute copy from every repo.

I have a Backup directory on my NAS (/Backup), an HDD attached to that NAS (/HDD), and the B2 storage.
Then there’s one PC, 2 notebooks, mobile devices, config files that I back up by hand from time to time, and some files from the NAS itself.

Everything should first be backed up to /Backup. But if I use Duplicacy on my PC to back up to /Backup, I’m unable to copy it to /HDD from the NAS. Also, I can’t use Duplicacy for every file; some plain files have to get to /Backup in a first step anyway. So the only option I see is to send the files with another tool, rclone for example, to /Backup and then use Duplicacy for /HDD and B2. Right?

Maybe you can explain your setup in a little more detail.

The principle is the same with the CLI.

The copy command works on the storage (referenced by the configuration in any given .duplicacy/preferences file), rather than the repository. Unless you specify -id, it assumes all repository IDs in the storage will be copied. Also, if you don’t specify -storage, it uses the first entry in the preferences if you have multiple.
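For example (the storage names and snapshot ID below are made up for illustration):

# copy every snapshot ID from the 'default' storage to the 'b2' storage
duplicacy copy -from default -to b2

# copy just one snapshot ID
duplicacy copy -id pc-documents -from default -to b2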

Presumably you have all 3 repos pointing to the same Duplicacy backup storage?

In which case, you can use any one of these repos (cd'ing into it) to do the copy - perhaps the same one where you run any prune or check jobs - it doesn’t matter (although it makes sense to pick one and stick with it, because your cache will fill up with all those maintenance operations).

You don’t quite mention why you’re unable to copy it?

But Duplicacy is flexible enough that you can copy between multiple storages - so if you have <NAS>/Backup, <NAS>/HDD, and B2 storages, you can copy between them - so long as you initialised the second, third etc. as ‘copy-compatible’, using the add command with the -copy option. You don’t have to involve rsync at all.
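A minimal sketch of adding such a storage (the names, snapshot ID and bucket are placeholders):

# run inside a repository that already has a storage named 'default';
# -copy default makes the new 'b2' storage copy-compatible with it
duplicacy add -e -copy default b2 pc-documents b2://bucket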

You can add the configuration for all storages to a ‘dummy’ repository on the NAS (either by way of init / add commands, or directly to a .duplicacy/preferences), just for the purposes of maintenance jobs, such as pruning, checking, and copying. e.g.:

[
    {
        "name": "local",
        "id": "local-dummy",
        "repository": "",
        "storage": "/backup",
        "encrypted": true,
        "no_backup": true,
        "no_restore": true,
        "no_save_password": false,
        "nobackup_file": "",
        "filters": "",
        "keys": null,
        "exclude_by_attribute": false
    },
    {
        "name": "hdd",
        "id": "hdd-dummy",
        "repository": "",
        "storage": "/mnt/hdd",
        "encrypted": true,
        "no_backup": true,
        "no_restore": true,
        "no_save_password": false,
        "nobackup_file": "",
        "filters": "",
        "keys": null,
        "exclude_by_attribute": false
    },
    {
        "name": "b2",
        "id": "b2-dummy",
        "repository": "",
        "storage": "b2://bucket",
        "encrypted": true,
        "no_backup": true,
        "no_restore": true,
        "no_save_password": false,
        "nobackup_file": "",
        "filters": "",
        "keys": null,
        "exclude_by_attribute": false
    }
]

Then script stuff like:

duplicacy -v -log prune -storage local -keep 30:365 -keep 7:90 -keep 1:14 -a
duplicacy -v -log check -storage local -a

duplicacy -v -log prune -storage hdd -keep 30:365 -keep 7:90 -keep 1:14 -a
duplicacy -v -log check -storage hdd -a

duplicacy -v -log prune -storage b2 -keep 30:365 -keep 7:90 -keep 1:14 -a
duplicacy -v -log check -storage b2 -a

duplicacy -v -log copy -from local -to hdd
duplicacy -v -log copy -from local -to b2

HTH?

This is so much Brain F*CK :slight_smile:

Yeah, that’s what I meant. I tried your idea with the dummy repo and it works. I don’t know what I did wrong in my earlier tests, because I wasn’t able to copy all snapshots. Never mind - the dummy repo thing is a game changer.

Typo - I meant “backup”, not “copy”. So the idea was to use Duplicacy on my PC to back up to <NAS>/Backup, but then creating a backup (not a copy) of this directory again sounded like a bad idea. That’s solved now with the dummy repo thing.

And for my “other” files like configs, I’ll create an additional backup directory. The files aren’t big at all, and that way I’m able to unpack them first.

So yeah, that helped a lot. Thx man!

Hello,

If I understood you correctly, you want to back up from several “places” (i.e. “repositories” in Duplicacy terms) into one “storage”. I can understand your confusion; I keep mixing up “repository” and “storage” myself since the terms are so close to each other.

You need to init every “repository” (i.e. source) separately, not “copy” those. Duplicacy does not work like a “backup disk” where you send whatever files you want to back up from different places. Instead, you choose which directories you want to back up and init each of them against the initial “storage” (the backup destination). You separate different repositories in the same storage from each other with a “snapshot id” (this could also be called a “repository id”).

I back up several different hosts, with partly different directories/files, to one storage, using a different “snapshot id” for each. Duplicacy does de-duplication to find common blocks in different files and saves disk space by storing each block only once.
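On each machine that looks roughly like this (paths and snapshot IDs below are made-up examples):

# on the PC, inside the directory you want to back up
cd /home/alice/documents
duplicacy init pc-documents /mnt/nas/Backup

# on a notebook: same storage, but a unique snapshot id
cd /home/bob/documents
duplicacy init notebook-documents /mnt/nas/Backup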

As for “syncing” files between computers and the NAS: rsync is one option. I have used Syncthing for years and it works great - a kind of self-hosted Dropbox without a central server.

Did I understand your question and did my reply help at all?

Just to clear up any possible confusion from this statement - init has two purposes:

  1. establish a configuration (in the .duplicacy/preferences file) of a relationship between the local repository (source) and the storage (destination).
  2. initialise the actual storage at the destination if it doesn’t already exist.

So if the storage already exists, all init is doing is writing a preferences file. While you can do it that way, you can also just edit it yourself like I suggested in the above ‘dummy’ configuration.

Also, you can have multiple entries in the preferences pertaining to multiple storages - potentially unrelated to the particular repository - so you can keep a bunch of them for maintenance operations (prune, check, copy etc.) in a dummy repository.

This is how the Web UI manages its own .duplicacy roots - i.e. 1, 2 for backups, all, restore etc… (Furthermore, since the Web UI uses the -repository flag to point to the actual root of any backup repository, you can create your own empty tree of these things and have the data elsewhere.)
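For illustration, a preferences entry in such an empty root might look like this - the “repository” path points at the real data, while the .duplicacy directory lives elsewhere (paths and ID are invented):

{
    "name": "default",
    "id": "pc-documents",
    "repository": "/home/alice/documents",
    "storage": "/backup",
    "encrypted": true,
    "no_backup": false,
    "no_restore": false,
    "no_save_password": false,
    "nobackup_file": "",
    "filters": "",
    "keys": null,
    "exclude_by_attribute": false
}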

This part is unnecessary if you use Duplicacy’s copy.

The main benefit of using Duplicacy over a sync tool is that you can copy a subset of IDs or revisions - for example, if you have a longer prune retention on one storage than on the other. So you can keep older versions on a destination, or even on the source. With CLI scripting, you can even copy just the ‘last’ revision only, and prune both storages to different rules.
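A rough sketch of that ‘last revision only’ idea (it assumes duplicacy list prints lines like “Snapshot pc-documents revision 42 created at …” - adjust the parsing to your actual output):

# copy only the most recent revision of one snapshot ID
latest=$(duplicacy list -id pc-documents | awk '/revision/ {rev=$4} END {print rev}')
duplicacy copy -id pc-documents -r "$latest" -from local -to b2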

Another main benefit is that you can have different compression ratios (with the new zstd compressor; not yet released, but hopefully soon) and Erasure Coding settings for each storage, and Duplicacy will repack data between them - all the while verifying the integrity of the data.
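For instance, recent CLI versions accept an -erasure-coding flag when a storage is created, so a new copy-compatible storage can get its own profile (names and paths below are placeholders):

# 'hdd' storage gets 5 data + 2 parity shards; the existing 'default' storage is unchanged
duplicacy add -e -copy default -erasure-coding 5:2 hdd pc-documents /mnt/hdd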

I use SyncThing too (for regular files, and for getting my photos copied off my mobile) - it’s a fantastic tool - but honestly, I’d avoid using any such sync tool for doing backups.

The benefit of having a secondary or tertiary storage is that you’re able to recover if one gets corrupted. Synchronisation will only trash all storages simultaneously, and you’ll be up bit creek without a toggle!

That’s why I have file-system-level scheduled snapshots configured on every host that syncs the files.

It’s all about redundancy: having independent systems creating backups, so that a failure in a single system doesn’t cause data loss.

Yeah, from different places, but I was fine with the wording. My problem was more practical than technical :slight_smile: But I think I’m on a good path now.