How to back up from a backup

I used to back up my machine to two back ends - say storage-0 and storage-1. All encrypted. storage-0 and storage-1 resided on different (local) machines - M-0 and M-1, if it matters. Now, on M-0 I had also added an online storage/back end, say storage-2, as a copy of storage-0:

duplicacy init -e -storage-name storage-0 some-id /path/to/storage-0
duplicacy add -e -copy storage-0 storage-2 some-id <online://bucket>

So my process used to be:

  1. From the primary machine where the data is, often run:
duplicacy backup -storage storage-0
duplicacy backup -storage storage-1
  2. Then, only occasionally, run the following from M-0:
duplicacy copy -id some-id -r <latest> -from storage-0 -to storage-2

Now the problem is that M-0 is dead/down and no longer accessible. M-1 with storage-1 is of course fine. So I turned to M-1 and tried to set up a similar copy to the existing storage-2, but it fails with “Two storages are not compatible for the copy operation”.
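
For reference, what I tried on M-1 looked roughly like this (from a repository on M-1 that already had storage-1 added; details from memory):

duplicacy add -e storage-2 some-id <online://bucket>
duplicacy copy -id some-id -from storage-1 -to storage-2

The copy step is what aborts with the incompatibility error.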

I of course don’t want to delete storage-2 and make a new one that is compatible with storage-1, as I’ll then have to upload a huge amount of data - and if storage-1 dies later, I’ll be in the same predicament.

What is the best way forward given this scenario? Even if I cannot somehow make storage-1 and storage-2 copy-compatible, is there a normal backup from storage-1 to storage-2 that I can run which will do the expected incremental backup (i.e. upload only the new/affected chunks since the last revision in storage-2)? Can you please specify the command, just for the sake of clarity? (I don’t want to mess anything up.)

Since the two storages (st1 and st2) are not copy-compatible, I think there’s no way to achieve what you want without creating a new storage.

I don’t think a new storage will fix everything, though.
You have st0 <-> st2 (copy-compatible, so they can be treated as a single one for this explanation), but st1 is not copy-compatible with either of those.

If you create a new storage, it can be copy-compatible with only one of the storages (at least this is how I understand it), so the other one will be left out. In other words, st3 will be copy-compatible with either st1 or st2, but not both.

Maybe anyone else has some better ideas?

duplicacy init -e -storage-name storage-0 some-id /path/to/storage-0
duplicacy add -e -copy storage-0 storage-2 some-id <online://bucket>

The only information I didn’t see (or didn’t understand) is: how was storage-1 created? With a new init, or with add -copy from storage-0?

If it was initialized separately, then the config files were encrypted with different keys and are incompatible, by my understanding.
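
For instance, these two produce very different results (a sketch using the names from above):

duplicacy add -e -copy storage-0 storage-1 some-id /path/to/storage-1
duplicacy init -e -storage-name storage-1 some-id /path/to/storage-1

The first inherits storage-0’s config (chunking parameters and keys), so the new storage is copy-compatible with storage-0 and hence with storage-2; the second generates a fresh config, so the result isn’t copy-compatible with anything.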


Seems your goal is to recreate one of the storages (M-1 preferably - since it isn’t copy-compatible with the others) but to ‘seed’ that storage from the old one, locally, in order to reduce bandwidth…

You could do this if you had remote/direct access to that machine, and had enough storage on it to 1) do a full local restore to a temporary location, and 2) back up that temporary location to a newly-initialised, copy-compatible storage (with M-2) - again, all done locally.

The storage on M-1 would be recreated and subsequent backups would de-duplicate most of the chunks. You’d only lose out on old snapshots (unless you went through the trouble of pre-seeding the new storage with different revisions - I guess you could script this, but for it to be useful you’d have to manipulate the system clock to put it back in time).

Additionally, but most importantly, you could then copy snapshots from the new storage M-1 to M-2.

Technically, if you don’t want to keep old snapshots, you could just completely scratch the storage on M-1 and initialise a new M-1 from M-2. (Or keep the old storage around if you have the disk space.) Then back up your repositories to M-1. Being copy-compatible with M-2, most of the chunks will be skipped when you copy from M-1. This approach may be preferable if the repositories to be backed up are on the same local network as M-1. Otherwise, to save bandwidth, recreating the storage locally on M-1 may be your only option.
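
Put together, the local re-seeding might look something like this (a sketch only - the storage names, IDs and paths are illustrative, and it assumes M-1 can reach the online storage):

# all run on M-1
cd /path/to/temp-restore
duplicacy init -e some-id /path/to/old-storage-1        # connect the old local storage
duplicacy restore -r <latest>                           # 1) full local restore
duplicacy add -e storage-2 some-id <online://bucket>    # connect the existing online storage
duplicacy add -e -copy storage-2 new-storage-1 some-id /path/to/new-storage-1
duplicacy backup -storage new-storage-1                 # 2) seed the new, compatible storage locally
duplicacy copy -from new-storage-1 -to storage-2        # copies now work in both directions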


Ah, cheers everyone! Sorry, I had limited connectivity on a vacation and couldn’t come back to this earlier.

New inits, I’m afraid - not copy-compatible.

OK, I wish I had been more careful earlier. My goal was to store on 3 different machines: M0, M1 and the one which is online. I thought of optimising the online storage by doing a copy from M0, in the hope that it:

  1. speeds things up a bit;
  2. keeps the revision numbers in sync - so r5 in M0 means the same as r5 in the online storage storage-2;
  3. lets me simply make M1 copy to storage-2 if something happened to M0.

Point 3 above was my gaffe.

So currently, as it stands, M0 is dead and can’t be accessed (badly broken HDD, so it will take quite a bit of trouble to recover, if at all possible).

I’m thinking of not going the copy route again, as it doesn’t seem to do what I thought. I need to be able to continue with backups even if one of the storages fails, and copy puts some restrictions on this.

So the question now is,
Even though the online storage-2 was copy-compatible with the now-gone storage-0, can I still:

  1. back up my primary machine, where the data is, directly to the online storage-2 and expect all the chunks apart from the new ones to be de-duplicated - or did copy somehow do yet something else to blow even that out of the water?
  2. restore other independent locations like storage-1 and also back them up to the online storage-2, expecting the same as in (1) above (i.e., mostly everything to be de-duplicated)?

Basically it all boils down to this: if there’s a location (storage-2 here) which was copy-compatible with some other, now-gone backup, can that location (storage-2) still be used for backing up the primary location directly AND/OR for backing up the other independent backups of the primary data (like storage-1), once they are restored somewhere temporarily, of course? Will this de-duplicate all but the newer stuff, or would most of the chunks (from the primary data or a restore of storage-1) be re-created?

Copying between storages is still very worthwhile - as long as you make the storages copy-compatible…

Your only mistake was that, when creating the storage on M-1, you didn’t make it copy-compatible with either M-0 or M-2 - you only need to initialise (via add -e -copy) from one of the storages in the network of storages, so to speak, to make a new storage compatible with all the others. (M-0 and M-2 being compatible with each other, creating a new storage from either one makes that new storage compatible with both.)

To answer the latter part of your post, de-duplication and incremental backups will work perfectly fine if you run independent backups to each storage.

And if you restore from any backup storage into a new repository and back it up to either storage, it will de-duplicate and skip chunks that already exist - provided the data itself already exists in that storage - even though the source storage was encrypted/chunked differently.

The only thing you won’t be able to do is copy snapshots directly between incompatible storages, that’s all. (Unless, of course, you go through the effort of nuking one of the storages and recreating it with compatibility.)
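
Concretely, something like this should do what you described (a sketch; it assumes the repository in question has storage-2 added - add it first if not):

duplicacy backup -storage storage-2
# or, for data recovered from storage-1 into a temporary repository:
duplicacy restore -r <latest>
duplicacy backup -storage storage-2

In both cases the backup is incremental: only chunks not already present in storage-2 get uploaded.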


Ah, that clarifies it - cheers.

Do you mean something like this, to achieve what I wanted (infrequent updates of the online storage but regular updates of the in-hand devices):

  1. From the primary, create storage-0 on machine M0.
  2. Go to M0 and create storage-1 on machine M1, making it copy-compatible with storage-0.
  3. Also from M0, create storage-online (the online one) and make that too copy-compatible with storage-0.
  4. Now, back at the primary machine where the data is, add storage-1, which by now already exists on M1.
  5. From the primary, often fire backups to both storage-0 and storage-1.
  6. From storage-0, occasionally copy to storage-online.
  7. storage-0/M0 dies beyond repair.
  8. Make another storage, storage-2, on M2 from storage-1 on M1 and make it copy-compatible (see the sketch after this list).
  9. Back at the primary, remove the dead storage-0 and add the already existing storage-2 created above.
  10. Regularly fire backups to storage-1 and storage-2.
  11. Occasionally go to either storage-1 or storage-2 and copy to storage-online.

and so on as local storage/machines keep dying.
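
For step 8, I imagine something like this (a sketch; run from a repository that can reach both M1 and M2 and already has storage-1 added, with the path to M2 illustrative):

duplicacy add -e -copy storage-1 storage-2 some-id /path/to/M2/storage-2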

Is this algorithm correct?

Also, is there a way I can do both 1 and 2 from the primary, without having to go to machine M0? i.e. add two storages, storage-0 and storage-1, such that both of them are copy-compatible with each other?

The problem you may experience with your new strategy is step 5.

This almost defeats the purpose of keeping copy-compatible storages if you run multiple jobs to different storages.

You can do that - Duplicacy supports it - but if/when you want to copy snapshots between storage-0 and storage-1, you’ll likely run into a mess - even though the chunks may be skipped and de-duplicated. Each storage will have, say, a different revision 100. Actually, I’m not sure if you can overwrite snapshots, but either way, those revisions will be out of sync, and you won’t end up with a clean history.

The best strategy imo is to backup once and let the storages copy snapshots between each other. i.e. primary > 0 > 1 > 2.
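
Something along these lines, for example (a sketch; each copy is run from a repository that has both of the relevant storages added):

duplicacy backup -storage storage-0
duplicacy copy -from storage-0 -to storage-1
duplicacy copy -from storage-1 -to storage-2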

Regardless, the important part in your procedure is how you create new storages. If you want to keep them copy-compatible, you need to use add -e -copy instead of init; init should only be used once - to create the initial storage - or to connect an existing storage up to a new repository.

To answer your last question… yes, if the primary has access to both underlying storages via a URL (including SMB), it can initialise (with either init or add -e -copy) a new storage remotely at that destination.
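
For example (a sketch; the SMB-style paths are illustrative):

duplicacy init -e -storage-name storage-0 some-id /path/to/M0-share/storage-0
duplicacy add -e -copy storage-0 storage-1 some-id /path/to/M1-share/storage-1

Run from the primary repository, this creates both storages remotely and makes them copy-compatible with each other.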
