Resuming an unfinished backup of the same data from another machine?

datao · 16 January 2025 09:42

I tried searching the forums a bit for this, but can someone clarify:

With encrypted storage:

I started a backup with ID “OriginalBackup” on machine1, but only about 80% of the full backup had uploaded when I stopped it and shut the machine down.

Now I want to resume the backup of the same data on machine2, to the same storage:

Is it important that I use the same ID “OriginalBackup”? I’ve never completed a full backup under that ID and I’d like to change it, but without re-uploading 80% of the chunks.
Does it matter that the directory structure of the data is slightly different, even if the files are the same? I’m thinking no, because it’ll just use deduplication & that doesn’t care about directory structure.

So in short:

Can I resume the backup on machine2 with the new ID “NewBackup” against the same storage, and it’ll just skip 80% of the chunks and continue?
Or do I need to stick to the original backup ID “OriginalBackup”, finish the full backup, and only THEN change the backup ID?

Droolio · 16 January 2025 14:04

The ID isn’t important: you only have chunks so far, with no completed revision. Change the ID to whatever you need long term.

For context, you can even change it at any point later - say, if you’re migrating the data to a fresh OS and need to delineate the move (perhaps coz the ID has the hostname in it). The old ID will get pruned out eventually over time (apart from the last revision).

The important point is not to use the same ID at the same time.

The directory structure doesn’t matter.

There might be a few extra chunks uploaded where the directory structure differs and files end up bordering different neighbours, but Duplicacy uses a rolling hash that effectively resets the chunk boundary every once in a while, so the majority of chunks are deterministically the same.
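To make the rolling-hash point concrete, here is a toy sketch in Python. It is not Duplicacy's actual chunker: the window size, hash, and chunk size are assumptions picked for illustration, and real chunkers also enforce minimum and maximum chunk sizes. It only shows that when boundaries are chosen from the content itself, reordering the same files changes just the chunks that straddle file borders.

```python
import hashlib
import random

WINDOW = 48                    # bytes covered by the rolling hash (assumed value)
BASE = 257
MOD = 1 << 32
MASK = (1 << 10) - 1           # cut when the low 10 bits are zero: ~1 KiB average
DROP = pow(BASE, WINDOW, MOD)  # weight of the byte that slides out of the window


def split_chunks(stream):
    """Split a byte stream at content-defined boundaries."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(stream):
        h = (h * BASE + byte) % MOD                    # bring the new byte in
        if i >= WINDOW:
            h = (h - stream[i - WINDOW] * DROP) % MOD  # drop the oldest byte
        if (h & MASK) == 0:                            # boundary depends on content only
            chunks.append(stream[start:i + 1])
            start = i + 1
    if start < len(stream):
        chunks.append(stream[start:])
    return chunks


def chunk_ids(stream):
    """Identify each chunk by the hash of its content."""
    return {hashlib.sha256(c).hexdigest() for c in split_chunks(stream)}


rng = random.Random(0)
a, b, c = (bytes(rng.getrandbits(8) for _ in range(64 * 1024)) for _ in range(3))

layout_1 = chunk_ids(a + b + c)   # files in one order
layout_2 = chunk_ids(c + a + b)   # same files, different order
shared = len(layout_1 & layout_2) / len(layout_2)
print(f"{shared:.0%} of the second layout's chunks already exist in the first")
```

Only the handful of chunks around each file border differ between the two layouts; everything else would be skipped on upload.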

Yes, you can resume on machine2 with the new ID.

datao · 16 January 2025 14:23

Hey thanks for your responses, that’s helpful.

This is probably up to personal preference, but I want to ask your opinion.

So if I’m backing up the data of two different machines (completely different data sets), and let’s say the data belongs to two different individuals, would you send the backup data to a single repository in gdrive or to 2 separate ones?

Droolio · 16 January 2025 14:52

Depends on your security concerns.

If you’re the admin of both computers and there’s no risk of physical compromise, you can just lock down the web GUI (if you’re using that) with a password and evaluate the risks that someone may glean the storage keys off the workstations…

But either user will be able to restore the other’s files, which can be mitigated if you use RSA encryption in addition to the normal encryption. This requires a private key to be provided for restore, which can be kept off both computers until needed. (The public key allows encryption but not decryption.)

datao · 16 January 2025 15:04

Okay, let’s say security is taken out of the equation:

The way I see it, the main benefit of 1 repository is deduplication between the 2 users, if they have (or will have) some identical files in their data sets.

Benefit of 2 repositories:
I don’t know? Maybe the following:

Ability to copy the repository to another location/storage and only affect the files of the 1 user. The data is kept completely separated.
Perhaps indexing during restores from gdrive is faster with smaller data sets, or doesn’t that matter since a restore is tied to a snapshot ID?

Droolio · 16 January 2025 15:25

Deduplication is certainly beneficial even if the two sets of files are mostly different (the only way to know to what extent is to try it).
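As a rough way to think about the saving, here is a toy model in Python (the chunk names are made up, and this is not how Duplicacy itself reports anything). A shared storage holds the union of both users' chunks, while separate storages hold each set in full.

```python
def storage_cost(user_chunks):
    """Return (chunks kept in one shared storage, chunks kept in separate storages)."""
    shared = len(set().union(*user_chunks.values()))
    separate = sum(len(chunks) for chunks in user_chunks.values())
    return shared, separate


# Hypothetical chunk IDs: c3 and c4 come from files both users happen to have.
users = {
    "machine1": {"c1", "c2", "c3", "c4"},
    "machine2": {"c3", "c4", "c5"},
}
shared, separate = storage_cost(users)
print(f"shared storage: {shared} chunks, separate storages: {separate} chunks")
```

The gap between the two numbers is whatever the users have in common, which is why you can only find out by trying it on the real data.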

Otherwise, I wouldn’t worry too much about the slight overhead when doing checks and prunes (remember, only one machine should do this) - with ~double the amount of chunks to maintain. The difference between 1 and 2 machines is negligible, so it may only matter at scale.

You can always start with 1 storage and separate them if the need arises. Or vice versa (make sure to make them copy-compatible to be able to do this in the future).

datao · 16 January 2025 16:58

Okay, thanks for info!

So if I had 1 storage and later wanted to separate them into two storages, would the process be:

1. Make a copy-compatible storage2
2. Copy the backup ID of machine2 from storage1 to storage2
3. Prune the machine2 backup ID out of storage1 by deleting the backup ID’s snapshot folder?

Something like this?

If there are no drawbacks with having just a single storage repository I think I’ll do that.
Quick question, when doing a restore from a backup-ID, it only fetches the chunks that are associated with that backup-ID, right? So there wouldn’t be any performance differences from trying to restore from 1 big repository that contains all the backup-IDs vs. restoring from separate ones?

Droolio · 16 January 2025 17:18

Pretty much yep.

With step 3, it’s easier to just delete the \snapshots\id directory on the storage and then do a prune -exhaustive, since a normal prune can’t delete the last revision (you’d have to target that last revision and all other revision numbers with the -exclusive flag, which always needs to be used with care).

Correct. A restore has basically no penalty since it’s not inventorying chunks; it just accesses the index and the referenced chunks directly.
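A toy model of why that is, in Python. The storage layout and names here are invented for illustration and are not Duplicacy's real data structures: a revision lists the chunks it needs, and a restore walks that list without looking at anything else in the storage.

```python
# Hypothetical shared storage: chunk ID -> chunk content.
storage_chunks = {"c1": b"alpha", "c2": b"beta", "c3": b"gamma", "c4": b"delta"}

# Hypothetical index: (backup ID, revision) -> ordered chunk IDs.
snapshots = {
    ("machine1", 1): ["c1", "c2"],
    ("machine2", 1): ["c3", "c4"],
}


def restore(backup_id, revision):
    """Rebuild the data for one revision and report which chunks were fetched."""
    fetched = []
    data = b""
    for chunk_id in snapshots[(backup_id, revision)]:
        fetched.append(chunk_id)           # only chunks named by this revision
        data += storage_chunks[chunk_id]
    return data, fetched


data, fetched = restore("machine2", 1)
print(fetched)
```

The cost of the restore follows the size of that one revision's chunk list, not the total number of chunks sitting in the storage.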

datao · 16 January 2025 17:56

Cool. I’m a little unsure about step 2. In the GUI, if I set up a copy in a schedule, it copies a whole storage to another storage. How would I filter on the snapshot ID and only copy the chunks related to it from storage1 to storage2? Is this something that must be done in the CLI?

datao · 16 January 2025 17:59

Nevermind, I read

I could add

-id machine2

In options in the GUI for the copy job

And then it would only copy chunks related to machine2 to storage2.

Droolio · 16 January 2025 18:14

Yep, that’ll do it.
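For anyone doing the same split from the CLI rather than the GUI, the steps discussed above can be sketched as below. This is a small Python helper that only prints the command lines and does not run anything. The storage names and the gcd:// URL are placeholders, -e assumes the new storage should be encrypted like the existing one, and the options are the ones I'd expect from the Duplicacy CLI (add -copy, copy -id -from -to, prune -exhaustive), so check them against the help output of your version before relying on them.

```python
def split_out_commands(backup_id, src="storage1", dst="storage2",
                       dst_url="gcd://machine2-backups"):
    """Build (not run) the command lines for moving one backup ID to a new storage.

    src, dst and dst_url are placeholder names, not values from any real setup.
    """
    return [
        # 1. Add a new storage that is copy-compatible with the existing one.
        ["duplicacy", "add", "-e", "-copy", src, dst, backup_id, dst_url],
        # 2. Copy only the revisions (and their chunks) of this backup ID.
        ["duplicacy", "copy", "-id", backup_id, "-from", src, "-to", dst],
        # 3. After manually deleting snapshots/<backup_id> on the source storage,
        #    collect the chunks that nothing references any more.
        ["duplicacy", "prune", "-storage", src, "-exhaustive"],
    ]


commands = split_out_commands("machine2")
for argv in commands:
    print(" ".join(argv))
```

The manual delete between steps 2 and 3 is deliberately left as a comment, since it happens on the storage itself and is worth double-checking by hand.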

datao · 16 January 2025 18:18

Just a quick question relating to this. For cleanup, should I simply delete the empty snapshots folder in the repository called ‘OriginalBackup’ to prevent it from showing up on restores? It correctly tells me that there’s no revision associated with this snapshot ID (because only 80% of the chunks completed uploading)

Droolio · 16 January 2025 18:35

Yep, safe to delete (presumably you have an empty /snapshots/id directory there, since it didn’t complete?).

You may have a handful of unreferenced chunks relating to that initial backup, in which case you can prune -exhaustive after deleting the snapshot ID on the storage, but make sure you have at least one “NewBackup” snapshot before you do that.
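Here is why the order matters, as a toy model in Python (invented chunk names, not Duplicacy's implementation; the real prune is more careful and has a fossil step before anything is actually deleted). An exhaustive pass treats every chunk that no snapshot references as a candidate for removal, so with no completed snapshot on the storage, everything you uploaded looks orphaned.

```python
def unreferenced_chunks(all_chunks, snapshots):
    """Return the chunks in storage that no snapshot revision refers to."""
    referenced = set()
    for chunk_ids in snapshots.values():
        referenced.update(chunk_ids)
    return set(all_chunks) - referenced


# Hypothetical storage contents after the interrupted upload plus the new run.
all_chunks = {"c1", "c2", "c3", "c4", "c5"}

# Before any "NewBackup" revision exists: nothing references anything.
before = unreferenced_chunks(all_chunks, {})

# After the first "NewBackup" revision completes: only the true leftover remains.
after = unreferenced_chunks(all_chunks, {("NewBackup", 1): ["c1", "c2", "c3", "c4"]})

print(sorted(before), sorted(after))
```

In this model, pruning before the first snapshot would mark all five chunks, while pruning afterwards marks only the one leftover chunk.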

datao · 16 January 2025 18:37

Yes it’s empty, so I will delete it.

I will let the backup complete with the new ID and then run the prune command as suggested.