Backup HDD rotation: backup, copy

Hello everyone.

I’ve read through many forum posts, but unfortunately, some of them are quite old, and I’m not sure if they’re still relevant for the current CLI version. As a result, I’m now very confused and would appreciate some help.

I want to set up a backup with HDD rotation. I have three HDDs available. My planned process is as follows, with the HDDs (A, B, C) being swapped out once a day:

Source → backup → A
A → copy → B
Source → backup → B
B → copy → C
Source → backup → C
C → copy → A
and then the cycle starts over…
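
For reference, this is roughly how I picture one day of the cycle in CLI terms (hdd_a/hdd_b are just my placeholder storage names, assuming everything has already been set up as copy-compatible):

    duplicacy backup -storage hdd_a
    duplicacy copy -from hdd_a -to hdd_b

and the same pattern on the following days with B → C and C → A.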

So far, so good. There shouldn’t be any issues here. I should have the same revision history on each HDD, right?

However, what I still don’t understand, even after reading many forum posts, is what happens if the following theoretical case occurs:
Source → backup → A
Source → backup → B
A → copy → B

In this case, if no backup has been created yet, both A and B would receive revision 1 during the backup. What happens during the copy process? Does the older backup on A then receive revision 1, and the newer backup on B revision 2? If I understood some forum posts correctly, this might work if different repository IDs or backup IDs are used? But I’m not sure what is meant by repository ID or backup ID. In the documentation, I only find storage name and snapshot ID (https://forum.duplicacy.com/t/copy-and-backup-to-same-storage/3733).

What also isn’t clear to me is whether the copy task is more resource-efficient than simply using separate backup tasks. And what happens if there are corrupted chunks on an HDD? Are they copied as-is during the copy process, or does the copy check them for corruption?

Thanks in advance for your help.

You would create different snapshot IDs for each destination, so when you copy backups between targets, separate backup histories are maintained.

However, I strongly recommend against juggling disks like this. Instead, create two separate backups to two separate cloud destinations.


Thank you for your response.

You would create different snapshot IDs for each destination, so when you copy backups between targets, separate backup histories are maintained.

However, if all HDDs have their own revision history, what is the advantage of the copy command over just using the backup command?

However, I strongly recommend against juggling disks like this. Instead, create two separate backups to two separate cloud destinations.

Could you explain this in more detail or refer me to relevant information? I’m seriously interested. I especially see cost advantages with rotating HDDs compared to the cloud. Yes, it’s less convenient. But as long as one hard drive is always off-site, one is always connected to the system, and another is securely stored and separated from the system, I don’t see any significant disadvantage. I would also regularly check file integrity with cloud providers, as I’ve had too many bad experiences with different cloud providers in the past.

If you can stick to your first scenario (ensuring a copy is done before a backup on a newly cycled drive), then you should have no issues.

If you accidentally or intentionally get scenario 2, nothing too bad will happen but you’ll not benefit from a consistent revision history that a pure copy provides*. You’ll have revision numbers which don’t match between storages, which isn’t ideal but not catastrophic (not much different to running backups independently with the same IDs). You might also get a different mix of backup and copied IDs which means incremental chunking isn’t as efficient. Consistent backups are especially important when one of your storages fails and you might have to ‘fix’ corrupt/missing chunks from a second storage.

A copy isn’t necessarily more resource-efficient, as it needs to decrypt the source before repacking it on the destination, but it does mean that if there are corrupted chunks, it’ll error out and you won’t end up with a corrupt destination copy. It also allows transferring data without access to the original source, so for example you could run the copy process on a storage server.

* You can normally fix this at any time by deleting the entire snapshots directory on the storage (or just the known mismatched IDs) and re-running a copy. The chunks won’t need to be recopied and you’ll end up with a consistent snapshot state between the two storages.
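
For a local disk destination, that fix is just something like this (path and storage names are examples):

    # remove the snapshot files on the destination storage (chunks stay put)
    rm -rf /mnt/hdd_b/snapshots
    # re-run the copy; existing chunks are skipped, snapshot files are rewritten
    duplicacy copy -from hdd_a -to hdd_b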


Thank you very much.

If you accidentally or intentionally get scenario 2, nothing too bad will happen but you’ll not benefit from a consistent revision history that a pure copy provides*.
*You can normally fix this at any time by deleting the entire snapshots directory on the storage (or just the known mismatched IDs) and re-running a copy. The chunks won’t need to be recopied and you’ll end up with a consistent snapshot state between the two storages.

That was the important information I was missing. Thank you!

A copy isn’t necessarily more resource-efficient, as it needs to decrypt the source before repacking it on the destination, but it does mean that if there are corrupted chunks, it’ll error out and you won’t end up with a corrupt destination copy. It also allows transferring data without access to the original source, so for example you could run the copy process on a storage server.

Got it! - I was a bit confused, as you were talking here about it using fewer resources.

Two more questions (sorry!):

  1. If corrupt chunks are detected during encryption/decryption, do I have to run a check on HDD A before it is copied to B? In other words, does the copy process also check the snapshots, or only the chunks?
  2. If I make HDD B copy-compatible with HDD A and HDD C also copy-compatible with HDD A, is HDD B then copy-compatible with HDD C as well?

So my implementation would be as follows:
init HDD A
add HDD B copy-compatible with HDD A
add HDD C copy-compatible with HDD A
What I’m still not clear on: do I now set (in init and add) the same snapshot ID for all of them? I think so, in order to get the same revision history on every HDD.
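
In command form, this is what I have in mind (paths are my placeholders, assuming the same snapshot ID pc1 everywhere):

    # initialize the repository against HDD A; "pc1" is the snapshot ID
    duplicacy init -storage-name hdd_a pc1 /mnt/hdd_a

    # add B and C as copy-compatible with hdd_a, reusing the same snapshot ID
    # (-bit-identical could be added if chunks should be rsync-able between drives)
    duplicacy add -copy hdd_a hdd_b pc1 /mnt/hdd_b
    duplicacy add -copy hdd_a hdd_c pc1 /mnt/hdd_c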

In a sense, it does use slightly fewer resources (it doesn’t need to index, shadow copy, or chunk data) - but that comment was specifically in reference to a NAS or server, which doesn’t need to use any resources on the source system (if you get it to do the copying), only some extra CPU (for decrypting and re-encrypting chunks).

Both copy and check -chunks should tell you if any chunks are corrupted. No need to run a prior check, although a regular check on all storages is always a good idea.
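
E.g. for whichever drive is currently connected (storage name as in your setup):

    duplicacy check -storage hdd_a -chunks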

Yes. You can also have different Erasure Coding and compression (lz4/zstd) parameters between copy-compatible storages.
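
E.g. a variation of your add command could give one drive Erasure Coding while keeping it copy-compatible (flag name from memory - double-check with duplicacy add -help):

    # 5 data shards + 2 parity shards; verify the option before relying on it
    duplicacy add -copy hdd_a -erasure-coding 5:2 hdd_c pc1 /mnt/hdd_c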

That’s strongly advisable, yeah, if you want to stick to your first scenario of backup, copy, cycle.


Thank you so much! - Really helps.
