Azure Storage - When Network Errors Occur, Process Corruption Occurs

Please describe what you are doing to trigger the bug:

This bug is readily reproducible, although it confused me at first. When restoring my large backup (80-100 TiB), ANY network issue (e.g. a read timeout) corrupts the restore process. The corruption manifests as subsequent chunks starting to error out, until the process crashes after ~1 hour.

Please describe what you expect to happen (but doesn’t):

When a network error occurs, my expectation is that any interrupted / in-progress work is purged and restarted without error. Ideally with some kind of exponential backoff, but at a minimum the process shouldn’t be corrupted.

Please describe what actually happens (the wrong behaviour):

This will manifest in messages like “Failed to decrypt the chunk dcdf824930f8debed7e87dc828186eec1b789458e0c32006720692f0dfb9c02a: cipher: message authentication failed; retrying” or “Failed to decrypt the chunk ee94639e9d7048ef9d09c3ad5d440710f1bd4482132acfc317ea983ec7096d78: The storage doesn’t seem to be encrypted”

The issue is resolved by killing the process and restarting it whenever a network issue occurs.
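
A rough sketch of the kind of retry-with-backoff behaviour I mean, written as an external wrapper that just automates the kill-and-restart workaround (the path, revision number, and delay values are placeholders; depending on how an interrupted run leaves things you may also want restore options such as -overwrite):

#!/bin/sh
# Re-run the restore with exponential backoff whenever it exits with an error.
delay=60
cd /path/to/restore/directory || exit 1
until duplicacy restore -r 119; do
    echo "restore failed, retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    [ "$delay" -gt 3600 ] && delay=3600   # cap the backoff at one hour
done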

Here is an extended log as an example: Log Example - Duplicacy · GitHub

Version: 3.2.3 (254953)

How big is revision 119? If you have enough local storage, you can run duplicacy copy to copy all chunks that belong to this revision to a local storage and then run a restore against this local storage. This should be much faster than a direct restore.

@gchen Not exactly positive, but this is almost a full restore, so I estimate ~80 TiB needs to be downloaded (gigabit fiber is in place). The ZFS array has 323T of space, so a copy is definitely an option. I was seeing ~20-30 MB/s (significantly below gigabit) on restore.
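
As a rough back-of-envelope: 80 TiB is about 88 million MB, so at the ~25 MB/s I was seeing a direct restore would take on the order of 40 days, while something closer to gigabit (~110 MB/s) would bring that down to roughly 9-10 days.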

Any advice on parameters I should set or commands I should run to plan this?

Specifically: my Azure storage is password protected and has a key. Should I create a bit-identical copy (with the same key) or just have a non-encrypted local copy? (Is one approach better? If it’s possible to use rsync-style tools to copy the blobs, that will probably be best…)

I think -bit-identical should be used. With this option you can use rsync/rclone to copy chunks, but you need to figure out which chunks to copy. duplicacy copy is much simpler and supports multithreaded downloading.
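
For reference, if you did go the rclone route, a whole-storage sync would look roughly like the line below (the remote name azureremote and the local path are placeholders); note that this pulls every chunk in the storage, not just the ones revision 119 needs:

rclone copy azureremote:storage /path/to/local/storage --transfers 16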

Here are the commands to run for the duplicacy copy approach:

mkdir -p /path/to/local/storage
mkdir -p /path/to/restore/directory
cd /path/to/restore/directory
duplicacy init backup_id azure://storage
duplicacy add -copy default -bit-identical local backup_id /path/to/local/storage
duplicacy copy -from default -to local -r 119 -download-threads 40
duplicacy restore -r 119 -storage local

I did a bit of testing and found that I needed to add -threads 16 to get into 900 Mbps territory. Based on the current speed, it looks like I will be able to test a restore in a little over a week.

I will follow up!

Incidentally, if you have local copies of many of the source files (partial or old), you could ‘pre-seed’ a local backup storage that you then ‘top up’ from the remote storage.

All you’d have to do is make a copy-compatible storage - i.e. create a new (empty) local storage, using the remote storage as a copy-compatible template. Then back up those source files to that local storage under temporary IDs; this all happens locally. As a result, you’d have a bunch of content-defined chunks which a final copy from the remote can fill in, skipping the chunks you already have.

Obviously, this requires you to have at least some of the original files to make it worthwhile - depending on how much you have, it could save quite a bit of time.
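
A rough sketch of those steps, assuming the Azure storage is already set up as the ‘default’ storage in /path/to/restore/directory and the partial/old copies live under /path/to/old/files (all paths, the seed_id snapshot ID, and thread counts are placeholders):

# create an empty local storage that is copy-compatible (and bit-identical) with the remote
cd /path/to/restore/directory
duplicacy add -copy default -bit-identical local backup_id /path/to/local/storage

# back up the partial/old source files into that local storage under a temporary ID
cd /path/to/old/files
duplicacy init seed_id /path/to/local/storage
duplicacy backup

# top up the local storage from Azure; chunks already produced by the seeding backup are skipped
cd /path/to/restore/directory
duplicacy copy -from default -to local -r 119 -download-threads 16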