Backblaze B2 restore missing chunks

I made a successful backup to Backblaze B2, but when I tried to restore from the backup, I got the following error:

2024-12-01 19:30:43.192 ERROR DOWNLOAD_CHUNK Chunk 9a27103a9d06971c38cbbd5348a0a1a81062ccaae16bf82dcd0a5181e1a5b2c4 can’t be found

Searching the forum and following the advice in this thread about fixing missing chunks, I ran duplicacy_win_x64_3.2.3.exe check and got the following result:

Total chunk size is 583,073M in 123082 chunks
Chunk 9a27103a9d06971c38cbbd5348a0a1a81062ccaae16bf82dcd0a5181e1a5b2c4 referenced by snapshot data_id at revision 1 does not exist
Chunk c53219e181dcd33bc19dd1643343ee98a61fab9429231b9cdbc994bf57c90517 referenced by snapshot data_id at revision 1 does not exist
Some chunks referenced by snapshot data_id at revision 1 are missing

The advice in the thread says:

If you are uninterested in figuring out why the chunk went missing and just want to fix the issue, you can keep removing by hand the affected snapshot files under the snapshots folder in the storage, until the check -a command passes without reporting missing chunks. At this time, you should be able to run new backups.

So I deleted the snapshots folder on B2 and started another backup.

I have two questions:

  1. Why is this subsequent backup so slow? I only deleted the snapshots folder on B2 and left the 600GB+ chunks folder intact, so I assumed it would run very fast since it didn’t need to upload anything, but it’s almost as if it’s uploading everything to B2 again. The Duplicacy GUI is showing that it’s backing up at around 2MB/s, but the size of my chunks folder on B2 is not changing. It’s hard to tell exactly what Duplicacy is doing behind the scenes.

  2. Why did Duplicacy report a successful backup when in fact it did not back up successfully? This would’ve been a serious problem if I hadn’t bothered to test the restore and had just assumed I could restore in the future when needed.
    As far as what could be the cause of the missing chunks, the only thing I can think of is that my Internet connection died a couple of times in the middle of the initial backup and I stopped and restarted the backup via the GUI. However, according to this thread, stopping and restarting a backup is supported.

I’m currently using the Duplicacy trial, if that matters, and evaluating whether it fits my backup needs. Happy to pay for a license if the above issues can be resolved.

If you’re using a single upload thread, you should add the -threads 8 option to increase the number of upload threads. If you’re already running multiple upload threads, it might be something else. Run the benchmark command to determine the bottleneck.
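
For example (a hypothetical invocation; substitute the storage name you actually configured), the backup command with more upload threads would look something like this:

duplicacy_win_x64_3.2.3.exe backup -storage Backblaze -threads 8 -stats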

I would be interested to know why those chunks are missing. Add -d as a global option to the backup job and it will print every uploaded chunk to the log file. If check reports a missing chunk again, you can go back to the log file to find out what happened with that chunk. In any case, you should run a check after every backup; the check command was designed exactly for this purpose.
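
A rough sketch of that workflow, assuming the CLI executable you mentioned and a storage named Backblaze (adjust the names to your setup). -d and -log go before the command because they are global options:

duplicacy_win_x64_3.2.3.exe -d -log backup -storage Backblaze -threads 10 -stats
duplicacy_win_x64_3.2.3.exe -log check -storage Backblaze -a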

I was using 10 threads. Running duplicacy_win_x64_3.2.3.exe benchmark -storage Backblaze -upload-threads 8 -download-threads 8 results in the following output:

Generating 256.00M byte random data in memory
Writing random data to local disk
Wrote 256.00M bytes in 1.11s: 230.82M/s
Reading the random data from local disk
Read 256.00M bytes in 0.06s: 4115.01M/s
Split 256.00M bytes into 52 chunks without compression/encryption in 1.51s: 168.99M/s
Split 256.00M bytes into 52 chunks with compression but without encryption in 2.01s: 127.34M/s
Split 256.00M bytes into 52 chunks with compression and encryption in 2.15s: 119.10M/s
Generating 64 chunks
Uploaded 256.00M bytes in 50.74s: 5.04M/s
Downloaded 256.00M bytes in 22.90s: 11.18M/s

The upload/download speeds seem in line with my Internet connection speed.

I stopped and restarted the backup with the -d global option and I see a bunch of messages like this in the log:

DEBUG CHUNK_CACHE Skipped chunk 97b5606e42e98695de38b0a5cce9bc9575c7b9433077b8b8c721199d706c6c0a in cache

It seems like Duplicacy is skipping the upload of chunks that already exist on B2, but my question then is: why is the backup operation taking so long if there’s nothing to upload? At this rate, it looks like it’s going to take a similar amount of time as the initial backup, which was multiple days for 600GB.

I will run the check command after this backup finishes, but I’m wondering why check isn’t run during/after every backup by default. It’s a problem if a backup can complete successfully but not actually be restorable.

Chunks don’t just disappear, and there is usually no reason to run check.

There is a known issue where, if prune is interrupted, the would-be-deleted revision files remain on the storage but now refer to chunks that have already been deleted. A solution has been suggested but is yet to be implemented. To clean up, you should run check with the -persist flag, collect the list of affected snapshots, delete them manually, and then run prune with the -exhaustive flag to clear out the remaining orphaned chunks.
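
A minimal sketch of that cleanup sequence, assuming a storage named Backblaze (the names are placeholders for your own setup):

duplicacy_win_x64_3.2.3.exe -log check -storage Backblaze -a -persist
(delete the reported bad revision files from the snapshots folder on the storage)
duplicacy_win_x64_3.2.3.exe -log prune -storage Backblaze -exhaustive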

That’s the only known issue that can result in seemingly corrupted revisions. When Duplicacy makes a backup it only adds to the storage; it cannot corrupt existing backups. This, however, doesn’t seem to be your case, since you apparently only made one backup. Unless you manually deleted the revision history and it now mismatches the local cache.

What are the parameters of your backup run? Subsequent backups are usually very fast because existing files are skipped, unless you specified -hash or it’s an initial backup. The first backup is always slow.

The speed the UI shows is an effective speed and does not correspond to the actual upload speed.

Furthermore, if you have a lot of files and not enough RAM to fit the metadata, traversing file metadata will be IO-limited and can take ages. You can check the IO wait queues on your disks to see if that’s the case.
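
On Windows, one way to check that (assuming the built-in typeperf tool is available) is to sample the disk queue length while the backup is running, e.g.:

typeperf "\PhysicalDisk(_Total)\Avg. Disk Queue Length" -sc 10

A queue length consistently greater than 2 per disk suggests the job is IO-bound.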

If you manually deleted the snapshots folder, you should also manually delete the local Duplicacy cache.
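
For the CLI, the cache normally lives under .duplicacy\cache inside the repository folder (the GUI edition keeps its own copy next to its repository settings, so the path may differ). Assuming that layout, clearing it from the repository root would be something like:

rmdir /s /q .duplicacy\cache

Duplicacy rebuilds the cache on the next run.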

Search the backup log for this chunk ID and confirm that it was uploaded. If it was, and no error was reported, I would contact B2 support, because it would mean a catastrophic failure on their end. They have had similar unforgivable data durability failures in the past, so I would not put this type of issue past them. This, however, is to be expected from a “value” storage provider.

To summarize: if you did not violate Duplicacy’s assumptions (did not touch the same snapshot ID from two different places, did not use the -exclusive flag, did not interrupt prune), there should be no way to silently lose data. Your use case as described is pretty straightforward: init repo, do a backup, run check and/or restore. If you did something else in addition to that, please share.

Yes, I made one backup to B2 and tried to restore, which failed due to the missing chunks error.

I deleted the snapshots folder on B2 based on the advice in the Fix missing chunks thread, but I experienced the missing chunks error before doing this, so it couldn’t have been the cause.

Here are the backup run options from the log: Options: [-log -d backup -storage Backblaze -threads 10 -stats].

Task Manager shows 12GB of available RAM, so I doubt it’s a RAM issue. As for disk IO, I do see reads from the disk. It looks like Duplicacy is reading through all the backup files again (600GB+), and that’s why the backup is taking so long. I thought this would be unnecessary since I had already completed an initial backup to B2.

What are the consequences of not also deleting the local duplicacy cache?

None of the assumptions were violated. Simple backup from a local drive to B2, and then a restore to an external drive, which resulted in missing chunks. The only other things I did prior to this were to test a smaller backup to B2 and then restore, which worked perfectly. I also played around with the GUI, creating and then deleting multiple storages and backups, and then changing the storage password from the CLI.

A search of the forums for missing chunks shows multiple people running into this problem, so it doesn’t seem like an unusual case.

@eric1 OP Do NOT do this, or else you’ll be uploading 600GB+ of chunks from scratch! -exhaustive will remove unreferenced chunks, and since you (now) don’t have any snapshots that reference those chunks, they’ll go bye-bye!

Yes, but you deleted the snapshots folder, which is functionally equivalent to running an initial backup again, or one with -hash. Duplicacy doesn’t know which chunks to skip unless it indexes the chunks on the storage and then matches them up with the source. Since there’s now no previous snapshot to look back to, it has to hash every single file again to calculate those chunk hashes.

It’s not re-uploading chunks, but it is chunking your files locally, seeing which chunks already exist, then skipping them. Incremental backups are faster because Duplicacy looks at the last revision, figures out which files have changed, and only hashes those.

You’ll have to run the backup 'til you at least have a revision 1, and only then should you do an -exhaustive prune.

The advice there was to delete the affected revisions, not the entire snapshot ID.
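
For reference, assuming the standard storage layout of snapshots/<snapshot id>/<revision number>, the affected revision here would have been the single file

snapshots/data_id/1

rather than the whole snapshots folder.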

Technically, there should be none, but there have been cases where a stale cache caused phantom check failures.

Then (after clearing the cache to get that out of the way) I would track down the lifecycle of one of the affected chunks in all the logs. Literally, grep for its name, see when it was uploaded, deleted, and checked for, and whether it’s actually missing on the target.
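
For example, on Windows something like this, run from the folder holding the Duplicacy log files (the location depends on whether you use the GUI or the CLI) and using the chunk ID from the restore error:

findstr /s /i "9a27103a9d06971c38cbbd5348a0a1a81062ccaae16bf82dcd0a5181e1a5b2c4" *.log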

Good advice, to avoid re-uploading all the data.

Can you post a segment of this log file? A hundred lines should be enough for analysis.