Failure Validating Backups

I would like to confirm that the backups I store in Wasabi are valid. I have been running successful daily backups, checks and prunes for the last month or so, and have now decided to run a more intensive file check to actually validate the backup sets:

-log check -storage main-nas-backups -r 6 -stats -files -a -tabular

I chose revision 6 because I thought I would start with a smaller set. (EDIT: oops… I just realized that -r 6 and -a probably conflict. That was a mistake, but I'm not sure it changes the problem.)

I started the run via the UI, which displayed a “1/28” progress bar, began using a lot of bandwidth, and then ran for about 13 hours. It ended with the following message:

2020-02-03 23:02:53.386 FATAL DOWNLOAD_CHUNK Chunk ...snip...6f32a8d8495dc0e3256be272d87f6cafacc665f618ae50437ad43 can't be found Chunk ...snip...6f32a8d8495dc0e3256be272d87f6cafacc665f618ae50437ad43 can't be found

I found this surprising because my daily backups and checks do not indicate any problems. At this point I have very little confidence in the integrity of my offsite backups and would appreciate some guidance on the best next steps.

I am willing to abandon this set and create a new one if required, but Wasabi will charge more for that. All I want is to ensure I have a valid backup, whether I can keep the existing backup data or not. I have a fairly simple setup (one backup set in one location with simple pruning; think home NAS backups), so I don't know where this process has gone wrong.

The full error:

Options: [-log check -storage main-nas-backups -r 6 -stats -files -a -tabular] 
2020-02-03 10:18:01.902 INFO STORAGE_SET Storage set to wasabi://us-east-2@s3.us-east-2.wasabisys.com/...snip.../nas-backups 
2020-02-03 10:18:02.875 INFO SNAPSHOT_CHECK Listing all chunks 
2020-02-03 10:19:07.755 INFO SNAPSHOT_CHECK 1 snapshots and 28 revisions 
2020-02-03 10:19:07.764 INFO SNAPSHOT_CHECK Total chunk size is 1541G in 327379 chunks 
2020-02-03 10:19:51.497 INFO SNAPSHOT_VERIFY All files in snapshot bid-1 at revision 1 have been successfully verified 
2020-02-03 23:02:53.386 FATAL DOWNLOAD_CHUNK Chunk ...snip...6f32a8d8495dc0e3256be272d87f6cafacc665f618ae50437ad43 can't be found Chunk ...snip...6f32a8d8495dc0e3256be272d87f6cafacc665f618ae50437ad43 can't be found

A check run where it looks like everything is fine:

Running check command from /cache/localhost/all 
Options: [-log check -storage main-nas-backups -tabular -a -tabular] 
2020-02-03 00:08:24.031 INFO STORAGE_SET Storage set to wasabi://us-east-2@s3.us-east-2.wasabisys.com/...snip.../nas-backups 
2020-02-03 00:08:24.398 INFO SNAPSHOT_CHECK Listing all chunks 
2020-02-03 00:09:34.735 INFO SNAPSHOT_CHECK 1 snapshots and 28 revisions 
2020-02-03 00:09:34.750 INFO SNAPSHOT_CHECK Total chunk size is 1541G in 327379 chunks 
2020-02-03 00:09:34.778 INFO SNAPSHOT_CHECK All chunks referenced by snapshot bid-1 at revision 1 exist 
2020-02-03 00:09:35.910 INFO SNAPSHOT_CHECK All chunks referenced by snapshot bid-1 at revision 4 exist 
2020-02-03 00:09:37.899 INFO SNAPSHOT_CHECK All chunks referenced by snapshot bid-1 at revision 5 exist 
2020-02-03 00:09:39.778 INFO SNAPSHOT_CHECK All chunks referenced by snapshot bid-1 at revision 6 exist
...snip...

Some details around the backups:

snap | rev | | files | bytes | chunks | bytes | uniq | bytes | new | bytes | 
bid-1 | 1 | @ 2020-01-05 13:14 -hash | 3353 | 489,718K | 100 | 410,896K | 7 | 12,524K | 100 | 410,896K | 
bid-1 | 4 | @ 2020-01-07 17:34 | 437769 | 991,081M | 172677 | 828,858M | 39 | 74,409K | 172584 | 828,469M | 
bid-1 | 5 | @ 2020-01-09 00:00 | 451230 | 1719G | 323455 | 1526G | 21 | 44,802K | 150817 | 734,279M | 
bid-1 | 6 | @ 2020-01-11 00:00 | 451238 | 1719G | 323477 | 1526G | 38 | 56,174K | 43 | 65,036K |
...snip...

My prune parameters:

> Running prune command from /cache/localhost/all 
> Options: [-log prune -storage main-nas-backups -keep 0:1800 -keep 90:730 -keep 30:365 -keep 7:180 -keep 3:90 -threads 8 -a] 

Thanks for the help.

Running: web v1.1.0, CLI v2.3.0 in Docker

When you browse the files under the chunks directory on the Wasabi website, do you see the file 6f/32a8d8495dc0e3256be272d87f6cafacc665f618ae50437ad43?

There are 10 sub-directories in the chunks directory (00 through 09). I thought the chunks that started with 00 would be under the 00 directory, 01 under 01, etc…but that does not seem to be the case.

I ran through all of them and did not see any hashes that start with the missing chunk's hash.

  • Perhaps prune did something to that chunk? (I can check the logs)
  • Is there a way to reconcile which chunks have gone missing? I'm thinking something like “ok, let's bring the current state of things back to normal by replacing whatever chunks have gone missing”, essentially resetting the state to whatever is current.
  • And as a side note, shouldn't the normal check (daily, without the -a) have reported that chunk as missing?

Are you sure you only see 10 sub-directories under chunks? Given the size of your storage there should be 256 sub-directories (from 00 to ff).

You can follow the instructions in Fix missing chunks to see if the prune command deleted some chunks. But it looks more like a Wasabi issue if you only see 10 sub-directories.

Also, run a check command now to see what it says.

Lol…yes, there are 256 subdirs. The UI kind of hid the pagination controls at the bottom.

There is a “search objects by prefix” dialog box in Wasabi, but when I put in any hash (or even just the first few characters), it comes up with no results found… even for items I know are there. I'm not going to search through 256 directories for it by hand :stuck_out_tongue: (oh… it looks like it only searches the directory you're in, not recursively… sigh)

I'll follow up by running another check tonight and reading the missing chunks documentation. Every day I do a backup, a check and a prune in sequence, so I would expect the check to pass.

I kind of feel like a fresh restart might be best at this point…eat the wasabi (ha) costs for now.

The chunk 6f32a8d8495dc0e3256be272d87f6cafacc665f618ae50437ad43 should be under 6f with the file name 32a8d8495dc0e3256be272d87f6cafacc665f618ae50437ad43. The first two hex characters of the hash are used as the directory name while the rest is the file name.
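(For anyone else hunting for a chunk by hand, that mapping can also be checked from the command line. This is only a sketch using an S3-style tool such as s3cmd; the bucket name below is a placeholder, not the real snipped path.)

# First two hex characters of the hash -> sub-directory, remainder -> file name,
# so the chunk quoted above would be stored under the object key
# chunks/6f/32a8d8495dc0e3256be272d87f6cafacc665f618ae50437ad43
s3cmd ls s3://nas-backups/chunks/6f/32a8d8495dc0e3256be272d87f6cafacc665f618ae50437ad43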

Ah, I see. I thought that slash you added earlier was a typo.

In that case I do see the chunk. Here is the exact hash:

874335f33156f32a8d8495dc0e3256be272d87f6cafacc665f618ae50437ad43

It was listed as 87 / 4335f33156f32a8d8495dc0e3256be272d87f6cafacc665f618ae50437ad43.

I wonder why it couldn’t find this in the full check?

Wasabi seems to have temporary errors/issues every once in a while. They haven’t publicized any known issues in us-east-2, but they do have some they’re currently working on in us-east-1.

https://status.wasabi.com/


Do note that -stats implies -all (and -tabular too, because that implies -stats), so yes, it probably is overriding -r 6. If you're doing -files, I'd strongly suggest not using -all, -stats or -tabular, because it will waste a lot of bandwidth downloading all the chunks for revision 1, then revision 2, etc. And really, you don't need the summary at the end for this test.
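(To make that concrete: if the goal is just to verify one revision, a check along these lines should avoid re-downloading every revision's chunks. This is a sketch of the CLI form, using the storage name from the original post.)

# Download and hash only the files in revision 6; no -a, -stats or -tabular
duplicacy check -storage main-nas-backups -r 6 -files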

A better test imo would be to do an actual restore to a temporary location. Such a restore is not unlike check -files except it actually restores the files, and should let you resume in case of an error.

(An even more efficient procedure might involve first copying the remote storage to a temporary local storage, because then chunks are only downloaded once… if you have a lot of de-duplication, a restore/check may download chunks multiple times, whereas copy won't. Of course, this is only if you have the spare disk space.)
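(A rough sketch of that copy-first idea with the CLI, run from the repository directory; local-verify and /mnt/scratch/nas-backups are made-up names here, and the snapshot id bid-1 is taken from the tables above.)

# Register a copy-compatible local storage alongside the existing one
duplicacy add -copy main-nas-backups local-verify bid-1 /mnt/scratch/nas-backups
# Download each chunk once, then verify everything against the local copy
duplicacy copy -from main-nas-backups -to local-verify
duplicacy check -storage local-verify -a -files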


Does it always fail at the same chunk?


> Wasabi seems to have temporary errors/issues every once in a while.

Good point. It might be connectivity based…I can test that to see if it happens again.

> -stats implies -all

I think I was pretty off in getting the check to do what I wanted. :slight_smile:

> A better test imo would be to do an actual restore to a temporary location 
> An even more efficient procedure might involve first copying the remote storage to a temporary local storage

Those are great ideas; I just need the space to do it. It would separate possible Wasabi connectivity issues from integrity issues. I think I have some kind of S3 file tool that I can use to download the entire bucket.

> Does it always fail at the same chunk?

I have not tried the operation again, and my latest overnight backup and check completed without error. I think I'll take @Droolio's advice and just download it all. If it fails locally… well, something went bad… otherwise it's Wasabi.

@Droolio I ended up using an s3cmd-like tool to copy my entire Wasabi bucket to a local drive. Perhaps I should have used the copy command instead, but do you happen to know if I can “link” the local copy into duplicacy now so I can run a check?

Something like importing the local set and then running a check, or maybe starting a new copy from S3 to the location that already has all of the S3 files locally, in the hope that it will skip files it sees are already copied.

Thanks

Although you could have used the copy command, yes, it may have failed to download the whole lot due to missing/corrupt chunks, so you probably did the right thing there. The good thing with copy, though, is that you can be selective about which snapshots you want.

Now you just need to add the newly downloaded local storage to your configuration - in the Web UI > Storage > [+] button at the bottom. Give it a different name. Since it’s effectively a bit-identical copy, it’s also copy-compatible, so you could do a final copy from Wasabi before doing a local check or restore, if you wanted.
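For reference, a rough CLI equivalent of that Web UI step might look like this, with /mnt/local/nas-backups standing in for wherever the bucket was downloaded to (duplicacy will prompt for the storage password if the storage is encrypted):

# Point a new storage name at the local, byte-for-byte copy of the bucket
duplicacy add local-copy bid-1 /mnt/local/nas-backups
# Then check or restore against the local copy instead of Wasabi
duplicacy check -storage local-copy -a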


I wanted to give an update here and maybe get some advice.

I ended up using rclone (what a great tool) to pull all of the chunks from Wasabi to my local environment. I added the backup location in the GUI, did a simple check and then started a restore. All files were restored and the process completed successfully. :+1:
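(For anyone following along, the pull was done with something roughly like this; the remote name, bucket and local path are placeholders.)

# One-way download of the whole bucket to a local drive
rclone copy wasabi:nas-backups /mnt/local/nas-backups --transfers 8 --progress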

I guess this means that something at Wasabi caused duplicacy to throw the error about not being able to find a chunk (unfortunately, not the first time I've had connectivity problems with Wasabi). A couple of observations:

  • Given that checks in duplicacy are not resumable and the full check process is somewhat unreliable for me due to the issue above, it could be prudent to keep a local copy of the backups to run checks against (though that seems like a bit of wasted space).

  • I'm thinking about just starting over, backing up directly to a local drive and then using an rclone job to sync it into Wasabi (a rough sketch of such a sync follows this list). I know that I could use duplicacy copy, but given the sensitivity to errors, it might make sense to use a different tool for the transfer job. This would allow me to do full checks and restores without protocol or transfer issues. It could also be done by just continuing to sync what I have already pulled from Wasabi, though that has an extra transfer step involved (nightly backups go up and are then sync'd back down).

  • I did not do a -files type hash check. Does a restore actually do any kind of hashing in the process of extraction? I just wanted to make sure that what is being restored is the same as the original file it came from.
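The sync job mentioned in the second bullet would look something like this (placeholder names again). Note that rclone sync mirrors deletions, which is what would let prune results on the local storage propagate to Wasabi, whereas rclone copy would leave pruned chunks behind.

# Mirror the local duplicacy storage up to the bucket, deletions included
rclone sync /mnt/local/nas-backups wasabi:nas-backups --transfers 8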

At least now I know the backup files in Wasabi are valid. Thanks!
