Missing Chunk during copy command only

steve.francia · 12 July 2022 21:41

A bit of a follow-up to the thread “Remote revisions with same ids already exist after local drive crash”.

I did what was suggested in that thread and it largely worked, except that the copy command (copying from Wasabi to a local disk) exits with an error:

2022-07-12 17:33:34.552 ERROR DOWNLOAD_CHUNK Failed to download the chunk 923473dc318a9bef97c55776deafe406f1a62044cb0ad25f9579457d26778712: InternalError: We encountered an internal error.  Please retry the operation again later.
	status code: 500, request id: 6BC08F1A3E313841, host id: dDusgOJtjib3Ik9WncHhoXsAgCLy3KTxsEvmGOuKNvZywnX9NSTS5vsiw9NQLBEPN0E8P5KNYU+h
Failed to download the chunk 923473dc318a9bef97c55776deafe406f1a62044cb0ad25f9579457d26778712: InternalError: We encountered an internal error.  Please retry the operation again later.

Sadly, this has been happening for the past week or so, always with the same missing chunk, and the copy exits whenever it hits this error.

Oddly, running check reports that all chunks for all revisions are present and accounted for.

I’ve read through the forum and the documentation and haven’t come across anyone in a similar situation. In every other case I found, check identified the missing chunks.

saspus · 12 July 2022 23:48

This looks like a diagnostic returned by the Wasabi server. Welcome to the club (of people who struggled with Wasabi’s reliability and eventually gave up).

steve.francia · 12 July 2022 23:49

If the problem is Wasabi, what do people use as good alternatives that won’t break the bank?

saspus · 13 July 2022 00:30

You can try a different endpoint; for me, us-east-1 was far more stable than us-west-1.

Otherwise, I would wait for Duplicacy to support archival storage and then move to Amazon AWS. Using hot storage for backup is unnecessarily expensive, even with discount providers like Backblaze and Wasabi.

gchen · 13 July 2022 00:34

This is a Wasabi issue. The file is there when you list the directory, but for some reason they can’t retrieve it for you. Contact their support, and if you don’t hear from them within a day, please let me know.

steve.francia · 13 July 2022 00:44

I’m using us-east-1.

I have just now reached out to support.

Thanks for the guidance.

steve.francia · 15 July 2022 12:12

It was indeed a Wasabi issue. After a few rounds of back-and-forth, here is the response from their support staff.

backup/chunks/92/3473dc318a9bef97c55776deafe406f1a62044cb0ad25f9579457d26778712, 
we have discovered that this and two other objects in your bucket were affected by a defect. Please allow me to elaborate on this below-

On 10 Feb 2022, a new Wasabi maintenance release was deployed to all Wasabi storage regions. Among the capabilities that were included in this maintenance release was a change to improve the efficiency of how Wasabi writes data to storage disks. After this deployment, Wasabi received a small number of customer reports of errors that occurred when trying to access previously uploaded objects. The error typically seen by customers was:
HTTP Response Code: 500
Request Error: We encountered an internal error. Please retry the operation again later.
Request Internal Error: InternalError
Upon investigation, the problem was isolated to certain objects that had been written to the system between 10 Feb 2022 (the date when the new maintenance release was deployed) and 16 Feb 2022. Additional investigation revealed that the affected objects were written during a specific period when the system was switching between internal interfaces that provide access to our storage servers.

Due to a problem with the new code designed to improve efficiency, the system indicated that it had written the entire object successfully (a 200 OK was provided to the application sending the object) when in fact one portion of the object was not written correctly. This problem will result in an HTTP request failure when the application attempts to read/recover the entire object. After isolating the problem, a configuration change on an internal parameter was made on 16 Feb 2022 that essentially disabled the new code that was introduced in the maintenance release that had been deployed on 10 Feb 2022. After this configuration change was made, no new reports of this issue have been received.

Our system logs indicate that you have data that was impacted by this incident. Specifically the total of 3 objects. Below are the details on the two other objects -
backup/chunks/c8/9b9bede2c6402076405715fcbb084226a105dde5e8fd0620e7e120d3b4e5ab 
backup/chunks/33/6ce33a338fff6a8239e8a162cb9d8b3e22f302b6a464c2a24eb99fbe30a650
The suggested course of action here is to rerun the full upload job that had the above objects inside.

This certainly calls into question their reliability and service. It’s pretty upsetting that they didn’t tell customers about a known problem until after the customers themselves realized they had experienced data loss. Judging from their response, they can clearly identify the affected customers from their logs.

However, from where I stand, I’m a bit clueless about the path forward. Fortunately, I believe I still have all the important original files, so I should be able to recreate the backup as needed. I’m just not clear on how to recreate the missing (but not missing) chunks, or how to identify which snapshots contain them.

I’m also thinking that my best path forward is to buy a second external drive and back up to a local drive, then copy to the other local drive and then to Wasabi, at least until Duplicacy supports archival storage.

I currently have all but 32 chunks downloaded from Wasabi, so it might be time to reverse my backup order, though I’m concerned that I’ll never be completely in sync, which would mean my backups aren’t really backups.

Any advice would be greatly appreciated. I recognize that this all falls into the non-standard operation category due to the Wasabi failure.

Droolio · 15 July 2022 13:46

If you wish to keep using Wasabi, the good news is that Duplicacy storages are generally easy to ‘repair’ without a full re-upload, at least as far as Duplicacy is concerned (assuming the storage stays reliable from here on). The process is very manual, though.

Personally, I’d start by renaming those two chunks on Wasabi, appending something like .bad to the filename.
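The object keys in the support reply follow directly from the chunk IDs in the error messages: the first two hex characters become a subdirectory under chunks/. A minimal sketch of that mapping, assuming the single level of nesting seen in those paths and that `backup` is the storage directory in this bucket (other storages may be configured differently). `chunk_key` and `chunk_id_from_key` are hypothetical helpers, handy for working out exactly which objects to rename.

```python
HEX = set("0123456789abcdef")

def chunk_key(chunk_id: str, storage_dir: str = "backup", suffix: str = "") -> str:
    # Assumption: one level of nesting, i.e. the first two hex characters
    # name the subdirectory and the remaining 62 form the object name.
    if len(chunk_id) != 64 or not set(chunk_id) <= HEX:
        raise ValueError(f"not a 64-character hex chunk ID: {chunk_id!r}")
    return f"{storage_dir}/chunks/{chunk_id[:2]}/{chunk_id[2:]}{suffix}"

def chunk_id_from_key(key: str) -> str:
    # Inverse of chunk_key: rejoin the two-character directory and the file name.
    *_, subdir, name = key.split("/")
    return subdir + name

bad_id = "923473dc318a9bef97c55776deafe406f1a62044cb0ad25f9579457d26778712"
print(chunk_key(bad_id, suffix=".bad"))
# backup/chunks/92/3473dc318a9bef97c55776deafe406f1a62044cb0ad25f9579457d26778712.bad
```

Going the other way with `chunk_id_from_key` turns the keys from the support reply back into the chunk IDs that Duplicacy prints in its logs.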

When you next run a check, it should identify the missing chunks, and the log should tell you which revisions are affected. You may need to run the check with -persist (it’s not very well documented, but hopefully it gives a complete list rather than just the first occurrence).
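To answer the earlier question of which snapshots contain the missing chunks, the check log can be grouped by snapshot and revision. A minimal sketch, assuming check reports each missing chunk on its own line in the form `Chunk <id> referenced by snapshot <snapshot-id> at revision <n> does not exist`. That wording is an assumption here, so compare it with your own log and adjust the pattern if needed. `affected_revisions` is a hypothetical helper.

```python
import re
from collections import defaultdict

# ASSUMPTION: this is the wording check uses for a missing chunk;
# verify it against your own log before relying on the result.
MISSING_RE = re.compile(
    r"Chunk (?P<chunk>[0-9a-f]{64}) referenced by snapshot (?P<snap>\S+)"
    r" at revision (?P<rev>\d+) does not exist"
)

def affected_revisions(log_lines):
    """Map (snapshot_id, revision) -> set of missing chunk IDs."""
    affected = defaultdict(set)
    for line in log_lines:
        match = MISSING_RE.search(line)
        if match:
            affected[(match["snap"], int(match["rev"]))].add(match["chunk"])
    return dict(affected)
```

Any iterable of lines works, so an open log file can be passed in directly; lines that don't match the pattern are ignored.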

If the affected revisions are not the most recent backup for each ID, the easiest thing to do is to delete (or rename to .bad) each affected revision file under snapshots on Wasabi. Run another check, and you should then be able to complete a Wasabi-to-local copy job. How practical this is depends on how many revisions are affected.

If the affected revisions include the most recent backup, you may be able to ‘recreate’ the bad chunks simply by running a fresh backup, but Duplicacy won’t do so under your existing snapshot IDs. You’ll either have to temporarily rename the affected revisions (or the whole snapshot ID folder) to move them out of the way, OR (I recommend this) create a temporary new backup ID and complete a backup. Most chunks will be skipped on upload.
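The choice between the two cases above comes down to whether the latest revision of an ID is affected. A minimal sketch of that decision, using a hypothetical `repair_plan` helper that takes the affected revisions (keyed by snapshot ID and revision number) and the latest revision of each ID; both inputs would have to come from your own check output and storage listing, and the IDs below are made up.

```python
def repair_plan(affected, latest):
    """affected: {(snapshot_id, revision): set of missing chunk IDs}
    latest:   {snapshot_id: most recent revision number}
    Returns {snapshot_id: (suggested action, sorted affected revisions)}."""
    revisions = {}
    for snapshot_id, revision in affected:
        revisions.setdefault(snapshot_id, set()).add(revision)

    plan = {}
    for snapshot_id, revs in revisions.items():
        if latest[snapshot_id] in revs:
            # The newest backup is damaged: removing revisions alone won't fix it.
            action = "run a backup under a temporary new ID"
        else:
            # Only older revisions are damaged: remove (or rename) them.
            action = "delete or rename the affected revisions"
        plan[snapshot_id] = (action, sorted(revs))
    return plan

# Hypothetical example: only old revisions of photos-a are hit,
# but the latest revision of photos-b is.
affected = {("photos-a", 1): {"c1"}, ("photos-a", 2): {"c1"}, ("photos-b", 7): {"c2"}}
latest = {"photos-a": 9, "photos-b": 7}
for snapshot_id, (action, revs) in sorted(repair_plan(affected, latest).items()):
    print(snapshot_id, revs, action)
```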

An initial backup with a new ID, or any situation where Duplicacy doesn’t see previous metadata for the affected revisions, will force Duplicacy to upload the chunk (since it no longer exists in the storage or in the previous snapshots’ metadata).
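The reasoning in this paragraph can be captured in a deliberately simplified model, which is not Duplicacy’s actual code: a chunk is uploaded only if it is neither found in the storage nor referenced by the previous revision of the same ID. In reality Duplicacy also skips unchanged files before chunking them, so treat this as an illustration of the logic only. `chunks_to_upload` and the short chunk names are hypothetical.

```python
def chunks_to_upload(backup_chunks, storage_chunks, previous_revision_chunks=()):
    """Simplified model (not Duplicacy's actual code): a chunk is uploaded only
    if it is neither present in the storage listing nor referenced by the
    previous revision of the same backup ID."""
    known = set(storage_chunks) | set(previous_revision_chunks)
    return [c for c in backup_chunks if c not in known]

# Hypothetical short IDs: "good" still exists, "bad" was renamed to .bad.
storage = {"good"}               # after the rename, "bad" no longer appears here
old_metadata = {"good", "bad"}   # the existing ID's last revision still references "bad"

print(chunks_to_upload(["good", "bad"], storage, old_metadata))  # [] (existing ID: nothing re-uploaded)
print(chunks_to_upload(["good", "bad"], storage))                # ['bad'] (new ID: re-uploaded)
```

This is why a temporary new ID is the more dependable route. Its first backup has no previous metadata to trust, so the renamed chunk looks genuinely absent.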

There’s a possibility the chunks won’t get re-uploaded. While there might be a way to recreate them (using a temporary copy of the files restored to roughly how they were when the snapshot was made), simply deleting the affected revisions might be easiest. In my experience, the damage is usually contained to a few revisions, and this is usually the best approach.

If all else fails, switch to new backup IDs permanently and manually purge everything else that’s broken. Clean up with prune -exclusive and keep verifying your backups.

And indeed, backing up to local storage first and then copying to Wasabi would be a much better backup strategy. (You can switch to this setup easily, without having to reinitialise, but definitely fix your Wasabi storage first.)

steve.francia · 19 July 2022 05:21

Ok. I thought I understood this, but now I’m getting an error that surprised me.

Here’s what I did.

1. Logged into Wasabi and renamed the three bad chunks, adding the extension .bad
2. Ran a check on that repo to identify which revisions were impacted
3. 1 missing chunk was isolated to only the 3 oldest snapshots of a backup
4. 2 missing chunks were in every snapshot of a backup
5. Logged into Wasabi and renamed the 3 affected snapshots, adding the extension .bad
6. Went into the duplicacy.json file and renamed the ID of the backup in which every snapshot was affected
7. Ran a full backup on the newly named backup (photo-2013-fix). The backup reported that it completed successfully.

2022-07-18 22:08:32.825 INFO BACKUP_END Backup for /share/photo/2013 at revision 1 completed
2022-07-18 22:08:32.825 INFO BACKUP_STATS Files: 31465 total, 278,598M bytes; 31465 new, 278,598M bytes
2022-07-18 22:08:32.825 INFO BACKUP_STATS File chunks: 56714 total, 278,598M bytes; 367 new, 2,569M bytes, 2,567M bytes uploaded
2022-07-18 22:08:32.825 INFO BACKUP_STATS Metadata chunks: 5 total, 11,099K bytes; 5 new, 11,099K bytes, 6,336K bytes uploaded
2022-07-18 22:08:32.825 INFO BACKUP_STATS All chunks: 56719 total, 278,609M bytes; 372 new, 2,580M bytes, 2,574M bytes uploaded
2022-07-18 22:08:32.825 INFO BACKUP_STATS Total running time: 01:00:21
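The stats above bear out the expectation that most chunks would be skipped on upload. A quick check of the arithmetic, using only the figures from the log (sizes in megabytes as Duplicacy reports them):

```python
# Figures copied from the BACKUP_STATS lines above.
file_chunks_total, metadata_chunks_total = 56_714, 5
file_chunks_new, metadata_chunks_new = 367, 5
all_chunks_total = file_chunks_total + metadata_chunks_total  # 56,719, matching "All chunks"
all_chunks_new = file_chunks_new + metadata_chunks_new        # 372, matching "All chunks"
all_mb_total, all_mb_uploaded = 278_609, 2_574

new_share = all_chunks_new / all_chunks_total
upload_share = all_mb_uploaded / all_mb_total
print(f"new chunks: {new_share:.2%}")     # new chunks: 0.66%
print(f"uploaded:   {upload_share:.2%}")  # uploaded:   0.92%
```

So well under 1% of the data was sent to Wasabi, even though every file counted as new for the fresh backup ID.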

At this point I figured I was doing OK: the latest snapshot of every backup should have all of its chunks. I really only care about the latest snapshot anyway (the older ones would be nice to have, but I’m OK with losing them).

I ran another check to make sure everything was OK. Instead, I got the following error message:

2022-07-18 22:57:31.078 WARN DOWNLOAD_CHUNK Chunk 7d0654401fe47a064a1d59a4619b23e68ac3de633d2f5a212f2391001912d568 can’t be found
2022-07-18 22:57:31.682 ERROR SNAPSHOT_CHUNK Failed to load chunks for snapshot photo-2013-fix at revision 1: invalid character ‘f’ in exponent of numeric literal
Failed to load chunks for snapshot photo-2013-fix at revision 1: invalid character ‘f’ in exponent of numeric literal

Droolio · 19 July 2022 12:55

Personally, I wouldn’t have done step 6 (directly editing duplicacy.json).

A safer bet would be to just add a new backup with that ID using the web GUI, partly because you don’t know what it does under the hood. The staging areas under .duplicacy-web\repositories\localhost\0, 1, etc. might contain a stale cache.

Either delete the cache directories under each staging area, or recreate a backup ID using the web GUI, which should result in a clean cache.

Otherwise, maybe @gchen could explain that error; I haven’t seen it on the forum before.

steve.francia · 19 July 2022 13:40

After that, I reset the duplicacy.json file back to the original and added a new ID using the web GUI, figuring that would be safer. Perhaps the initial edit had already cemented the problem, but I ended up getting the same error with the new backup too.

I then took the extra backup out of the file entirely, ran a check, and it worked. I don’t know what I did wrong, but it’s nice to see a successful check after 3 weeks without one.

gchen · 19 July 2022 15:14

I would suggest deleting ~/.duplicacy-web/repositories/localhost/all and running check -chunks again. This clears the cache and forces check -chunks to verify all chunks again.
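A minimal sketch of that cleanup, assuming Duplicacy Web is idle (no job running) and installed in the default location under your home directory. `clear_cache` is a hypothetical helper, and it defaults to a dry run so you can see what it would delete before committing.

```python
import shutil
from pathlib import Path

# Path taken from the suggestion above; adjust it if Duplicacy Web lives elsewhere.
DEFAULT_CACHE = Path.home() / ".duplicacy-web" / "repositories" / "localhost" / "all"

def clear_cache(path: Path = DEFAULT_CACHE, dry_run: bool = True) -> bool:
    """Remove the cache directory so the next check -chunks starts clean.
    Returns True if the directory existed. With dry_run=True nothing is deleted."""
    if not path.is_dir():
        return False
    if dry_run:
        print(f"would delete {path}")
    else:
        shutil.rmtree(path)
    return True
```

Run it once as a dry run, then again with `dry_run=False`, and follow up with check -chunks.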

steve.francia · 25 July 2022 01:04

Thanks for all the help. I was able to sort this all out and my backup repos are now in good shape!

system · 4 August 2022 01:05

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.