Cipher: message authentication failed

So the idea is that in a situation like mine, you would manually delete the corrupted chunks in order to get them re-uploaded (hopefully)?

That sounds like an improvement (though from a user perspective it’s obviously sub-optimal; but I suppose from a developer perspective this is the easiest option).

If those corrupted chunks were corrupted because duplicacy did something wrong, or if duplicacy could do something to prevent the corruption, I would see your point. But since neither of these seems to be the case, I think you’ll have to elaborate on the reasoning behind this point.

How exactly would I do this?

So is this something you can implement any time soon?

I’ll work on this next week.

OK, let me know when it’s implemented so I can try it out with my broken chunk. Or wait. As I look at this again, I realize that this probably won’t solve my problem in this case, because I have already tried this:

and the missing chunk was not recreated. So the new feature probably won’t lead to a different result.

So I guess the only solution is to

Is this correct, @gchen?

I was able to fix the problem by

  1. manually deleting the corrupted chunk directly on the storage
  2. running check
  3. searching the check-log for “missing” and noting down which revisions in which snapshots had missing chunks
  4. manually deleting those snapshots on the storage
  5. running check again to confirm that it worked

So while this worked, I’m not sure if there is a more professional way of doing it without fiddling with the storage directly. If there is, please post the instructions below. Otherwise I will get back to this post and follow my own instructions whenever this happens again…
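For reference, the steps above boil down to something like the shell session below. This is only a sketch: the mount point /mnt/pcloud, the snapshot ID my-backup and revision 123 are placeholders, the chunk hash is just an example, and the chunks/<first two hex chars>/<rest> layout is Duplicacy’s default nesting, which may differ on your storage.

# the corrupted chunk reported by check ("Cipher: message authentication failed")
CHUNK=7136a6d409c040396ad421c40f3121e7f12122b1abe14e0f28d3176721235e04

# 1. delete the corrupted chunk directly on the storage
rm "/mnt/pcloud/chunks/${CHUNK:0:2}/${CHUNK:2}"

# 2./3. run check again and note which revisions now report missing chunks
duplicacy check -a | tee check.log
grep -i missing check.log

# 4. delete the affected revision files on the storage (repeat per revision found above)
rm /mnt/pcloud/snapshots/my-backup/123

# 5. final check to confirm the storage is consistent again
duplicacy check -a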

But even though I now have relatively easy instructions to follow (at least in cases with only one or a few chunks affected), I still think duplicacy could do better in helping the user resolve corrupted/missing chunks.

Does anyone have comments on what I said here:

I didn’t explain my reasoning well, but here is my point: corrupted chunks should only happen rarely, and I don’t want to provide a convenient option to fix rare situations, because such an option can be easily misused, causing even more damage.

Is this pcloud? If this happens again you should enable Erasure Coding.

Yes, this is pcloud. Even if this only happens once a year, it is enough of a nuisance to want to fix it.

Are you saying that pcloud supports Erasure Coding, or is Erasure Coding provider-independent? But enabling Erasure Coding means I’m basically starting over and losing my previous backups, right?

While I have encountered a corrupted chunk on a rare occasion or two before, and had to take care to delete them manually, more frequently I’ve had situations where Duplicacy/Vertical Backup has created a lot of 0-byte chunks/snapshot files.

Usually this is the result of running out of disk space. Now obviously I should monitor disk space and alert before this becomes a problem, but one of the other issues I’m dealing with on one headless system is Duplicacy (CLI) sometimes not pruning enough of what it should. I can’t yet pinpoint why this issue is occurring; only that the logs fill up with 'Marked fossil' lines and nothing else (no deleted fossils, no fossil collection saved at the end). As a result, the disk fills up after a couple of months and I have to perform an -exclusive -exhaustive prune and everything is right again. Lots of disk space freed, pruning starts working again.
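In case it helps anyone hitting the same thing, the workaround amounts to a one-off run like this, done only while no backups or other prune jobs are touching the storage (since -exclusive assumes exclusive access):

# -exhaustive scans the whole storage for unreferenced chunks;
# -exclusive lets prune delete them immediately instead of going through two-step fossil collection
duplicacy prune -exclusive -exhaustive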

IMO Duplicacy needs the ability to self-heal a backup storage - particularly with 0-byte chunks/snapshots - and possibly provably bad chunks. Otherwise the user needs to manually tinker with said storage - an added pain if it’s cloud-based, since a different tool is necessary - and mistakes can happen.
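For the 0-byte case specifically, at least finding the offenders is easy when the storage is local or can be mounted; the path below is just an example:

# list empty chunk and snapshot files so they can be reviewed before anything is removed
find /path/to/storage/chunks -type f -size 0
find /path/to/storage/snapshots -type f -size 0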

Instead of deleting bad chunks, could it not rename them to .bad when the user specifically asks it to -heal during a check operation?

Then, after a heal, the recommended strategy might be to run the next backup with a new flag (saw it mentioned recently) that skips loading the chunk cache from the previous backup and doesn’t assume all chunks are present, instead checking before each upload attempt.

This is better than changing the snapshot ID to force a complete re-upload and optionally changing it back, as the user may want to keep the same ID. And better than the user manually deleting chunks from a storage in order to get back up and running again.
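Until something like -heal exists, the rename can of course be done by hand on a mounted storage; a rough manual equivalent of the idea, here applied to 0-byte chunks (the storage path is illustrative):

# set suspect chunks aside as .bad instead of deleting them outright,
# so Duplicacy no longer sees them under their chunk name but they can still be inspected later
find /mnt/storage/chunks -type f -size 0 | while read -r f; do
  mv "$f" "$f.bad"
done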

Also, the logging related to whether a chunk is file data or metadata could be improved, as the possible remedies depend on which is which.

@gchen Is this still planned? (I’m having the same issue again…)

Sorry I didn’t implement that. I’ll do it after the memory optimization PR is merged.

Any news? I’m seeing Cipher: message authentication failed again…

This hasn’t been done. I would suggest creating a new pcloud storage with Erasure Coding enabled and then copy over everything. This way you won’t lose any old backups.
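If it helps, the rough CLI sequence would be something like the below. The storage name pcloud_ec, snapshot ID my-backup, storage URL and the 5:2 data:parity ratio are all just examples; double-check the options against duplicacy add -help for your version:

# add a second, copy-compatible storage with Erasure Coding enabled
duplicacy add -e -copy default -bit-identical -erasure-coding 5:2 pcloud_ec my-backup sftp://user@host//backups/duplicacy

# copy every revision from the existing storage to the new one
duplicacy copy -from default -to pcloud_ec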

My backend is 4 TB, of which 3+ TB are used, so I can’t copy the entire storage into the same backend without deleting the source (which I don’t think the copy command does). So I would have to download the entire storage, then do this

And then reupload it all. Is that correct? It would only take a couple of months, I guess.

I was going to use my previous solution to fix another one of these error messages but I’m failing at step 3:

The problem is that there is no instance of “missing” in the logs of the check -v command. All I have at the end of the log is:

2024-02-02 00:44:16.582 ERROR DOWNLOAD_CHUNK Chunk 7136a6d409c040396ad421c40f3121e7f12122b1abe14e0f28d3176721235e04 can't be found
Chunk 7136a6d409c040396ad421c40f3121e7f12122b1abe14e0f28d3176721235e04 can't be found

@gchen Has something changed or what am I missing?

Edit: This post gave me the idea that I may have to delete the local cache. So I deleted the 71 directory in ~/.duplicacy-web/repositories/localhost/all/.duplicacy/cache/pcloud_sftp/chunks and started a new check. Let’s see if that brings me back on track.

Edit2: Unfortunately, this didn’t change anything. So I’m still wondering how I can find out which snapshots I need to delete in order to get things up and running again.

Unfortunately, this is still currently a laborious process - even with check -persist - as I found out recently.

You have to use the verbose -v flag (or maybe even debug -d) to get details on which snapshot references the missing chunk. But this may only show one snapshot at a time.
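Concretely, something like this; note that in the CLI the -v / -d logging options are global and go before the check command:

# verbose check over all snapshot IDs, continuing past errors where possible
duplicacy -v check -a -persist

# or with debug-level logging if -v doesn't name the affected snapshot/revision
duplicacy -d check -a -persist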

-persist needs to be reworked to cater for all errors - including when it encounters missing metadata chunks while prior metadata chunks still exist - i.e. when you get an error such as:

Failed to load chunks for snapshot backup at revision 123: unexpected end of JSON input

That means the corrupted chunk is a metadata chunk, and because of that Duplicacy can’t construct the list of referenced chunks. It doesn’t show the revision number, but usually it’s the revision right after the one that has been checked.

Can we get check -persist to persist in these circumstances? As per: