Fix missing chunks

Oh, you are absolutely right. Now that I think about it, that’s exactly what happened. Maybe Duplicacy should annotate the storage with which client last performed a prune, and clients would distrust their cache if it wasn’t them?

I think the solution is to compare the timestamp of the cached copy with that of the file in the storage. However, due to an oversight in the design, the backend API doesn’t return the modification times when listing files in the storage (although most storages should support it).
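To illustrate the timestamp comparison described above, here is a minimal sketch. It is not Duplicacy code: `cache_is_stale` and its `remote_mtime` argument are hypothetical, and it assumes the storage backend could report a modification time for the file, which the current backend API does not do when listing.

```python
import os
from datetime import datetime, timezone


def cache_is_stale(cached_path, remote_mtime):
    """Return True if the cached copy should not be trusted.

    cached_path  -- path of the locally cached file
    remote_mtime -- timezone-aware datetime of the copy in the storage
                    (hypothetical: assumes the backend can report it)
    """
    # A missing cache entry is treated as stale so it gets re-downloaded.
    if not os.path.exists(cached_path):
        return True
    local_mtime = datetime.fromtimestamp(
        os.path.getmtime(cached_path), tz=timezone.utc
    )
    # The cache is stale if the storage copy was modified after it.
    return local_mtime < remote_mtime
```

A client would call this before reusing a cached snapshot file and fall back to downloading from the storage whenever it returns True.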

Where do I find that file in a duplicacy-web install (on linux)?

Those preferences files are auto-generated in the web GUI so it is not recommended to modify them. If you want to change the repository id (which is called a backup id in the web GUI), just create a new backup with a new backup id.


Is there a way to duplicate and modify an existing backup? In order to follow the above instructions I obviously also need to use the same filters…

You can edit ~/.duplicacy-web/duplicacy.json directly – find the backup in computers -> repositories and then change the id.

Changing that doesn’t change the backup ID in the UI. Will it still work?

Forgot to mention that you’ll need to restart the web GUI for the changes in duplicacy.json to take effect. Better yet, edit duplicacy.json while the web GUI is not running; otherwise your changes may be overwritten.


Can I restart the web-ui while a backup job is running?

Here is my reply from the other thread earlier today:

The CLI can be terminated any time and it shouldn’t leave any half-uploaded files on the cloud storage server, if the server behaves properly, because the content length is always set and the server should never store an incomplete chunk file shorter than the content length. OneDrive for Business is an exception but we’ve fixed that in the latest CLI release by using a different upload API.

For non-cloud storages like sftp and local disk, the CLI uploads to a temporary file first and then renames the temporary file once the upload completes. Aborting should not cause any partial upload.
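The temporary-file-then-rename pattern mentioned here can be sketched as follows for a local disk. This is an illustration of the general technique, not the CLI's actual implementation; the function name `write_chunk_atomically` and the `.tmp` suffix are assumptions.

```python
import os
import tempfile


def write_chunk_atomically(dest_path, data):
    """Write data to dest_path so that readers never see a partial file."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temporary file in the destination directory so the final
    # rename stays on one filesystem and replaces the target in one step.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach the disk
        # Only a fully written file ever appears under the final name.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # On failure or interruption, remove the leftover temporary file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process is killed before `os.replace` runs, the worst case is a stray temporary file; the chunk name itself never points at incomplete data.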

So are you saying that restarting the web-ui will stop the CLI but it doesn’t matter?

BTW: you can quote text from other topics/threads. That will create links between those topics.

So I just waited for the backup to finish and then edited duplicacy.json, then restarted the web-ui. The new backup-ID showed up in the ui, and the backup went through without problems. But I don’t think it worked as intended because it uploaded tons of files that were supposed to be excluded (and which were excluded before renaming the ID). Might it be that renaming the repo in the .json file results in duplicacy displaying filters in the web-ui but not actually applying them?

There is also a case with B2 where the chunk exists, but there are multiple versions of it and the latest one is zero-sized. I did not find instructions on what to do in this case.


SOLVED:
The solution seems to be to log in to B2, locate the “missing” chunks and delete the zero-sized versions. I have no idea why those were created in the first place, but I suspect an interrupted backup. I am just not sure whether this method guarantees that the chunk content is still valid…
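For anyone with many affected chunks, the check described in this post (several versions of a file, the latest one zero-sized) could be automated along these lines. This is only a sketch: `FileVersion` is a hypothetical record, the version listing has to be obtained separately (for example from a B2 file-versions listing), and nothing here calls B2 or deletes anything.

```python
from typing import NamedTuple


class FileVersion(NamedTuple):
    # Hypothetical record for one stored version of a file.
    name: str
    size: int
    upload_timestamp: int  # larger means newer


def zero_sized_latest(versions):
    """Return names that have several versions and whose newest is empty."""
    latest = {}
    counts = {}
    for v in versions:
        counts[v.name] = counts.get(v.name, 0) + 1
        current = latest.get(v.name)
        if current is None or v.upload_timestamp > current.upload_timestamp:
            latest[v.name] = v
    return sorted(
        name
        for name, v in latest.items()
        if v.size == 0 and counts[name] > 1
    )
```

The returned names are the candidates to inspect by hand; whether the older, non-empty version is still valid is a separate question that this sketch does not answer.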


I have been using Duplicacy for a couple years to back up several repositories to the same storage. All the repositories are on my computer and nobody else backs up to the storage. A couple days ago, I started getting an error when I run the check command (normally I don’t include -fossils but you’ll see why I’m including it in this case):

$ duplicacy check -a -fossils
Repository set to /Users/me
Storage set to b2://bucket-name
download URL is: https://f002.backblazeb2.com
Listing all chunks
17 snapshots and 1252 revisions
Total chunk size is 477,541M in 121572 chunks
All chunks referenced by snapshot usr-local at revision 1 exist
All chunks referenced by snapshot usr-local at revision 32 exist
...
All chunks referenced by snapshot Documents-other at revision 158 exist
All chunks referenced by snapshot Documents-other at revision 194 exist
Chunk aafaf71f51fa153647ad4266668c63c808439e3162b8a1d4888a93201549f425 can't be found

I checked the storage; the chunk is not in the “aa” directory of the “chunks” directory of the storage. Grepping for the chunk id in all the repositories’ log directories, I find:

Marked fossil aafaf71f51fa153647ad4266668c63c808439e3162b8a1d4888a93201549f425

The explanation on this page says

This is because another ongoing backup that was seen by the prune command may reference any of these chunks. To be safe, the prune command will turn them into fossils, which can be either permanently removed if no such backup exists, or turned back into normal chunks otherwise.

However I don’t see a corresponding log entry saying the chunk was permanently removed. (In contrast, the logs mention other chunks that have been permanently removed.) So I have two issues:

  1. If the logs say the chunk was marked as a fossil, but they don’t say it has been removed, shouldn’t it still exist?

  2. How can I determine which revision the missing chunk belongs to, so I can delete the snapshot as described above? The error message, as I have shown, does not give me a revision number.

I’m running CLI version 2.6.1 (ACEF01) on Mac OS 10.14.6 Mojave. Thanks in advance for your attention.

What is your B2 lifecycle setting? See Should I disable Backblaze B2 Cloud Lifecycle Settings?

If it is not set to keep all versions fossils may be deleted automatically by B2.

Thank you. My B2 bucket lifecycle setting was not keep all versions, and I corrected that. Just so I understand how this is relevant to fossils: Duplicacy marks a chunk as a fossil by moving/renaming it, but B2 doesn’t support move/rename, so a workaround is used that may lead to errors if B2 is not told to keep all versions – is that correct?

My other question is, what’s the easiest way to tell which revision(s) have missing chunks so I can delete them? The check command isn’t telling me.

Apparently a lot of chunks are missing from my backups (found out the hard way). I don’t understand why: the backups were running smoothly (finished without errors), and the prune command should remove entire revisions, not single chunks from revisions (I can restore backups, but some chunks are missing).

When running duplicacy check -fossils -resurrect it stops at the first missing chunk; adding -persist makes it skip errors and continue.
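One way to get an overview from a saved check log is to collect the revisions that were confirmed intact. The sketch below relies only on the “All chunks referenced by snapshot … at revision … exist” line format quoted earlier in this thread; it assumes check still prints that line for every intact revision when run with -persist, and the function name `verified_revisions` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   All chunks referenced by snapshot usr-local at revision 32 exist
OK_LINE = re.compile(
    r"^All chunks referenced by snapshot (\S+) at revision (\d+) exist$"
)


def verified_revisions(log_text):
    """Map each snapshot id to the revisions the log reports as complete."""
    verified = defaultdict(list)
    for line in log_text.splitlines():
        match = OK_LINE.match(line.strip())
        if match:
            verified[match.group(1)].append(int(match.group(2)))
    return dict(verified)
```

Any revision that exists in the storage but does not show up in this mapping was not confirmed by that run and is worth checking individually.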

@gchen Is there any command to prune all revisions which contain missing chunks? Half-working backups aren’t worth much, so I’d much rather have them removed to get an overview if there’s any fully working backup.


wow this sounds like a nice feature request.


I would not do that. Such an option would make users tend to remove problem revisions too quickly – in some cases missing chunks are fixable (such as those caused by a stale local cache).


Then combine this option with dropping the local cache beforehand?

My problem was that half of my revisions had missing chunks even though I had deleted the local cache beforehand and used the -resurrect option, so I had to go through all revisions one by one and delete them manually, which was a lot of work. It also made me lose confidence in Duplicacy, because this shouldn’t happen to such an extent‽ It’s not happening in newer revisions, so I hope it was a bug in the past that got fixed.