Missing chunk, can't figure out why

I have scheduled check followed by prune on B2.

Check complained about a missing chunk, which prune subsequently tried to fossilize, but that chunk was apparently already a fossil. There is no other mention of that chunk anywhere. What could have happened?
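For context, the schedule is roughly equivalent to running these two CLI commands back to back (the -a flag and the retention options below are illustrative placeholders, not my exact settings):

duplicacy check -a                                   # verify that every chunk referenced by each snapshot exists
duplicacy prune -a -keep 0:360 -keep 7:30 -keep 1:7  # then delete old revisions and fossilize now-unreferenced chunks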

alex@Tuchka:/var/services/homes/duplicacy-web/.duplicacy-web/logs$ grep -C 2 -R "cba6d3dbb864ec31270dcfb57025ec426dc414877c2fdcc7890410b43b5bbeae" .
./check-20200506-033001.log-2020-05-06 03:32:51.778 INFO SNAPSHOT_CHECK All chunks referenced by snapshot tuchka at revision 747 exist
./check-20200506-033001.log-2020-05-06 03:32:53.783 INFO SNAPSHOT_CHECK All chunks referenced by snapshot tuchka at revision 774 exist
./check-20200506-033001.log:2020-05-06 03:32:57.397 WARN SNAPSHOT_VALIDATE Chunk cba6d3dbb864ec31270dcfb57025ec426dc414877c2fdcc7890410b43b5bbeae referenced by snapshot tuchka at revision 776 does not exist
./check-20200506-033001.log-2020-05-06 03:32:57.620 WARN SNAPSHOT_CHECK Some chunks referenced by snapshot tuchka at revision 776 are missing
./check-20200506-033001.log-2020-05-06 03:32:59.570 INFO SNAPSHOT_CHECK All chunks referenced by snapshot tuchka at revision 798 exist
--
./prune-20200506-034655.log-2020-05-06 03:48:49.054 INFO SNAPSHOT_DELETE Deleting snapshot tuchka at revision 891
./prune-20200506-034655.log-2020-05-06 03:48:50.480 INFO SNAPSHOT_DELETE Deleting snapshot tuchka at revision 892
./prune-20200506-034655.log:2020-05-06 03:59:54.253 WARN CHUNK_FOSSILIZE Chunk cba6d3dbb864ec31270dcfb57025ec426dc414877c2fdcc7890410b43b5bbeae is already a fossil
./prune-20200506-034655.log-2020-05-06 04:00:14.924 INFO FOSSIL_COLLECT Fossil collection 13 saved
./prune-20200506-034655.log-2020-05-06 04:00:15.081 INFO SNAPSHOT_DELETE The snapshot tuchka at revision 776 has been removed

The graph for some reason went into a bad state too.

There are another set of prune logs under /var/services/homes/duplicacy-web/.duplicacy-web/repositories/localhost/all/.duplicacy/logs, which should give more information on that chunk.

Hmm. I’m now completely confused.

alex@Tuchka:/var/services/homes/duplicacy-web/.duplicacy-web/repositories/localhost/all/.duplicacy/logs$ grep -R "cba6d3dbb864ec31270dcfb57025ec426dc414877c2fdcc7890410b43b5bbeae" .
./prune-log-20200503-042409:Marked fossil cba6d3dbb864ec31270dcfb57025ec426dc414877c2fdcc7890410b43b5bbeae
./prune-log-20200506-034657:Marked fossil cba6d3dbb864ec31270dcfb57025ec426dc414877c2fdcc7890410b43b5bbeae
./prune-log-20200506-142217:Deleted fossil cba6d3dbb864ec31270dcfb57025ec426dc414877c2fdcc7890410b43b5bbeae (collection 13)

So, on 05/06 it was marked as a fossil and then shortly after deleted, while it was definitely still used in one of the existing snapshots. How could that be allowed to happen?

This is my interpretation: prune-log-20200503-042409 found this chunk was unreferenced after deleting some old revisions. At that time revision 776 was perhaps still in progress so prune-log-20200503-042409 didn’t know that this chunk was needed by revision 776.

But prune-log-20200503-042409 only marked this chunk as a fossil, so it still existed in the storage. check-20200506-033001 saw revision 776, but it didn’t look for the fossil, so it complained about the missing chunk (I believe if you had run the check command with -resurrect at that time, it would have recovered the chunk). After that, prune-20200506-034655 came in; this time revision 776 was to be deleted, and that chunk was again marked as a fossil.
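For reference, that recovery would have looked roughly like this (the -a flag is just an illustration; you could also point check at the affected snapshot id):

duplicacy check -a -fossils -resurrect   # look for fossils of missing chunks and turn referenced fossils back into regular chunks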

So all is good. Since revision 776 has been deleted, you don’t really need to worry about this chunk. But I think you can still recover this chunk if you want: being a fossil, it is just hidden by b2_hide_file, so if you create a new dummy file with the same name on the B2 website, you should be able to see the previous versions.
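If you prefer the b2 command-line tool over the website, something along these lines should also work (the bucket name and chunk path are placeholders, and the exact flags may differ between b2 CLI versions):

b2 ls --versions --recursive my-bucket chunks/                          # hidden files (fossils) show up as older versions
b2 delete-file-version chunks/<path-to-chunk> <fileId-of-hide-marker>   # removing the hide marker unhides the chunk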

Sounds plausible. That backup dataset, however, does not change much – maybe that was a contributing factor.

I wish there were a way not to report harmless failures like that in the UI – seeing “check failed” is a bit unnerving.

Would that be remedied by running prune before check? It feels like it’s better to first check and then prune, no?

Add the -fossils option to the check job and this kind of error won’t happen again. Maybe -fossils should be the default?
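In other words, the scheduled check job would run something like this (-a here is just an example):

duplicacy check -a -fossils   # a chunk that exists only as a fossil is no longer reported as missing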


It might make sense to do that by default.

My understanding is that since fossilization is an inherent part of the design, a referenced chunk existing only as a fossil is not an exception or an error but expected behavior.

Therefore the datastore is not in a bad state, and hence the check should succeed.

I’m not 100% sure about this, but it seems that way.

I’ve always wondered why -fossils is an option in the first place – why wouldn’t it be a good idea to always check for fossils when a chunk’s non-fossil counterpart can’t be found?

Likewise, when would it be a bad idea to always run check with -resurrect too? Should it be avoided when running other jobs, and is that why it’s an option?


I ran into this same scenario and was quite worried when I saw the “chunk does not exist” message, until I came across this thread.

I’m going to be using the -fossils option from now on, but I would vote for it being the default, as that would reduce warning messages that don’t require any action.