Backup resurrects fossils for file-path storage but not for BackBlaze B2

ustulation · 26 August 2019 15:12

I was trying to see how prune works for b2. This was the scenario:

1. Create a few files (set-1) in the src repo.
2. Back these up as rev-1.
3. Delete set-1 and create a new set of files (set-2) at src.
4. Back these up as rev-2.
5. Delete set-2 and re-create set-1.
6. Back these up as rev-3.
7. Move snapshot file 3 somewhere else so duplicacy cannot find it (and so does not know about rev-3).
8. Run prune -r 1, which fossilises all chunks belonging to rev-1 and deletes snapshot rev-1. The fossil collection file reflects this.
9. Move the snapshot file for rev-3 back to the snapshots/id dir at the storage.
10. I ran prune here (like in step 12 below), thinking it would resurrect everything, as it now knows about a new rev-3 which references everything that rev-1 did. But nothing happened and it exited saying that the deletion criteria were not met. So basically it did nothing.
11. Do one more backup to have a backup at rev-4.
12. Run prune without any revisions given, so it deals only with the local fossil-collection file that was previously created.

With local storage, duplicacy has the advantage that it can just rename the chunks with a .fsl extension, and resurrection just means removing that .fsl extension. With b2 there’s no rename, so it probably does something with the hide marker, which would create one more version of that file (a b2 revision/version, not to be confused with duplicacy revisions).

Till here, I (hopefully) understand.

Now the difference is in step 11. Before this step (backing up for rev-3), the fossilisation in the file-path storage is shown as a .fsl file and subsequent backup (rev-4) resurrects it. I was surprised a bit here as i thought (or i read it somewhere) it would create new files as backup doesn’t look at fossilised files unlike restore and prune, but it’s a good thing ofc. if it does. The prune then has nothing to do so just exits saying chunks already exist.

For b2, the first prune probably did the fossilisation by adding a hide marker. So I can see upstream that those chunk dirs are marked with * and in there it says there are 2 versions of the file. Step 11 does not resurrect them, and in fact I now have 3 versions (copies?) of those files as shown online via their website. Only after I run step 12 do those reduce to containing just 1 file.

Why couldn’t the backup simply delete the hide marker there, which would behave like the file-path storage (resurrection)? Until I run a subsequent prune, it seems I might have 3 copies of a lot of files.

Also, why did step 10 do nothing? I suspect this has something to do with timestamps and sorting?

gchen · 27 August 2019 02:20

The backup doesn’t resurrect the fossils on B2 because the API used to check the existence of a chunk file (specifically, b2_download_file_by_name with an invalid range) doesn’t return the file if there is a hide marker. Uploading a new copy isn’t a big deal, as you shouldn’t run into this situation frequently.

Step 10 did nothing because the prune operation is delayed by 12 hours after a new revision has been detected if the storage is not strongly consistent:

https://github.com/gilbertchen/duplicacy/blob/58387c095133f2c0500dab6b6164a0640c6dbff4/src/duplicacy_snapshotmanager.go#L114-L117

			extraTime := 0
			if !isStrongConsistent {
				extraTime = secondsInDay / 2
			}
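
As a rough illustration of how that extra time plays out, the sketch below wraps the snippet in a small, self-contained function. The function name `countsAsNewSnapshot` and the comparison against the fossil collection's end time are assumptions made for illustration; only the `extraTime` logic is taken from the linked lines.

```go
package main

import "fmt"

const secondsInDay = 86400

// countsAsNewSnapshot (hypothetical name) reports whether a snapshot that
// finished at snapshotEnd (Unix seconds) is late enough after the prune that
// finished at collectionEnd to count as a "new" snapshot.
// Assumption: the snapshot must end strictly after collectionEnd + extraTime.
func countsAsNewSnapshot(snapshotEnd, collectionEnd int64, isStrongConsistent bool) bool {
	extraTime := int64(0)
	if !isStrongConsistent {
		// Wait an extra 12 hours on storages that are not strongly consistent.
		extraTime = secondsInDay / 2
	}
	return snapshotEnd > collectionEnd+extraTime
}

func main() {
	pruneEnd := int64(1000)
	tenMinutesLater := pruneEnd + 10*60
	thirteenHoursLater := pruneEnd + 13*3600

	fmt.Println(countsAsNewSnapshot(tenMinutesLater, pruneEnd, true))     // true
	fmt.Println(countsAsNewSnapshot(tenMinutesLater, pruneEnd, false))    // false
	fmt.Println(countsAsNewSnapshot(thirteenHoursLater, pruneEnd, false)) // true
}
```

Under these assumptions, a backup made a few minutes after the prune would not count as new on a storage that is not strongly consistent, while one made more than 12 hours later would.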

ustulation · 27 August 2019 09:15

Cheers !

Just for my knowledge (because I’m a curious noob):
For the strong-consistency part I searched a bit more and came across this in github. To summarise: in S3, prune could fossilise by renaming a chunk file, which S3 honours by copying the original under the new name and deleting the original file. The deletion is eventually consistent, so a subsequent backup might still see the file and not re-upload it. A new prune runs, sees the new snapshot referring to the chunk and that both the fossilised and the original file exist, so it ends up deleting the fossil. Later S3 eventually deletes the original too, and now there are missing chunk(s). To prevent this, the new snapshot is required to be at least 12 hrs later than the time prune was run before the fossil deletion step is done.

My next questions would be then:

1. Why is this also applicable to local storage (step 10 behaved similarly for local and b2)? The b2 storage was copy-compatible with the local one, if it matters.
2. Step 11 was not 12 hrs apart, rather a few minutes. So do you mean prune delays the fossil deletion step only if the new revision is exactly one more than what it recorded in the fossil collection? If there’s more than 1 (which is what step 11 did), it’ll go ahead and do the fossil deletion step anyway? This would also be a problem for eventually consistent systems, right?