Race condition between backups and prunes can lead to corrupted snapshot

Hello,

I am trying to grasp how duplicacy works. I have read the wiki page and the PDF with the detailed explanation of how pruning works, but I am still unable to fully understand how it works and how it is able to work lock-free.

The explanation given is that prune deletes files in two steps: first renaming chunks to fossils, then actually removing them only if there is a “new snapshot” and it is not referencing any fossils. This works because every snapshot is seen by either the “fossil collection” or the “fossil deletion” step. I can see this being true if we run one backup + prune at the same time, in any combination possible under the “policy” rules.
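
In code, this is roughly how I picture the two steps - just a toy sketch of my own understanding, with made-up names and data structures, nothing taken from duplicacy itself:

// Toy model: chunks and fossils are sets of names; a snapshot is simply the
// set of chunk names it references.

// Step 1, "fossil collection": any chunk referenced by no snapshot becomes a
// fossil (it stays on the storage, but is invisible to new backups).
func fossilCollection(chunks, fossils map[string]bool, snapshots []map[string]bool) {
    for chunk := range chunks {
        referenced := false
        for _, snapshot := range snapshots {
            referenced = referenced || snapshot[chunk]
        }
        if !referenced {
            delete(chunks, chunk)
            fossils[chunk] = true
        }
    }
}

// Step 2, "fossil deletion": runs only once there is a new snapshot that
// finished after step 1; fossils that new snapshot references are renamed
// back ("resurrected"), the rest are deleted for good.
func fossilDeletion(chunks, fossils map[string]bool, newSnapshots []map[string]bool) {
    for fossil := range fossils {
        delete(fossils, fossil)
        for _, snapshot := range newSnapshots {
            if snapshot[fossil] {
                chunks[fossil] = true // resurrect instead of deleting
            }
        }
    }
}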

But I do not see it working with 2+ backups + prune running at the same time.

Let’s imagine that we start “fossil collection” and 2 backups at roughly the same time:

  1. Fossil collection starts just before Backups so it doesn’t “see” them
  2. Fossil collection sees that nothing references chunk A, so it is marked as a fossil (the old snapshots get removed, so nothing references chunk A anymore)
  3. Backup 1 doesn’t reference chunk A
  4. Backup 2 does reference chunk A right before fossil collection marked it as a fossil, so it doesn’t re-upload the chunk
  5. Fossil collection finishes
  6. Backup 1 finishes, but Backup 2 doesn’t finish yet
  7. We start fossil deletion. There is a new snapshot created by Backup 1, created after fossil collection finished, so we satisfy both checks in “Policy 3”: there is a new snapshot that fossil collection didn’t see (Backup 1), and it finished after fossil collection finished.
  8. So we delete chunk/fossil A completely
  9. But wait… it is still referenced by Backup 2 that didn’t finish yet… oops?

What am I missing? As far as I can see, this case means that we must not run multiple backups at the same time with the same “snapshot id”, or we risk corrupting our backup. But isn’t the whole point of duplicacy that we can run multiple backups at the same time?

Afraid I didn’t quite follow your whole scenario to a T, but I get the gist: there’s some confusion surrounding the requirement for backups to complete between fossil collection and fossil deletion…

That is, a round of backups from all snapshot IDs must occur after those chunks were fossilised, before the fossils are considered deletable.

So it doesn’t matter how many backups are running or complete - they must all complete at least once for a prior collection to be in the clear. Chunks that need resurrecting get resurrected; everything else in between can be deleted.

An exception is if a backup has run for 7 days - it’s then excluded from consideration (and you can specifically -ignore a snapshot during prune if you know what you’re doing; say, if it’ll be the last backup you make with that ID and you want to delete other chunks in the collection earlier). 7 days is long enough to account for a backup job lasting that long, which it ideally shouldn’t. :wink:

I was afraid I might not be understood. It is quite difficult to describe what I mean in words.
So I decided to write a script to test my scenario with Backblaze B2 as storage.

Unfortunately my understanding is 100% correct and I am not missing anything.

Duplicacy DOES NOT work correctly!

I am able to corrupt my duplicacy backup and remove chunks which should not be removed and it is then impossible to restore the backup.

I am attaching my script and the logs of it running on my PC:
break-duplicacy.zip (2.3 KB)

I also have a B2 snapshot of the corrupted bucket, but it is 150 MB. Let me know if you want it. Hopefully you are able to reproduce the issue yourself with my script - and the code is clearer than my original wording of the scenario.

EDIT:

The way I see it, the only way to fix this is to “disallow” multiple backups (with the same snapshot ID) from running at the same time. And the way to do that in a lock-less fashion is to not overwrite snapshot revisions.

The issue arises in my script because backup2 overwrites revision 3 after backup1 has already created it. This needs to be disallowed. When revision 3 already exists, backup2 should see it and not write its own revision at all. This in turn means that backup2 is ignored, as if it never happened. Not ideal, but better than corrupting the backup.

Or in the language of the PDF paper: I believe I have just disproved “Theorem 3”. The proof is incomplete and doesn’t take into account that “backup B” might start while “backup A” is still running and end either before or after A finishes (or the other way around). The proof in the paper only considers two cases (A finishes then B starts, or B finishes then A starts), and since these are not all the cases, the proof is incorrect.

In order to prove Theorem 3, it needs an additional “Policy 4” stating that “parallel backups” do not overwrite each other’s snapshot revisions.

Each backup must have a separate snapshot ID. One machine - one collection of files - one snapshot ID.

Hence the rest of the post is moot.


Doesn’t make sense to me.
Each machine must have a separate snapshot ID, sure. But what about running two backups from the same machine (like I do in the script) at the same time, by mistake or whatever? These should not have separate snapshot IDs.

Separate backups must of course have separate snapshot IDs.

Just think about it — what would “revision 7” mean if one snapshot ID contained revisions from different collections of files?

Duplicacy could perhaps implement some protection to prevent backing up to a snapshot ID that previously received a backup from a different backup set; this is not done because it is not really a problem if you follow the manual, and increasing complexity for a very niche corner case is not a good thing.

However, the lack of protection against deliberate misuse is a far cry from “Duplicacy DOES NOT work correctly!”


Well, that depends on the definition of “separate backups”.

Do you mean that each backup run in parallel must have a separate snapshot ID? If that’s the case, then what if my cron job or whatever automation tool runs two backups in parallel?

Or do you mean that each duplicacy init creates a separate backup?

Or something else completely?

No, it has nothing to do with what can be run in parallel — because you can run any operation concurrently with any other operation, not just backup.

I mean this:

Example backups to the same target:

  1. Machine M1, folder F1, snapshot ID S11
  2. Machine M1, folder F2, snapshot ID S12
  3. Machine M2, folder F1, snapshot ID S21
  4. Machine M2, folder F2, snapshot ID S22

You can run any backup (or check, or prune, …) concurrently with any other backup (or check, or prune, …) from this list. You can even run the same one in two instances in cron, if that’s what you described — which is a silly thing to do, but no data loss will occur.

One machine - one collection of files - one snapshot ID.

I have modified the script.

  • it is running off of one machine: my PC
  • it is running off of one collection of files: there is just one directory with files, and just one duplicacy init. The collection of files changes, but that is normal(*) behavior: some files get added, others get removed.
  • and so it is one snapshot ID

The script is adding/removing/moving files within the same “collection of files”, so basically the normal life of a disk in use with personal files. It runs two backups at the same time (imagine it being cron/systemd/etc. that ran the same job again). And it also runs prune at the same time because, well, we just got unlucky and it was also triggered by some automation tool.

Same issue. Corrupted backup with missing chunks.

(*) Normal with asterisk because, sure, files shouldn’t change/move/add/delete so abruptly. But this should work either way since duplicacy should be race-condition-free - that’s the point of lock-free, smart algorithms etc.

break-duplicacy.zip (2.3 KB)


Now that’s interesting! I’ll review your scripts (thank you for sharing those — it’s the best type of bug report!) and try to reproduce tonight after work.

Very cool. So if I understand this right, it looks like completing the first backup of the two concurrently running ones satisfies the prune’s check for the completion of the last backup and allows it to mess with files that happen to be needed only by the still-in-progress backup.

(This, however, will result in missing chunks in that particular revision only, so the scope is limited to the very recent backup).

I guess the solution here would be to claim the revision with a placeholder file that will be replaced with actual manifest upon successful completion of the backup. This will make the situation where two backup runs produce the same revision number impossible. @gchen
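
As a rough sketch for a local-disk backend (hypothetical helper names, nothing from the actual codebase), the idea would be something like this:

import "os"

// Claim revision N at the start of the backup by creating the revision file
// exclusively, with placeholder contents. If another backup already claimed
// the same revision number, the O_EXCL create fails and we abort.
func claimRevision(path string) error {
    f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o644)
    if err != nil {
        return err // os.IsExist(err) => another backup owns this revision
    }
    defer f.Close()
    _, err = f.WriteString("placeholder")
    return err
}

// On successful completion, replace the placeholder with the real manifest;
// only the backup that claimed this revision number ever gets here.
func finalizeRevision(path string, manifest []byte) error {
    return os.WriteFile(path, manifest, 0o644)
}

An aborted backup would leave a stale placeholder behind, which would presumably have to be cleaned up later (by prune, say), but that seems a much smaller problem than a silently overwritten revision.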

Yes, your understanding is correct.

The problem boils down to the fact that the second backup overwrites the first backup’s revision file.
If this weren’t the case, everything would be fine and nothing would get corrupted.

I believe that the fix is to do the following:

Theoretical side

Fix the PDF’s proof of “Theorem 3”. It only considers 2 out of the 6 possible cases of interaction between backups A and B.

# These two are considered in PDF
Case A)
    Backup --------------------------------------------------------------
                   |    A     |          |    B    |
                   ------------          -----------

Case B)
    Backup --------------------------------------------------------------
                   |    B     |          |    A    |
                   ------------          -----------

# These are not considered in PDF
Case C)
    Backup --------------------------------------------------------------
                   |    A            |          |        | 
                   ------------------|-----------        |
                                     |        B          |
                                     ---------------------

Case D)
    Backup --------------------------------------------------------------
                   |    B           |          |        | 
                   -----------------|-----------        |
                                    |        A          |
                                    ---------------------

Case E)
    Backup --------------------------------------------------------------
                   |    A           |                   |         |
                   -----------------|-------------------|----------
                                    |        B          |
                                    ---------------------

Case F)
    Backup --------------------------------------------------------------
                   |    B           |                   |         |
                   -----------------|-------------------|----------
                                    |        A          |
                                    ---------------------

If the proof considered the latter 4 cases, it would find that at least one of them breaks “Theorem 3”. The way to fix that is to add a “Policy 4” along these lines:

Policy 4:
- Any backup must try to write a revision file with revision_no one greater than the last "seen" finished backup  # this is what the software does; it is assumed in the paper but not stated explicitly
- Any backup must not (over)write a revision file if it already exists  # this is not what the software does, hence the race condition/corruption; it is unclear whether the paper assumes it or not

With that additional policy I believe “Theorem 3” holds and can be proven again.

Engineering side

I guess the solution here would be to claim the revision with a placeholder file that will be replaced with actual manifest upon successful completion of the backup

This may work, yes, but let me propose another solution: create the revision file in exclusive mode. In the C world: fopen("/snapshot/revision", "wx"); in the POSIX world: open("...", O_EXCL); or in shell: set -C (noclobber).

I assume the code is different for each backend (filesystem, S3 API, etc.), but hopefully it is possible to do this in every backend duplicacy supports.

For example in the “filesystem backend” the code probably looks something like this:

createRevisionFile() {
    let f = fopen("/snapshot/revision", "w");
    fwrite(f, ...);
}

or

createRevisionFile() {
    let f = fopen("/snapshot/revision.tmp", "w");
    fwrite(f, ...);
    rename("/snapshot/revision.tmp", "/snapshot/revision");
}

And should be changed to:

createRevisionFile() {
    // "x" makes fopen fail if the file already exists, instead of truncating it
    let f = fopen("/snapshot/revision", "wx");
    fwrite(f, ...);
}

or

createRevisionFile() {
    let f = fopen("/snapshot/revision.tmp", "wx");
    fwrite(f, ...);
    // RENAME_NOREPLACE makes the rename fail if the destination already exists
    renameat2(..., "/snapshot/revision.tmp", ..., "/snapshot/revision", RENAME_NOREPLACE);
}

And in “S3 backend”:

createRevisionFile() {
    callS3API("createAndWriteFile", ...);
}

Should be changed to:

createRevisionFile() {
    callS3API("createAndWriteFileButOnlyIfNotExists", ...);
}

I have never used the S3 API, so I am making up the API names above, but hopefully I managed to convey my idea.
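
Since duplicacy is written in Go, the filesystem-backend change might look roughly like this (a sketch under my assumptions about the upload path, not the actual code):

import (
    "fmt"
    "os"
)

// Write the revision file exclusively. os.O_EXCL makes the create fail if the
// file already exists, so a concurrent backup that produced the same revision
// number can never silently overwrite it.
func createRevisionFile(path string, manifest []byte) error {
    f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o644)
    if err != nil {
        if os.IsExist(err) {
            // Another backup already wrote this revision: abort instead of
            // clobbering it, i.e. this backup is "ignored" as described above.
            return fmt.Errorf("revision %s already exists", path)
        }
        return err
    }
    defer f.Close()
    _, err = f.Write(manifest)
    return err
}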

You may want to revoke the B2 tokens you left in your script… :confused:

If I’m understanding correctly, an issue only arises when trying to concurrently run two or more backups on the same snapshot ID? This makes sense. Don’t. Do. That. :wink:

Sure, this can be done accidentally, but there are certain things you can already do to prevent it (on the same machine at least). The Duplicacy GUI doesn’t let you run the same backup or schedule twice. The Duplicacy CLI has a -comment global option - specifically for this scenario - so a script can identify an existing backup process and abort.

Now, maybe if Duplicacy had some protection against this built in, that would be good. Though there are other scenarios which could cause similar issues* - e.g. running a prune -exhaustive (let alone -exclusive) during a backup could also cause storage corruption. There may be others, but there are well-known rules to abide by.

Back to the topic of the two-step fossil collection… are we satisfied that it does indeed WORK - provided we use unique snapshot IDs and abide by those other rules?

* Edit: not to mention an aborted prune - which, as @saspus knows, deletes chunks before (not) deleting the revision file - which I’d argue is a much more common occurrence, and difficult to prevent!

I did, instantly. I realized it and removed the key immediately. Thanks for reminding me anyway, though.

Depends on the definition of “WORK”. It does work, but not fully correctly. So I am personally not satisfied and I stand by what I wrote: “duplicacy does not work correctly”. There exists a race condition that can/should be prevented, which can lead to corrupted data.

My script shows that duplicacy can corrupt data even though it doesn’t do anything special, really. It just runs two backups and two prunes and moves some files around. This should not lead to corrupted data. And it doesn’t have to, because I believe there is a solution to the problem - the one I described.


I also disagree with your paragraph about similar prune issues.

prune -exhaustive shouldn’t change anything, as far as I understand how the algorithm works. It will work fine with 1 backup and will fail with 2+ backups in the exact same way I described.

prune -exclusive can break things, but that is to be expected. By adding the -exclusive flag you are explicitly opting out of the race-condition-free normal mode of operation and are risking data corruption.

The above things are not at all similar to the issue I describe here.


run two or more backups on the same snapshot ID? This makes sense. Don’t. Do. That. :wink:

It doesn’t matter to me whether or not you should do that; that’s beside the point. You could just as well say “don’t run backup and prune at the same time because it might break”. But that is the whole point of duplicacy’s lock-free algorithm: it should allow you to run whatever operations you want (the exception being prune -exclusive) in whatever order you want, and everything should work, because there should be no race conditions and no data corruption.

And duplicacy is very good at doing that. As far as I can tell from reading the PDF, and assuming the Go code follows the algorithm described there (i.e. there is no discrepancy/bug), I can see only this one race condition (*); all others should be prevented by the lock-free algorithm. I am convinced by the proofs in the PDF, except for the proof of “Theorem 3”, because of this particular case I am describing here. And I have even shown that it is not just theoretical talk - I am in fact able to reproduce the issue with my scripts.

(*) I do actually see a second smaller issue, and a potential relatively easy way to fix it (maybe), but I don’t want to confuse everyone by mixing separate topics together, so I am leaving it for later.


* Edit: not to mention an aborted prune - which, as @saspus knows, deletes chunks before (not) deleting the revision file - which I’d argue is a much more common occurrence, and difficult to prevent!

Yes, that is also a problem. It is a different problem, and it may even be a bigger problem in practice than the one I am describing here. But it neither proves nor disproves that what I describe here is a problem in its own right. It is unrelated.

I totally agree it’s worth exploring better safeguards, if one can be implemented, sure…

However, your OP questioned the “Two-Step Fossil Collection” algorithm (pruning), which is simply not the issue here - it’s a backup race condition you accidentally discovered while using the same snapshot ID. (Which is understood to be a bad idea in any case.)

There are ways to completely mitigate against this in any script with -comment, as I said.

My point was that there are many rules which can be broken, and the storage can end up corrupted, if the user chooses to ignore those considerations (and there might be some situations which can’t be engineered to be safeguarded against). Running prune on two separate computers, for example (which one might accidentally do in the GUI version on both clients). It’s explicitly recommended to do this on only 1 PC.

Not necessarily so. (Not the end of the world, as the storage isn’t truly corrupt, but you don’t want fossils already after a recent backup!)

100% I agree this could be mitigated, and your engineered solution looks sound.

A very similar solution could be implemented with aborted prunes…

Duplicacy should rename the revision file to revision.fsl before deleting chunks, or simply reverse the process - delete revision file(s) then chunks.

That is incorrect. The “Two-Step Fossil Collection” algorithm is at fault here. It is not a race between backups; ultimately it is a race between the second backup and the two prunes.

If the second backup finished before the first prune, everything would be fine - the prune would never turn the chunks into fossils.
If the second backup started after the first prune, everything would be fine - the backup would re-upload the chunks.
If the second backup finished before the second prune, everything would be fine - the prune would resurrect the fossils.
But when the second backup runs at just the “perfectly bad” time, we get a corrupted state.

And because of that race, my solution is to “pretend” the second backup never happened by not allowing it to overwrite the revision file.

There are ways to completely mitigate against this in any script with -comment, as I said.

I get it, you are right, sure. But it is like saying you could mitigate every race by just using an exclusive lock on every duplicacy operation. That’s missing the point: the whole point of duplicacy is to be lock-less and still work.

Exactly - you acknowledge the huge difference yourself right here. prune -exhaustive might lead to “unnecessary” fossils, but that is very different from corruption where chunks/fossils are completely gone. So this is not at all a similar case to what I am describing here.

Yes, I totally agree. The problem with removing revisions first is that a prune without -exhaustive will then never clear the chunks. The problem with removing chunks first is that when the process gets interrupted, you end up with broken revisions.

So I believe the proper solution here is to remove revisions in a similar fashion to chunks - similar, though simpler, because there is no need for resurrecting revisions, etc. First rename revision → revision.fsl, then remove the chunks, then remove revision.fsl completely. Revision fossils should be visible to prune and invisible to list/restore/backup.
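
In rough code, for something like the local-disk backend (hypothetical helper, just to illustrate the ordering):

import "os"

// Proposed ordering: fossilize the revision first, then delete its chunks,
// and only then delete the revision fossil itself.
func pruneRevision(revisionPath string, chunkPaths []string) error {
    fossil := revisionPath + ".fsl"

    // 1. Turn the revision into a fossil: still visible to prune, but
    //    invisible to list/restore/backup.
    if err := os.Rename(revisionPath, fossil); err != nil {
        return err
    }

    // 2. Remove the chunks that are no longer referenced. If the process is
    //    interrupted here, the .fsl file remains, so a later prune can see
    //    what happened and finish the job.
    for _, chunk := range chunkPaths {
        if err := os.Remove(chunk); err != nil && !os.IsNotExist(err) {
            return err
        }
    }

    // 3. Only now remove the revision fossil for good.
    return os.Remove(fossil)
}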


I have to disagree here. The ‘fix’ is preventing a second backup revision from overwriting the first. Nothing about the prune algorithm (Two-Step Fossil Collection) needs to change with respect to this issue.

It’s not an exclusive lock, though. Not even a non-exclusive soft lock. It’s purely client-side, doesn’t block other backups or even a prune. Anyway, I think this is all semantic nitpicking…

There’s at least a potential fix here - although, having looked at the solution again, I strongly suspect creating revision files in some kind of exclusive mode won’t be feasible for all backends; if it were, Duplicacy might already be doing it for chunk files. In the case of the local file and sftp backends, chunks are uploaded to .tmp files and renamed into place. A similar approach might have to be used for revision files, tweaked for each backend.
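
For the local file backend, for instance, the existing tmp-then-rename flow could perhaps be made no-clobber along these lines (just a sketch, assuming POSIX semantics where link() fails if the target already exists):

import (
    "fmt"
    "os"
)

// Publish a fully uploaded .tmp revision file under its final name without
// ever overwriting an existing revision. link(2) fails with EEXIST if the
// destination exists, unlike rename(2), which silently replaces it.
func publishRevision(tmpPath, finalPath string) error {
    if err := os.Link(tmpPath, finalPath); err != nil {
        if os.IsExist(err) {
            os.Remove(tmpPath) // another backup won the race; discard ours
            return fmt.Errorf("revision %s already exists; aborting", finalPath)
        }
        return err
    }
    return os.Remove(tmpPath) // published; drop the temporary name
}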

Indeed. And then any remnant .fsl(s) could be reprocessed and included again in a subsequent prune.


Well, at this point I guess we have to agree to disagree, because I won’t find a way to convince you of my point of view.
The correctness of the “Two-Step Fossil Collection” algorithm rests on 3 theorems; one of them states “The interplay of the backup procedure and the deletion procedure is concurrently lock-free.” - i.e. parallel backups and prunes are safe and race-free. As it stands right now, Theorem 3 is false, and hence the “Two-Step Fossil Collection” algorithm is broken.

Creating chunks in exclusive mode is less necessary. In contrast to revisions, which use increasing natural numbers, chunks are named by hashes. Assuming the likelihood of a hash collision is effectively zero, even if you overwrite a chunk it almost certainly gets overwritten with the exact same data.

Even if it is not possible for all backends, I guess we should do it for as many backends as possible. At least some of them will be resistant to the issue, and the others, well, unfortunately don’t give us the tools to do it. Or we might come up with a different solution along the lines of what @saspus was thinking.


Coming back to the very origins of this topic to summarize:

  • It all started with me being curious how duplicacy manages to be lock-free, in contrast to tools like restic which require an exclusive lock for prunes
  • And so I read and reread the docs like 10 times trying to understand it. I couldn’t quite grasp how the “Two-Step Fossil Collection” works, because my intuition was telling me that the “Theorem 3” proof is incomplete and hence incorrect.
  • I assumed that I must just be missing something, and so I created this thread
  • Then, after experimenting, it turned out that it wasn’t a gap in my understanding; it was in fact a gap in the proof. This led me to discover a serious bug which can lead to a corrupted (and unfixable, because the chunks/fossils are completely gone) backup revision.
  • And so the discussion switched from “help me understand the algorithm” to “the algorithm is broken”
  • Even though we might disagree on some matters, I believe I was able to convince you that this is in fact a bug worth fixing, which makes me hopeful that it will get fixed

Summary of summary:

I think the outcome of this discussion is fruitful. A bug has been uncovered. After researching backup solutions I now know how duplicacy works; I learned something. I will be a happy duplicacy user from now on, because I believe it to be superior to its competitors. I just need to make double sure that there are never two backups running at the same time (with the same snapshot ID), using -comment or ps or whatnot in my automation script/cron - at least until the bug gets fixed.

I hope that the bug will get fixed. And with that I guess my last question is: should I open an issue in “GitHub issues”?

should I open an issue in “GitHub issues”?

I have checked on GitHub, and it says not to open issues there but to write here on the forum instead. So I won’t open an issue there, to avoid spamming.

@saspus I can see you are a moderator, so could you change the category of this topic from “Support” to “Bug”? And also the topic title itself to “Race condition between backups and prunes can lead to corrupted snapshot”.