Race condition between backups and prunes can lead to corrupted snapshot

saspus · 15 August 2024 04:47

Very cool, so if I understand this right, it looks like completing the first backup out of the two concurrently running once satisfies the prunes’s check for the completion of the last backup and allows to mess with files that happen to be needed by the still-in-progress backup only.

(This, however, will result in missing chunks in that particular revision only, so the scope is limited to the very recent backup).

I guess the solution here would be to claim the revision with a placeholder file that will be replaced with actual manifest upon successful completion of the backup. This will make situation where two backup runs perform the same revision ID impossible. @gchen

gmaxardis · 15 August 2024 07:30

Yes, your understanding is correct.

The problem boils down to a fact that second backup overwrites the first backup’s revision file.
If this wasn’t the case then everything would be fine and things wouldn’t get corrupted.

I believe that the fix is to do the following:

Theoretical side

Fix PDF’s proof of “Theorem 3”. It only considers 2 out of 6 cases of interaction of A and B backups.

# These two are considered in PDF
Case A)
    Backup --------------------------------------------------------------
                   |    A     |          |    B    |
                   ------------          -----------

Case B)
    Backup --------------------------------------------------------------
                   |    B     |          |    A    |
                   ------------          -----------

# These are not considered in PDF
Case C)
    Backup --------------------------------------------------------------
                   |    A            |          |        | 
                   ------------------|-----------        |
                                     |        B          |
                                     ---------------------

Case D)
    Backup --------------------------------------------------------------
                   |    B           |          |        | 
                   -----------------|-----------        |
                                    |        A          |
                                    ---------------------

Case E)
    Backup --------------------------------------------------------------
                   |    A           |                   |         |
                   -----------------|-------------------|----------
                                    |        B          |
                                    ---------------------

Case F)
    Backup --------------------------------------------------------------
                   |    B           |                   |         |
                   -----------------|-------------------|----------
                                    |        A          |
                                    ---------------------

If the proof would consider latter 4 cases then it would find that at least one of these cases fails to prove the “Theorem 3”. The way to fix that is to add “Policy 4” with something along the lines:

Policy 4:
- Any backup must try to write a revision file with revision_no one greater than the last "seen" finished backup  # this is what software does and it is assumed in the paper but not stated explicitly
- Any backup must not (over)write a revision file if it already exists  # this is not what software does and hence the race condition / corruption, and is unclear whether it is assumed in the paper or not

With that additional Policy I believe the “Theorem 3” holds and is proven back again.

Engineering side

I guess the solution here would be to claim the revision with a placeholder file that will be replaced with actual manifest upon successful completion of the backup

This may work, yes, but let me propose another solution: create revision file in exclusive mode. In the world of C: fopen("/snapshot/revision", "wx"), or POSIX world: open("...", O_EXCL), or in shell: set -C # noclobber.

I assume the code is different for each backend (filesytem, S3 API, etc…), but hopefully it is possible to do in every backend duplicacy supports.

For example in the “filesystem backend” the code probably looks something like this:

createRevisionFile() {
    let f = fopen("/snapshot/revision", "w");
    fwrite(f, ...);
}

or

createRevisionFile() {
    let f = fopen("/snapshot/revision.tmp", "w");
    fwrite(f, ...);
    rename("/snapshot/revision.tmp", "/snapshot/revision");
}

And should be changed to:

createRevisionFile() {
    let f = fopen("/snapshot/revision", "wx");
    fwrite(f, ...);
}

or

createRevisionFile() {
    let f = fopen("/snapshot/revision.tmp", "wx");
    fwrite(f, ...);
    renameat2(..., "/snapshot/revision.tmp", ..., "/snapshot/revision", RENAME_NOREPLACE);
}

And in “S3 backend”:

createRevisionFile() {
    callS3API("createAndWriteFile", ...);
}

Should be changed to:

createRevisionFile() {
    callS3API("createAndWriteFileButOnlyIfNotExists", ...);
}

I never used S3 API so I am making up the API names above but hopefully I managed to convey my idea.

Droolio · 15 August 2024 10:43

You may want to revoke the B2 tokens you left in your script…

If I’m understanding correctly, an issue only arises when trying to concurrently run two or more backups on the same snapshot ID? This makes sense. Don’t. Do. That.

But sure, this can be done accidentally, but there are certain things you can already do to prevent this (on the same machine at least). Duplicacy GUI doesn’t let you run the same backup or schedule twice. Duplicacy CLI has a -comment global option - specifically for this scenario - so a script can identify an existing backup process and abort.

Now, maybe if Duplicacy could have some protection against this built-in, that would be good. Though there are other scenarios which could cause similar issues* - i.e. running a prune -exhaustive (let alone -exclusive) during a backup, that could also cause storage corruption. There may be others, but there are well-known rules to abide by.

Back to the topic of the two-step fossil collection… are we satisfied that it does indeed WORK - provided we use unique snapshot IDs and abide by those other rules?

* Edit: not to mention an aborted prune - which, as @saspus knows, deletes chunks before (not) deleting the revision file - which I’d argue is a much more common occurrence, and difficult to prevent!

gmaxardis · 15 August 2024 10:50

I instantly did. I realized it and removed the key immediately. Thanks for reminding me anyways though

gmaxardis · 15 August 2024 11:24

Depends on the definition of “WORK”. It does work, but not fully correctly. So I am personally not satisfied and I stand by what I wrote: “duplicacy does not work correctly”. There exists a race condition that can/should be prevented, which can lead to corrupted data.

My script shows that duplicacy can corrupt data even though it doesn’t do anything special really. It just runs two backups and two prunes and moves some files around. This should not lead to corrupted data. And it doesn’t have to because I believe there is a solution to that problem - one I described.

I also disagree with your paragraph about similar prune issues.

prune -exhaustive shouldn’t change anything as far as I understand how the algorithm works. It will work with 1 backup OK and will fail with 2+ backups the exact same way I described.

prune -exclusive can break but that is to be expected. By adding -exclusive flag you are explicitly opting out of the race-condition-free normal way of operation, and are risking corruption of data.

The above things are not at all similar to the issue I describe here.

run two or more backups on the same snapshot ID? This makes sense. Don’t. Do. That.

Doesn’t matter to me whether or not you should do that. That’s beside the point. You could say something along the lines of “don’t run backup and prune at the same time because it might break”. But that is the whole point of the lock-free algorithm of the duplicacy. It should allow you to run whatever operations you want (exception being prune -exclusive) in whatever order you want and everything should work because it shouldn’t have race conditions and it should not corrupt data.

And duplicacy is very good at doing that. As far as I can tell from reading the PDF, and assuming Go code follows the algorithm described in the PDF (assuming there is no discrepancy/bug), I can only see this one race condition (*), all others should be prevented by the lock-free algorithm. I am convinced by the proofs in the PDF, except for the proof of “Theorem 3” because of this particular case I am describing here. And I have even shown that it is not just theoretical talk - I am in fact able to reproduce the issue with my scripts.

(*) I do actually see a second smaller issue, and a potential relatively easy way to fix it (maybe), but I don’t want to confuse everyone by mixing separate topics together, so I am leaving it for later.

** Edit: not to mention an aborted prune - which, as @saspus knows, deletes chunks before (not) deleting the revision file - which I’d argue is a much more common occurrence, and difficult to prevent!*

Yes that is also a problem. It is a different problem. And it may even be a bigger problem in practice than the one I am describing here. But it doesn’t prove or deny that what I describe here is in fact a problem on its own. This is unrelated.

Droolio · 15 August 2024 12:39

I totally agree it’s worth exploring better safeguards, if one can be implemented, sure…

However, your OP questioned the “Two-Step Fossil Collection” algorithm (pruning) which is simply not the issue here - it’s a backup race condition you accidentally discovered while using the same snapshot ID. (Which is understood to be a bad idea in any case.)

There are ways to completely mitigate against this in any script with -comment, as I said.

My point was there are many rules which can be broken, and the storage can end up corrupted - if the user chooses to ignore those considerations (and there might be some situations which can’t be engineered to be safe-guarded against). Running prune on two separate computers, for example (which one might accidentally do in the GUI version on both clients). It’s explicitly recommended to do this on only 1 PC.

Not necessarily so. (Not the end of the world, as the storage isn’t truly corrupt, but you don’t want fossils already after a recent backup!)

100% I agree this could be mitigated, and your engineered solution looks sound.

A very similar solution could be implemented with aborted prunes…

Duplicacy should rename the revision file to revision.fsl before deleting chunks, or simply reverse the process - delete revision file(s) then chunks.

gmaxardis · 15 August 2024 20:24

That is incorrect. The “Two-Step Fossil Collection” algorithm is at fault here. It is not a race between backups. Ultimately it is a race between second backup and two prunes.

If the second backup would finish before first prune then everything would be fine - prune would never turn chunks into fossils.
If the second backup would start after first prune then everything would be fine - backup would reupload chunks.
If the second backup would finish before second prune then everything would be fine - prune would resurrect fossils.
But when the second backup runs at just a “perfectly bad” time then we get corrupted state.

And because of that race my solution is to “pretend” like the second backup never happened by not allowing it to overwrite the revision file.

There are ways to completely mitigate against this in any script with -comment , as I said.

I get it, you are right, sure. But it is like saying that you could mitigate every race by just using an exclusive lock on every duplicacy operation. But that’s missing the point. The whole point of duplicacy is to be lock-less and still work.

Exactly, you acknowledge the huge difference yourself right here. duplicacy -exhaustive might lead to “unnecessary” fossils. But that is very much different from corruption where chunks/fossils are completely gone. And so this is not at all similar case to what I am describing here.

Yes, I totally agree. The problem with removing revisions first is that then prune without -exhaustive will never clear chunks. The problem with removing chunks first is that when process gets interrupted then you end up with broken revisions.

So I believe the proper solution here is to remove revisions in a similar fashion to chunks - similar, though simpler, cause there is no need for resurrecting revisions, etc. First rename revision → revision.fsl, then remove chunks, then remove revision.fsl completely. Revision fossils should be visible for prune and should be invisible for list/restore/backup.

Droolio · 15 August 2024 23:06

I have to disagree here. The ‘fix’ is preventing a second backup revision from overwriting the first. Nothing about the prune algorithm (Two-Step Fossil Collection) needs to change wrt to this issue.

It’s not an exclusive lock, though. Not even a non-exclusive soft lock. It’s purely client-side, doesn’t block other backups or even a prune. Anyway, I think this is all semantic nitpicking…

There’s at least have a potential fix - although, having looked at the solution again, I strongly suspect creating revision files in some kind of exclusive mode won’t be feasible for all backends; if it were, Duplicacy might already be doing it for chunk files. In the case of local file and sftp backends, chunks are uploaded to .tmp files and renamed into place. A similar approach might have to be used on revision files, tweaked for each backend.

Indeed. And then any remnant .fsl(s) could be reprocessed and included again in a subsequent prune.

gmaxardis · 16 August 2024 07:49

Well at this point I guess we have to agree to disagree, cause I won’t find a way to convince to my pov.
“Two-Step Fossil Collection” algorithm correctness is based on 3 theorems, one of them states “The interplay of the backup procedure and the deletion procedure is concurrently lock-free.” - parallel backup and prune are safe and race-free. As it is right now theorem 3 is false and hence “Two-Step Fossil Collection” algorithm is broken.

Creating chunks in exclusive mode is less necessary. In contrast to revisions which use increasing natural numbers, chunks use hashes. Assuming that likelihood of hash collision is non-existent then even if you overwrite a chunk then very-very-most likely it got overwritten with the exact same data.

Even if it is not possible for all backends, then I guess we should do it for as many backends as possible. At least some of them will be resistent to the issue and others, well, unfortunately dont give us tools to. Or we might then come up with a different solution along the lines of what @saspus was thinking.

Coming back to the very origins of this topic to summarize:

It all started with me being curious how duplicacy managed to be lock-free in contrast to tools like restic which require exclusive lock on prunes
And so I have read, and reread docs like 10 times trying to understand it. I couldn’t quite grasp the idea how the “Two-Step Fossil Collection” works cause my intuition was telling me that the “Theorem 3” proof is incomplete and hence incorrect.
I assumed that I must just be missing something and not understanding, and so I created this thread
Then after experimenting turned out that it wasn’t a gap in my understanding, it was in fact a gap in the proof. This lead me to discover a serious bug which can lead to corrupted (unfixable, cause chunks/fossils are completely gone) backup revision.
And so the discussion switched from “help me understand the algorithm” to “algorithm is broken”
Even though we might disagree on some manners, I believe I was able to convince you that it is in fact a bug that is worth fixing, which makes me hopeful that it will get fixed

Summary of summary:

I think the outcome of this discussion is fruitful. There is a bug uncovered. After researching backup solutions I now know how duplicacy works. I learned something. I will be a happy duplicacy user from now on, cause I believe it to be superior to competitors. I just need to make double sure that there are never two backups running at the same time (with same snapshot id), using -comment or ps or what not in my automation script/cron. At least until the bug gets fixed.

I hope that the bug will get fixed. And with that I guess my last question is: should I open an issue in “GitHub issues”?

gmaxardis · 18 August 2024 07:45

should I open an issue in “GitHub issues”?

I have checked on GitHub, and it states to not open issues there, and instead write here on the forum. So I won’t open an issue there to not spam.

@saspus I can see you are the moderator, so could you change category of this topic from “Support” to “Bug” then? And also the topic itself to “Race condition between backups and prunes can lead to corrupted snapshot”

gchen · 18 August 2024 18:03

Policy 2 in the paper states:

A backup procedure cannot access any fossils. That is, it
must upload a chunk if it cannot find the chunk, even if
a corresponding fossil exists.

So the second backup will still upload a new chunk even if the same chunk has been turned into a fossil by the prune operation.

Moreover, Policy 3 guarantees that fossils collected won’t be deleted before the second backup is finished:

For each backup client, there is a new backup that was
not seen by the fossil collection step.

Of course, this assumes that the second backup is using a different snapshot id so it is counted a unique backup client.

This is not a bug. There is really no need to support multiple simultaneous backups with the same snapshot id. If that happens, then it is a configuration error.

gmaxardis · 19 August 2024 08:06

It is possible to corrupt backup
It is possible to avoid it by fixing code
It is not a misconfiguration - you might run 2+ backups with same ID because of automation tooling/etc

Ergo, this is a bug and it can be fixed

ADMECA · 20 August 2024 06:33

And taking the problem the other way around, isn’t there a way to avoid this kind of misconfiguration ?

gchen · 20 August 2024 14:31

I can’t think of any scenario where you are forced to use the same snapshot ID for multiple backups. Even if you need to run multiple backup clients to back up the same source directory at the same time, you can still assign a unique snapshot ID to each client.

Running multiple backups with the same snapshot ID at once is undefined behavior. Likewise, running multiple prune operations also leads to undefined behavior.

saspus · 20 August 2024 16:18

Yes, there may not be obvious valid scenario to do so, but it may happen by mistake: for example when running backup in cron. Unobvious one is when backup duration exceeds the backup cadence.

Duplicacy shall prevent user mistakes from causing corruption whenever possible, especially if that scenario is just user running plain innocent backup command.

The fix is very simple, and there is no good reason not to implement it.

Droolio · 20 August 2024 22:25

#!/bin/bash

PROC_NAME='DUPLICACY_BACKUP'

# abort, if already running
pgrep -f "$PROC_NAME" && exit 1

# otherwise, do backup
cd /path/repo
/path/duplicacy/duplicacy backup -comment $PROC_NAME

IS it a simple fix, though?

We’ve established - only local file and sftp backends use a .tmp file before uploading. So remains to be seen if this technique can be done with other backends and cloud APIs.

(In fact I’m curious now… does the above test script fail with local backends?)

saspus · 20 August 2024 22:54

Not really, you may have multiple duplicacy processes backing up to diffferenr destinations.

Also you have a race condition there. So it’s not a good solution.

Furthermore, users shall not be required to concoct workarounds to prevent data corruption when using published CLI, of a fully concurrent backup program.

It is, but it it wasn’t — it would not matter. It’s a bug that causes data loss using CLI commands and nothing else. It must be a top priority.

I don’t think this is related to backend implementation details. The problem seems to be that two instances end up backing up the same revision.

Duplicacy shall “claim” the next revision at start (by uploading a marker/placeholder, it can be done atomically. Or use guids as revision identifiers, not numbers. This will eliminate possibility of two backup jobs uploading the same version ID.

The same mechanism is already used elsewhere (fossilization) so it’s not really a bleeding edge tech.

Droolio · 20 August 2024 23:54

This was a simple example, and trivially easy to append a snapshot ID (and other stuff) to $PROC_NAME.

Quite frankly, I consider this the preferred method to avoid two backups running on the same ID.

Let’s say you ‘detect’ another backup, what you gonna do other than to abort the whole backup anyway? It’d be undesirable to allow both backups to complete - even if they save to different revision numbers - so why waste resources while they run 'til the end?

I’d want it to stop at the earliest opportunity and run one instance. -comment let’s you do exactly that.

So you’re talking about a lock file?

Duplicacy randomises .tmp files to allow concurrency - doing anything else forces you to run a second backup, because you can’t be sure any placeholder didn’t get left there by an aborted run.

Then what to do about other backends?

What to do about multiple prune runs?

There could be an argument made about using a local lock file for all such occurrences - say one for each ID, perhaps with a timeout, or even an internal ps check built-in. Fine. But I’m just not convinced changing the behaviour on the backend is necessary or as easily achievable as is claimed, without fucking things up.

saspus · 21 August 2024 01:04

Oops. Indeed. We don’t want that.

The lock-less solution would be to use guids for revision file names, thus letting both backups to proceed.

With guids it will “just work” — both backups will succeed. The “version number” will be meaningless — but nobody cares about it anyway — time of the snapshot is what’s important.

I don’t think revision names are used anywhere outside of revision management, so I don’t think anything will get screwed.

Droolio · 21 August 2024 11:13

You still wouldn’t want two backups running at the same time on the same ID.

GUIDs sound like it’d need some kinda database, which you’d have to read the contents of a file (or all files) for.

Revision numbers are dead simple to understand and behave (correctly so) as a sequence number, which you only have to list the directory contents to know how to do the next backup.

(Duplicacy also allows revision ranges as a flag - although I’d love if it wouldn’t error out when individual revisions in a specified range don’t exist )

Sure you could replace this number with a date in the format YYYYMMDDHHmmss but that doesn’t necessarily solve anything unless you add microseconds as well (ssssss), and even that doesn’t completely eliminate the problem. If I have to manually triage or diagnose issues on a storage, I’d much rather see simple version numbers than long ass timestamps or random IDs!

If it were me, I’d rather not take the risk. This so-called bug is incredibly rare to trigger and avoidable through current means, to warrant tinkering with all the different types of backends at this stage.

IMO this whole issue can be fixed by creating (locally) a classic .duplicacy\cache\<ID>\.lock.

This is FAR more preferable to having a placeholder on a remote backend, or some complicated manipulation with revision naming. A second backup is stopped instantly and the bug is prevented.