FATAL DOWNLOAD_CHUNK Chunk **** can't be found

Why is duplicacy trying to download chunks when I’m simply running the backup command?

The problem is that I apparently lost some chunks due to failures of the pcloud app but I did not think that would prevent me from backing up at all. How does duplicacy know that chunks are missing when I just do a backup? Is the backup command internally running a check command?

In any case, I think it is very dangerous that backups fail to run because of one missing chunk. The idea is that you run duplicacy in the background and forget about it. The only reason I noticed this is because I’m still testing. If I hadn’t checked the logs, it would have been possible that I don’t have any backups of my data for weeks or even months, making the potential damage much bigger than the original missing chunk.

So, to get back to my actual failing backup: is it correct that the only way to resolve this issue and get duplicacy to back up again is to delete snapshots one by one until check -a no longer complains about missing chunks (as described here)? If that is so, it means that a single missing chunk is causing a huge amount of damage, and I wonder if this procedure could not be improved.

More concretely: let’s say the one missing chunk belongs to a single file. Is it a problem that this chunk is missing? If I want to restore that file, yes, it’s a huge problem. But let’s assume that file is a very insignificant file that I will never need to restore. In that situation, the missing chunk is not really causing any damage. But duplicacy is nevertheless forcing me to destroy potentially all my snapshots in order to be able to resume backing up…

The backup operation may download some chunks because it needs to reconstruct the last snapshot for two reasons: to compare existing files with the files in the last snapshot so it knows which files can be skipped, and to construct a chunk cache from the chunks referenced by the last snapshot. These chunks may already exist in the local cache (under .duplicacy/cache), but Duplicacy always checks if these chunks exist in the storage first; it won’t just take the local copies if they don’t exist in the storage.

So if some snapshot chunks are missing then the backup operation can’t reconstruct the last snapshot and can’t really proceed. If a storage has a tendency to lose chunks at random, I think you should be more concerned about recovering existing backups and less about running new backups, because the new backups will be broken too if you don’t solve the problem of missing chunks first.
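In pseudocode, the idea is roughly this (just an illustrative sketch, not Duplicacy’s actual code; all of the names are made up):

```python
# Illustrative sketch only; every name here is made up, none of it is Duplicacy's real code.

class FatalError(Exception):
    pass

def load_last_snapshot(snapshot_chunk_ids, storage_chunks, cache):
    """Reconstruct the last snapshot from its chunks before starting a new backup.

    storage_chunks: dict of chunk id -> bytes actually present in the storage
    cache:          dict of chunk id -> bytes in the local .duplicacy/cache
    """
    pieces = []
    for chunk_id in snapshot_chunk_ids:
        # The storage is consulted first: a locally cached copy is not trusted
        # if the chunk no longer exists in the storage.
        if chunk_id not in storage_chunks:
            raise FatalError(f"Chunk {chunk_id} can't be found")
        pieces.append(cache.get(chunk_id, storage_chunks[chunk_id]))
    return b"".join(pieces)

# One snapshot chunk was lost by the storage, so the backup aborts before it starts.
storage = {"c1": b"snapshot part 1"}                              # "c2" is missing
cache = {"c1": b"snapshot part 1", "c2": b"snapshot part 2"}
try:
    load_last_snapshot(["c1", "c2"], storage, cache)
except FatalError as e:
    print("FATAL DOWNLOAD_CHUNK:", e)
```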

My suggestion for now would be to stop using the pcloud drive as the storage until they fix the bug. I promise the WebDAV backend will be available within a month.

It turns out that those missing chunks are actually there (i.e. on the cloud storage) but for some reason (a.k.a. a bug) they are not shown on the local drive. I have informed the pcloud support but they seem to have started ignoring me, so I don’t know if they will fix it. (I have summarized my experience with them in this review on Trustpilot.) Their biggest problem is communication, and that makes the various technical problems they have with their app extra annoying. So I’m really looking forward to duplicacy supporting WebDAV. :slight_smile:

So, leaving aside the reason why I have missing chunks, I’d still like to discuss how duplicacy handles this situation. Thanks for explaining why duplicacy (sometimes?) checks whether certain (?) chunks are indeed available in the backend.

If a storage has a tendency to lose chunks at random, I think you should be more concerned about recovering existing backups and less about running new backups, because the new backups will be broken too if you don’t solve the problem of missing chunks first.

I see your point but I disagree. I agree that I have a big problem if my storage is losing chunks, yes. But duplicacy is making a bad situation even worse.

If I understand your argument correctly, you are saying that it makes no sense to continue backing up based on a snapshot with missing chunks because that means subsequent snapshots will also be broken. Well, not necessarily. Let’s distinguish two scenarios: Scenario A: none of the files to be backed up (i.e. new or changed files) needs the missing chunk. And Scenario B: one or more of the files to be backed up does need the missing chunk.

So let’s see what would happen in both of these scenarios if duplicacy continued the backup despite the missing chunk:

Scenario A:
All new and changed files would be saved in the storage and therefore be entirely restorable in case of an emergency.

Scenario B:
All new and changed files would be saved in the storage, but those files that need the missing chunk would not be restorable. All other files would, however, be entirely restorable in case of an emergency.

Both scenarios are much better than the current situation, in which none of the new or changed files are restorable.

Duplicacy could even be further improved in Scenario B to make even those files with the missing chunk restorable: my suggestion would be that it simply re-uploads the missing chunk if one of the new or changed files needs it.

I am aware that this latter suggestion may not be quite as easy to implement as it sounds because I am ignoring the fact that a chunk may include multiple smaller files or file parts so that it may be impossible to recreate the entire missing chunk. But even so: Both scenarios above are still way better than the current situation.

Hi Christoph,

I think I disagree with you. You said:

“it would have been possible that I don’t have any backups of my data for weeks or even months” …

The point of backing up data to the cloud is that it’s available to me in case of disaster. If I can’t trust my backup (either due to software, or provider problems, or whatever), that’s a showstopper and, for me, would cause me to choose either different software or another provider (whatever was needed). Backups to the cloud (at least for me) are for disaster recovery. If I have a disaster, the backups BETTER be good, or I’m really in trouble!

Duplicacy’s job isn’t to “try and mitigate” data loss. Duplicacy’s job, in this case, is to be “in your face”.

I’m going to need to write some scripting around the CLI to automate my backups, and I’ll certainly be checking for failure codes or unexpected errors, sending e-mail and raising red flags.

As an aside: I’m REALLY serious about data loss. I back up to a hard drive and rotate that to a safe deposit box once/week. And in the meantime, I back up to the cloud. In case of disaster, I’d likely take my most recent backup disk, restore that, then restore everything that was changed in the past week to get “up to date”. Oh, and I tend to not trust any ONE cloud provider, so I actually back up my data to two completely different cloud providers.

When you encountered this problem, I presume it was through the CLI? Did the CLI return a non-zero exit code (that you ignored)? Or did the CLI return a success code even with this error? If it returned an error code, your scripting should have checked for that. But if it returned a success code, then that’s a bug that should be fixed.
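For what it’s worth, this is the kind of minimal wrapper I have in mind (only a sketch; it assumes the duplicacy binary is on the PATH, and notify() is a stub you would replace with real e-mail or alerting):

```python
# Minimal sketch of scripting around the CLI: run the backup, check the exit
# code, and raise a red flag on any failure.
import subprocess
import sys

def notify(message):
    print(message, file=sys.stderr)   # stub: send e-mail, page someone, etc.

result = subprocess.run(["duplicacy", "backup"], capture_output=True, text=True)
if result.returncode != 0:
    notify(f"duplicacy backup failed with exit code {result.returncode}:\n"
           f"{result.stdout}\n{result.stderr}")
    sys.exit(result.returncode)
```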

Very interesting discussion, but there is one point I did not understand:

So if some snapshot chunks are missing then the backup operation can’t reconstruct the last snapshot and can’t really proceed.

In that case, wouldn’t it be better for Duplicacy to create a new snapshot? Am I missing something?

If you really want to create a new backup when Duplicacy fails to load the last snapshot, you can just modify the repository id to a new one that doesn’t exist in the storage, and the new backup will be an initial backup. This would cover both scenarios A and B, and even be able to re-generate missing chunks. However, I don’t think it should start an initial backup whenever there is an error loading the last snapshot, because such an error may be temporary (such as a network error) or easily fixable (a wrong password, for instance).

I don’t mean a new backup, but a new snapshot of the same backup (repository/storage).

@Jeffaco I’m not sure why you disagree with me. In any case, I do agree that one should not rely on one backup alone, and that a missing chunk is a problem the user needs to look into and fix. But I don’t see that as a contradiction to improving the way duplicacy handles missing chunks.

To use road safety as an analogy: when a new technology becomes available (such as seat belts, headrests, or airbags) one could say: but we don’t really need those. Drivers just need to observe the traffic rules, drive carefully and have their vehicle inspected at least once a year and everyone will be fine. And the grain of truth in that argument is that we need to consider the costs of those new technologies too. The fact that they exist does not mean that they should be implemented or that they should be implemented in all vehicles. But if a cost-benefit analysis makes it look like a worthwhile investment, then it would be foolish not to implement them.

I admit that I did not include such a cost benefit analysis in my suggestion. Instead I focused on the benefits. But if Gilbert (or anyone else, for that matter) argues that my suggestion is very difficult to implement (or in other ways costly), those arguments will surely have to be considered. But my hunch is that it is not difficult to implement.

If you really want to create a new backup when Duplicacy fails to load the last snapshot, you can just modify the repository id to a new one that doesn’t exist in the storage, and the new backup will be an initial backup.

Do you mean that if you were to implement something along the lines of my suggestion, a better way of doing it would be to have duplicacy automatically create a new snapshot id and do an initial backup?

If so, I’m not sure why this would be better. I see at least three potential problems:

  1. It would be more complicated to implement than what I suggested
  2. In the case of large repositories, creating a new initial snapshot would take a very long time (as I reported elsewhere, it took duplicacy about 24 hours of going through skipped chunks for a repository of 1-1.5 TB)
  3. As you say yourself: the error may be temporary so that creating a new initial snapshot would be overkill and would just create a confusing situation with potentially dozens of snapshot IDs for the same repository.

And, as a minor comment:

This would cover both scenarios A and B, and even be able to re-generate missing chunks.

Yes, it would perform as well as my suggestion in both scenarios. But it will, of course, meet the same limitations in re-generating missing chunks as my solution, i.e. if the data required for a missing chunk is no longer available in the repository, the missing chunk can obviously not be recreated.

So what is the problem with my original suggestion?

My thoughts on this come to an analogy of a database:

If a database comes across block corruption, should it just try to silently “repair it” and move on?

In my opinion, absolutely not. Yes, if it could silently “repair it” and move on, that is more customer friendly. But much more pressing: WHY were the database blocks corrupted? To me, it’s much more important to figure that out and understand it, and address that. As a SEPARATE STEP, after analysis is done, you might want to try and “repair” the database if you can. But much more important: WHY did this occur? After all, next time, it may not be such an easy “fix”.

Data stored in a database is critically important. Data stored in my backups (that I may need for disaster recovery) is critically important as well. Without backups, I may be unable to recover from a disaster.

This is similar to Duplicacy, in my opinion. I want to rely on my backups. I want to know, with as much certainty as possible, that my backups are “good”. And if my backup software, somehow, realizes that the backups are NOT good, then I want it to be “in my face”, knowing that something serious happened. The only way a CLI can do that is to fail (and return a failure exit code). I may be running with -stats, or with logging enabled, so if the CLI just mentioned “hey, there’s a problem, but I’m fixing it”, I’d never know anything was wrong. And this could very easily mask a much more serious problem that needs to be investigated.

Just my beliefs in this matter. You’re free to disagree. But I doubt you’re going to change my mind (not that you have to, of course).

I tend to agree more with Jeffaco. If the underlying storage has a tendency to lose chunks, then it is not worth the effort to even store any backups on it.

Besides, there is a simple workaround for that (changing to a different repository id). If you want Duplicacy to continue to create a new snapshot when the last snapshot can’t be loaded, then it is basically doing the same thing as the workaround. But having this functionality built in is dangerous, because, as I said, there could be just a temporary network error, in which case you don’t want to start an initial backup, which is usually much slower.

If the underlying storage has a tendency to lose chunks, then it is not worth the effort to even store any backups on it.

You are introducing a new condition into the discussion here: the storage has a tendency, i.e. it will lose chunks repeatedly. Of course you should be changing the storage (or have them fix the problem). But I’m talking about a more general situation in which there is a missing chunk, for whatever reason. And I have argued that making duplicacy stop backing up in that situation does not solve any problem for the user, it only creates new problems. I have so far not seen a counter-argument to that claim.

Your workaround does not address the situation I’m talking about since it requires the user to take action. I am talking about what happens (or should happen) in the time between the missing chunk fatal error and the user taking action (such as creating a new initial backup under a new ID). There is no reason why duplicacy should stop uploading chunks during that time.

There is no reason why duplicacy should stop uploading chunks during that time.

I see it the same way. In other words, “sh… happens”, and regardless of the reason, it would be better if Duplicacy continued, even if it had to generate a new snapshot (from the same repository, without starting from scratch), abandoning the snapshot with problems.

So our only disagreement is on whether or not user action is required when there is a missing chunk. I think it should be, otherwise the error may go unnoticed until the slow initial backup finishes. If you really want Duplicacy to continue uploading chunks, you can wrap that in a script which will automatically modify the repository and start a new backup once a missing chunk is detected.
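Roughly like this (only a sketch; it assumes that .duplicacy/preferences is a JSON list whose entries store the snapshot id under an "id" key, and that the error text contains "can't be found". Please verify both assumptions against your own setup before relying on it):

```python
# Sketch of the workaround: if a backup fails because a chunk is missing,
# switch to a fresh repository id and run an initial backup instead.
# Assumptions to verify for your setup: .duplicacy/preferences is a JSON list
# whose entries carry the snapshot id under "id", and the error message
# contains "can't be found".
import json
import subprocess
import time

PREFERENCES = ".duplicacy/preferences"

def run_backup():
    return subprocess.run(["duplicacy", "backup"], capture_output=True, text=True)

def switch_to_new_id():
    with open(PREFERENCES) as f:
        prefs = json.load(f)
    new_id = prefs[0]["id"] + "-" + time.strftime("%Y%m%d-%H%M%S")
    prefs[0]["id"] = new_id
    with open(PREFERENCES, "w") as f:
        json.dump(prefs, f, indent=4)
    return new_id

result = run_backup()
output = result.stdout + result.stderr
if result.returncode != 0 and "can't be found" in output:
    new_id = switch_to_new_id()
    print(f"Missing chunk detected; retrying as an initial backup under id {new_id}")
    result = run_backup()
raise SystemExit(result.returncode)
```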

I’m with Gilbert here.

As a (game) developer I’ve seen this many times: as soon as you add an “ignore and continue” option to your assertions, because in general it can be a huge improvement for your turnaround times when testing/debugging, your users/testers will very soon get used to it and press the button without much thought. The result is always the same: many new problems/bugs arise just because corrupted state/data was ignored. Like … assert->ignore when the inventory couldn’t be loaded, and later on you get bugs like “game crashes when opening the inventory screen”.

I think Duplicacy should do everything it can to restore as much as possible from a partially corrupted storage, but it also should not continue to back up once a problem appears.
For sure, you want your car to handle as well as possible if you’re rushing along a highway at 200 km/h and your brakes fall apart, but IMO, if you want to start driving and they’re already broken, it’s a much better idea to not be able to move your car at all.

@gchen

If you really want Duplicacy to continue uploading chunks, you can wrap that in a script which will automatically modify the repository and start a new backup once a missing chunk is detected.

No, I don’t want duplicacy to start a new backup aka snapshot id (for the reasons explained in an earlier post above). I want it to continue with the current one. And that is currently not possible with scripts.

So, no, it seems the question of user action is not our only disagreement.

@Nelvin

I understand your experience from programming games but I think this is a different situation. It’s not about letting the user click away annoying but important error messages. The user is not involved at all here. It’s about what duplicacy should do when it comes across a missing chunk while backing up. We agree that it should do the obvious: throw an error message. And we also agree that the user will need to eventually take action on that error message. The question is, what should duplicacy do in the meantime?

I don’t think the analogy with driving a car works so well for what duplicacy is doing (protection against loss). So let’s take a home alarm system instead. What duplicacy (the home alarm system) is currently doing is this: it has discovered that one of the sensors that detects a burglary is broken and it flashes a red light on the panel on the wall and shuts itself down.

What I want it to do is to flash the light (of course!) but not shut itself down, because even though the malfunctioning sensor is a serious problem, I still want my house to be protected by the remaining sensors until I come home.

Duplicacy is a CLI-based program. As such, it can print messages, and it can return a success or failure code.

Do you propose that it return a failure code in this case, but still back up remaining chunks? If so, then looking at the log file (which may be very long with either files backed up or stats), how would you know why the failure code was returned? Should Duplicacy “buffer” all missing chunk messages and output them at the end of the backup? And then, if ^C is hit, those are lost forever.

My alarm system fits my analogy. It’s totally “in my face” if a sensor is broken. Indeed, I can’t even arm it unless I disable the broken sensor from the keypad. So perhaps a new qualifier, “-ignore-missing-chunks”, is called for to satisfy your need. Note that I’d NEVER want to do this myself. As stated before, I think it’s inherently dangerous. You’re doing a backup but saying “hey, if the backup is bad, I’m cool with that”. But that’s all I can think of to satisfy your desire here without degrading Duplicacy for everyone else.

I guess I’m most curious how you think it should behave in case missing chunks come up:

  1. Success error code? Or failure? If success, how would anyone ever know a problem came up?
  2. What to do with the missing chunk messages, particularly as they scroll by and are otherwise lost in the noise?
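To make those two questions concrete: one possible shape for such an “-ignore-missing-chunks” mode (a rough sketch, nothing like Duplicacy’s real code) would be to keep going, buffer every missing-chunk message, print them in a summary at the end, and still return a failure exit code so that scripts notice:

```python
# Sketch of a hypothetical "-ignore-missing-chunks" mode; all names are
# illustrative and none of this is Duplicacy's real code.
import sys

def backup_ignoring_missing(chunks_needed, storage_chunks):
    missing = []
    for chunk_id in chunks_needed:
        if chunk_id not in storage_chunks:
            missing.append(chunk_id)   # remember it, but keep backing up
            continue
        # ... use the chunk as usual ...
    if missing:
        # Question 2: a summary at the end, so the messages don't scroll by
        # and get lost in the noise.
        print(f"WARNING: {len(missing)} referenced chunk(s) were missing:", file=sys.stderr)
        for chunk_id in missing:
            print(f"  {chunk_id}", file=sys.stderr)
        # Question 1: still a failure exit code, so wrappers notice.
        sys.exit(1)

backup_ignoring_missing(["c1", "c2", "c3"], {"c1": b"...", "c3": b"..."})
```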