Graceful Handling of Failed and Incomplete Snapshots

I have storage arrays that come online and go offline as needed. Sometimes weeks go by without them connecting to the server (they hold 20-24 spinning drives, over 100TB of files, and pull a lot of power while running). I have backups scheduled in the Web UI to run every few days that include source data living on those drives.

I noticed that if the schedule fires off a backup while the storage arrays are off, it results in a failed backup. Fair enough. However, when the next backup runs with the array connected, it scans things incredibly slowly (-hash?). Instead of the backup taking 20 minutes or so to scan contents and start uploading, it takes upwards of 12 hours just to scan.

If there is already a way to prevent this I would love to know about it. Otherwise, it would be great if :d: would more gracefully / intelligently handle this situation (along with incomplete snapshots, which seem to also force a full scan to see what has and hasn’t been uploaded).

I would hate to have to manually run all these backups in order to avoid all that scan time.

…thread with similar issues:


Are you using the -hash option? If so, you’re forcing Duplicacy to re-read and hash all files from disk, rather than just looking at their metadata to determine what has changed.

If you are using -hash, I would suggest dropping it for almost all backup runs. The initial backup implicitly behaves like -hash because there is no previous snapshot to compare against, but that doesn’t apply to subsequent backups.
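For reference, the CLI equivalents look roughly like this; the Web Edition drives the same CLI engine underneath, so treat the exact invocations as an illustration rather than what your schedule runs verbatim:

    # Normal incremental backup: only files whose size or modification time
    # changed since the last snapshot get re-read and uploaded.
    duplicacy backup -stats

    # With -hash, every file is re-read and re-hashed, which is what makes
    # the scan take hours on a 100TB array.
    duplicacy backup -hash -stats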


No. Sorry if I made that confusing; I was just speculating that the failed backup might be triggering a -hash-style backup, since it behaves like one (and only after a failed backup, or if a backup in progress is cancelled). I believe revisions are flagged somewhere in the logs as to whether a hash-level scan was enabled, so the next time it happens I’ll check.

I cannot recreate this behavior. Is this the affected workflow?

  1. Start backup
  2. Cancel it immediately
  3. Start it again
  4. Now the entire backup starts over from scratch?

If you have any check jobs running, you can see whether -hash was used for the backup in the tabular section of the log file. For example,

       snap | rev |                          | files |   bytes | chunks |   bytes |  uniq |   bytes |  new |     bytes |
 folderrrrr |   1 | @ 2019-01-05 06:36 -hash |    69 | 21,586K |      8 | 19,386K |     3 |      9K |    8 |   19,386K |
 folderrrrr |  39 | @ 2019-02-10 06:17       |    70 | 21,589K |     11 | 20,112K |     5 |    494K |    6 |      735K |

If you look at the backup log file, you’ll see all files when using the -hash option. Without it, you should only see new/changed files – as determined by file size and/or modification time.
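If it’s easier than reading the table by eye, you can also grep for it. This assumes the check output is saved to a file somewhere; check-log.txt below is just a placeholder for wherever your log ends up:

    # Produce the tabular summary, then list only the revisions that
    # were created in hash mode.
    duplicacy check -tabular > check-log.txt
    grep -- '-hash' check-log.txt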

Is it possible that some other application is updating all of the modification times on the files on your storage arrays between the last successful backup and the one that is taking a long time? This would cause the files to need to be re-hashed in order to detect any changes.

I think one possibility is that when the storage arrays are off, Duplicacy sees an empty directory, so rather than a failed backup it uploads a new backup that contains no files. The next backup then has to start from scratch, re-reading everything even though it won’t actually upload any new chunks (basically the equivalent of -hash).
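If that’s what’s happening, it should be visible in the revision itself: a snapshot taken while the arrays were off would record few or no files. One rough way to confirm from the CLI (revision 40 here is just an example number):

    # List the files recorded in a suspect revision; an "empty" snapshot
    # created while the array was offline should show little or nothing.
    duplicacy list -r 40 -files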


I also thought this could be the case, so here’s a probably dumb? solution: can we instruct a repository to “skip uploading a new revision if 0 files are available”?


If what you both are suggesting is true, not being able to find the volume that holds the repository seems like something that should immediately stop a backup/snapshot from being created, similar to what happens with an invalid option: it doesn’t say the backup “failed,” it just throws an error (“invalid options”) and stops without doing any damage.

I actually use that all the time in the Web UI, where I have special-case commands (like super aggressive pruning) set up in a schedule but with one additional “bad” option so they won’t run unless I remove it. :d: just skips that job and moves on to the next one in the schedule, no harm, no foul.

I will test at some point over the weekend to determine which situations produce which behaviors in :d: (so far everything has been anecdotal observation from using it for a few months). I’m just in the middle of a bunch of projects, so I can’t take that storage array offline at the moment.


Maybe a more elegant solution would be to compare the number of files being backed up to the previous snapshot? This is essentially how a human might notice anything amiss when running a check -stats and looking at the number of files column.

Perhaps return with a FAIL exit code if num_files = 0 and a WARN exit code if num_prev - num_files > X (where X may be 1000 or a configurable number)? After all, the storage could go down mid-backup.

Obviously, if Duplicacy can’t access files it should hard fail, but this might not always be possible if symbolic or hard links are involved.
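Until something like that is built in, a wrapper around the CLI could approximate the same idea. The sketch below assumes the repository was initialized in the source directory; the path, state file, and threshold are invented for illustration:

    #!/bin/sh
    # Skip the backup entirely if the source looks empty, and warn if the
    # file count dropped sharply since the last run. Everything below
    # (paths, threshold) is an example, not a recommendation.
    SRC=/mnt/array1
    STATE=/var/tmp/duplicacy-filecount
    THRESHOLD=1000

    count=$(find "$SRC" -type f 2>/dev/null | wc -l)
    prev=$(cat "$STATE" 2>/dev/null || echo 0)

    if [ "$count" -eq 0 ]; then
        echo "FAIL: no files found under $SRC, skipping backup" >&2
        exit 1
    fi

    if [ "$prev" -gt 0 ] && [ $((prev - count)) -gt "$THRESHOLD" ]; then
        echo "WARN: file count dropped from $prev to $count" >&2
    fi

    echo "$count" > "$STATE"
    cd "$SRC" && duplicacy backup -stats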


Where would it upload files if the destination is not accessible? Or rather, where would it record this “empty” backup? Is it kept in the snapshots folder in the cache? If so, maybe a temporary workaround would be to clean those files from the cache before each run?

I think by ‘storage arrays’ he’s referring to the repository source disks here - not the destination storage.

Oh, I see… In that case it’s quite possible to get an empty but valid backup…
What would help here is a pre-backup script that can fail the entire execution if the source is not available…
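For the CLI at least, something like this already exists: a script named pre-backup placed in the repository’s .duplicacy/scripts directory runs before each backup, and (if I’m reading the docs right) a non-zero exit code aborts the job. A minimal sketch, with /mnt/array1 standing in for the real mount point; I’m not sure how the Web UI exposes its .duplicacy directories, so treat this as CLI-only:

    #!/bin/sh
    # .duplicacy/scripts/pre-backup
    # Abort the backup before any snapshot is created if the storage
    # array isn't mounted. mountpoint is Linux (util-linux); adjust the
    # test for other platforms.
    if ! mountpoint -q /mnt/array1; then
        echo "Storage array not mounted, aborting backup" >&2
        exit 1
    fi
    exit 0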