When to use check

Hi there,

I am a little bit confused about when to use the check command. Years ago I read somewhere (can’t find it anymore) that there is a best practice for when to run the check command in a web UI schedule. It was:

  • check
  • backup (all)
  • check
  • prune
  • check

But currently a check takes hours, so I wanted to ask when it actually needs to be run. Ideally it would not be needed for the backup and prune commands, so I could run it e.g. once a day. Or is it needed the way I am using it at the moment?
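(For reference, in CLI terms that schedule would roughly correspond to the job sequence below; this is just a sketch, and the prune retention flags are placeholders rather than a recommendation:)

  duplicacy check -a                                     # all referenced chunks present?
  duplicacy backup -stats                                # back up
  duplicacy check -a                                     # verify the new revision
  duplicacy prune -a -keep 0:360 -keep 7:30 -keep 1:7    # thin out old revisions
  duplicacy check -a                                     # nothing missing after prune?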

There are so many varying viewpoints on if/when to run a check, owing to the many different setups, needs and personal preferences out there, that it might not be possible to name a single “best practice”; not even a “3-2-1” backup regimen covers every situation.

In my opinion, if you feel you can trust your storage, it’s unnecessary to run multiple daily checks. Good cloud storage providers worry about their reputation and business, so they have the resources and expertise to provide reliable storage. You just have to make sure to have a backup routine (even infrequent backups onto mediocre storage are better than no backups at all).

Checking before and after every backup and prune is really more for peace of mind, so if that’s what you’re after, by all means.

For my own backup procedures…

Files that absolutely cannot be lost are backed up to different places and onto different storage mediums (cloud + offsite NAS + DAS).

For DAS, I mostly use a mix of HDDs and SSDs, plus some gold-surfaced optical discs for good measure (I’m looking forward to those 360TB mini quartz optical discs with a 13+ billion year lifespan being readily available at a local store anytime now :grinning:).

  • btrfs on my NAS drives with at least annual scrubs to verify block checksums for bit rot detection.
  • duplicacy check at least once a year on backups stored on my offsite NAS.
  • S.M.A.R.T. monitoring to determine when to mothball an HDD/SSD.
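For the routine parts of that list, a rough cron sketch (the pool path, device name and repository path are placeholders for my setup; adjust to yours):

  0 3 1 * *  btrfs scrub start -B /mnt/pool            # monthly scrub of the NAS pool
  0 4 * * 0  smartctl -H /dev/sda                      # weekly S.M.A.R.T. health check
  0 5 1 1 *  cd /path/to/repo && duplicacy check -a    # annual check of the offsite backups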

Test restores before it’s too late…

For assurances that your backups are viable, it’s best to run a restore every now and then even though Duplicacy’s check process is very thorough.

Years ago I read a story about a company whose business revolved around their email system (day-to-day operations, client correspondence, etc.).

The company’s IT group installed a backup system to archive contents of the company’s Microsoft Exchange server. The IT staff diligently made sure that the backups ran as scheduled and checked that the backups were accessible (no backup errors).

One day the inevitable happened, the Exchange server crashed and took the email with it on the way down. After trying to repair the server without any success, it was decided to restore from the backups. Unfortunately, the backups weren’t any good (nobody had ever tried to perform a restore exercise). They had to resort to scraping together emails cached in the Microsoft Outlook PST files on each employee’s computer.

Long story short, a new IT director/manager fired the entire IT staff and started over.


I’m of the opinion that backups should be checked and verified periodically (not tested == broken), but not necessarily with the check command.

I don’t think running check after every operation is prudent.

Duplicacy has three flavors of check:

  1. Default, to check whether all referenced chunks are present
  2. With -chunks, to also validate chunk integrity
  3. With -files, one step further, to verify file integrity
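For reference, the three invocations look roughly like this (a sketch; -a just means “all snapshot IDs”):

  duplicacy check -a            # 1. referenced chunks exist in the storage
  duplicacy check -a -chunks    # 2. also download chunks and verify their integrity
  duplicacy check -a -files     # 3. also verify every file in every revision against its hash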

There is a long and detailed discussion I had with @Droolio about the merits of each, where I believe the last two are pointless (the storage must guarantee integrity) and he brought up good points about why they are not (the storage may not be trusted, and can break guarantees in some corner cases). Another angle there is that the storage should support checksumming on demand, and Duplicacy should take advantage of that to avoid downloading data when verifying chunks. That would be acceptable and useful. Have a look.

With Duplicacy in its current state of affairs, I don’t use prune. The reason: it leaves the datastore in a bad state if interrupted. Data is not lost, but checks will start failing on the ghost snapshots that were supposed to be deleted. I find that unacceptable, and until it’s fixed I’m not using prune.

Without prune the datastore is add-only. Each new revision simply adds data to the datastore without modifying the existing data. Hence, I don’t run check either: existing backup history cannot be corrupted by new backups.

Instead, I verify the backup annually by restoring a few random files.

The key here is that it’s not the only backup, and therefore it does not have to be 100% reliable. If this backup has a 0.1% chance of failure, and the other solution also has a 0.1% chance of failure, the probability of both of them failing at the same time is 0.0001%, i.e. zero for all intents and purposes. So, I don’t run duplicacy check.
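Assuming the two solutions fail independently, the arithmetic is simply:

  0.1% × 0.1% = 0.001 × 0.001 = 0.000001 = 0.0001%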

When prune is re-enabled, I’ll add check without any arguments, simply because it will take time for me to trust prune again.

Ultimately, it’s cost vs reward. Running check comes with a cost, more so when the -chunks or -files flag is used, depending on the storage and other variables.

On one end of the spectrum is a backup to AWS without prune. I don’t check that: I trust AWS to maintain the integrity of files, and egress is very expensive.

On the other end is a backup to a single local USB HDD. If I found myself in that situation, I would probably run check -files daily, deleting the log of checked chunks to force all of them to be rechecked each time. It would perform horribly, but unless your storage system guarantees data integrity, that’s what you have to do.
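A rough sketch of what that daily job could look like (the verified-chunk cache path is an assumption about where my installation keeps its record of already-checked chunks; confirm yours before deleting anything):

  # run daily from the repository directory
  rm -f .duplicacy/cache/default/verified_chunks   # assumed cache location; forces a full recheck
  duplicacy check -a -files                        # re-download and verify everything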


I run check daily just to update pretty storage charts in the GUI :wink:


Thank you very much for your feedback. I appreciate it!

I am now going to split my backup schedule from one into three schedules:

  • backup hourly (stays the same, but only does backups, so it is much faster)
  • check daily in the morning, including prune (no HDD involved, so no spin-up; also, no backup needs to wait, and therefore no backup delay)
  • once a week, check with -chunks -fossils -resurrect -persist to make sure backups are working (this was missing in my current configuration)
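In CLI terms, the three schedules would look roughly like this (a sketch; the prune retention policy is only an example):

  # hourly
  duplicacy backup -stats

  # daily, in the morning
  duplicacy check -a
  duplicacy prune -a -keep 0:360 -keep 7:30 -keep 1:7

  # weekly
  duplicacy check -a -chunks -fossils -resurrect -persist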

But at first I thought about using check with -files instead of -chunks:

  • -chunks seemed to only verify that all chunks are correct. It would also need to be combined with a “normal” check to verify that all chunks exist. But that did not feel like a practical approach to me, because having all chunks is not the same as being able to restore all files.
  • -files seemed more practical to me, as it looked like a simulated restore. So I thought it would be better to stick with that option.

But after reading Check Questions - #11 by gchen I immediately changed to -chunks, as it is much faster; with -files this can run for days/weeks.
(In short, -files restores all files from all revisions. So if you have a 1 GB file in every one of your 100 revisions, it needs to download and check 100 GB of data. -chunks instead verifies that all uploaded chunks are not broken, so it “only” downloads as much data as your storage holds.)

One thing the Web UI is missing, IMO, is an option to abort subsequent jobs in a schedule if any of them fail.

For me, the ideal schedule would be to do check before any prune, mainly because you wanna prevent prune from making the situation worse if there are missing chunks. The slightly longer scan time and additional memory usage when encountering a few extra chunks is much less important.

Honestly, I wouldn’t use -fossils, -resurrect, or -persist in an automated schedule. These shouldn’t be necessary under normal circumstances. So long as you have some kind of reporting to notify you of failed check jobs, you can run these flags manually if needed.

Yea, -files isn’t feasible right now in an automated setup. It really needs a -latest flag (which could also work nicely with copy, check and restore).

-fossils probably should always be there: Missing chunk, can't figure out why - #6 by gchen

All other flags are indeed counterproductive.

What I’d say about that is, probably… but maybe? :slight_smile:

From my understanding, under normal circumstances you shouldn’t have a ‘referenced’ fossil, i.e. a check without this flag should complete without error unless something went wrong(?) with the pruning process.

Well, not necessarily wrong, because Duplicacy apparently resurrects fossils when necessary (when?), but I don’t fully understand the circumstances in which this would occur. Is it that prune was run with -ignore, or 7 days had passed, or there was a race condition, perhaps?

So I personally wouldn’t wanna do this in a scheduled setup. Without it feels safer, it seems completely unnecessary 99% of the time, and it can always be run manually if necessary, when there are missing chunks.

I think the use case here is a concurrency one: prune and backup running in parallel, from the same or different clients. Prune may decide to fossilize an unused chunk, but just before it does, a backup may have generated the same chunk and, seeing that it’s already there, not uploaded it. Then prune fossilizes the chunk, the backup completes and uploads its snapshot, and that snapshot now references a chunk that is fossilized. That’s OK: restore will work, because when restore fails to find a chunk it will look for a fossil and rename/recover it. Hence, since this is business as usual, check should also look at fossils when verifying snapshots. Should check resurrect the fossil while it’s at it? IMO, no. Check should be a read-only operation, otherwise it risks messing things up even further.