Recommended Way to Test Backups

I have been using duplicacy for quite some time now. I am running a prune and then check daily. I was wondering if anyone has any recommendations on how to automate testing a restore. My understanding is that part of a solid backup plan is to perform regular tests of restoring files. So what do people do for this? Do you just pick something at random, restore it somewhere, and make sure it’s fine? Is that sufficient? To me, that tells me that one single file/version can be restored, but does it tell me anything else? Thoughts on this topic would be great. I’d love to automate something here so I don’t have to manually do this every so often.

Check -chunks or check -files is almost the same as a restore, and has the benefit of downloading each chunk only once.
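
For reference, the commands in question (run from an initialized repository; descriptions from memory, so double-check the guide):

    duplicacy check            # verify that every chunk referenced by the snapshots exists in the storage
    duplicacy check -chunks    # additionally download each chunk once and verify its content
    duplicacy check -files     # download and verify every file of the selected revision(s)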

However, duplicacy implemented CAS on top of an existing checksummed storage API, and hence, in my opinion, it’s a waste of time to run any check but the basic one: wanting to validate chunk content means you don’t trust your storage. And the solution there is to change storage provider, not add more checks.

The check with no parameters validates metadata chunks and ensures that all chunks necessary to restore files for the specified revision, or for all revisions, are present in the storage. (It’s the responsibility of the storage to guarantee integrity.)

Therefore, if you do run prune — it’s useful to run that basic check to make sure the backup is still valid (protecting against possible prune bugs that could delete too much or otherwise leave the datastore in an inconsistent state, as it does under at least one known scenario: an interrupted prune).
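
Something along these lines as a daily job is all it takes (a minimal sketch; the retention flags and repository path are placeholders to adjust to your own policy):

    # daily prune-then-check, run from the repository root
    cd /path/to/repository || exit 1
    duplicacy prune -keep 0:360 -keep 30:30 -keep 7:7 -keep 1:1 || exit 1
    # the basic check: confirms every chunk referenced by every revision is still in the storage
    duplicacy check || exit 1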

Personally — I neither run check nor prune. I’ve tested a restore of one file once — just to ensure I have all the credentials and everything else required to restore from scratch. But that’s it. I also don’t use flaky storage, and I rely on the consistency guarantees that the storage provider offers.

I trust my storage provider (Backblaze B2). Doing a check -chunks or check -files will cause my bill to go up because of downloads, so I’d rather not do that for the whole repo. It sounds like what I am doing now is fine.

Two things:

  1. I would not trust B2. They had a bug quite recently where their API was returning bad data. Fortunately, the data in storage was OK, but at this stage of maturity these types of mishaps are not acceptable.
  2. You can egress from B2 for free via Cloudflare. Duplicacy supports this use case (you can provide an alternate download URL).

Good to know about the bug. I will need to do some research into an alternative then. I used Wasabi previously, but I didn’t like their deletion policy.

I too dislike Wasabi. It never worked properly for me (the us-east endpoint was broken intermittently and us-west almost all the time, for the whole six months I was trying to make it work). Plus their horribly deceptive pricing annoys me.

Hot storage that forces minimum retention on you, is more expensive, and promises free egress when it isn’t, aka blatantly lies to the customer’s face (it’s free only up to the amount of data stored; you can essentially download it for free once). It’s horrible. Carbonite was horrible and deceptive and Wasabi is too. What else could we expect, in hindsight. /rant

Once duplicacy implements this feature request: Store metadata chunks, or any chunks containing them, in a different folder, you will be able to use archival-grade storage at AWS, at a very attractive cost structure appropriate for backup workflows. Using hot storage for backup, albeit from “discount” providers, is wasteful anyway, so I’m looking forward to it: the request has been accepted and is now “planned”.

2 Likes

What are you currently using?

I don’t use B2 but I gather the issue there was simply a data-retrieval bug - the data itself wasn’t affected and no data was actually lost. IMO it would be silly to ditch one of the most well regarded providers geared towards resiliency, when the alternatives put you in exactly the same boat or worse…

It’s one thing to say “don’t use a storage backend if you don’t trust it” and another to say “assume the backend takes care of data integrity; no need to test it”.

Well how do you know what to trust?

The point is, you should never trust ANY storage - cloud or local, even if all the lights are green - and that’s exactly why you should test it*.

You’re not just testing the integrity of the bits resting on whatever media; you’re testing the underlying encoding, error-correcting logic, APIs, and code logic of the client application (eminently more important with a tool that repacks your data into chunks). You’re not even just testing the data itself; you should be testing a disaster recovery scenario (at least once) - i.e. do you have the encryption key for your backups stashed on your mobile? The mobile that got destroyed in the fire that took all your data. What about MFA codes on your phone? You do have a backup of that… oh it’s ‘locked up’ in backup storage?

Now if you assume a particular backup isn’t 100% guaranteed (as you should), you can mitigate against any one storage failing by following the 3-2-1 strategy at a minimum. And test against them all.

*How much you should test is totally up to you.

Regular check jobs may be enough for an early warning, but unless you’ve done an actual restore (or at minimum a check -chunks), you simply don’t know if you can trust it, other than by gathering anecdotes from the internet.

Well, since you do 3-2-1, and the process of doing a Duplicacy copy effectively tests the source copy, the only thing left to do is to test the (cloud?) destination from time to time. (And test all copies anyway with restores. Again; full, partial, how regularly, is up to you - though never having tested a full restore, you may perhaps not fully appreciate the process, or how important your data is, until you’ve done it once.)

Back in the day it used to be common to rotate out backup media - switch the source and destinations around.

Harder to replicate with cloud, but my process would be to - once a year - do a Duplicacy copy from cloud to a new local copy of only the most recent snapshot revisions (of course copying all revisions is an option too). Thoroughly test that storage by doing a full restore, again to a temporary local space. Compare that data (with hashes or a tool such as Beyond Compare) with your live data and maybe with that of your other, local backup copy. Delete or “retire” your old local backup copy and backup to your new local instead. One way to do this, if you didn’t copy all revisions, would be to copy your older revisions from your old local copy into your new - the process of which tests the integrity of those missing chunks - and then you can finally delete the older copy. Precisely how all this is implemented can be tweaked, but the idea is to “read” and recover data from all of your backup copies.

Yes, this would require a lot of spare disk space, but having plenty of space would be recommended anyway in a disaster recovery scenario, and ideally in a proper test, you wouldn’t have access to your original infra, including your mobile phone. You could always chop up the job into sections; one snapshot ID at a time, restore one subfolder at a time etc…

To automate all this (which is certainly a worthwhile goal), I haven’t a clue. Unfortunately, restore in Duplicacy GUI can’t be scheduled, only check can, and I dunno if you’d wanna do that anyway without ensuring plenty of disk space. Duplicacy only recently allowed restores without the browser window having to be left open.

With the CLI I’m sure it could be done, though I’m not sure if the effort of automating that once a year is all that worthwhile, but you could easily document a process and run through a proven check-list to make things easier.
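
As a rough illustration, the once-a-year run-through might be scripted along these lines (storage names, paths and the revision number are all placeholders, and I’d treat this as a starting checklist rather than a finished script):

    # 1. add a fresh local storage as a copy-compatible target, then copy recent revisions into it
    cd /path/to/repository
    duplicacy add -copy cloud local-test my-backup-id /mnt/scratch/new-local-storage
    duplicacy copy -from cloud -to local-test

    # 2. restore the latest revision into a temporary area, as if starting from scratch
    mkdir -p /mnt/scratch/restore-test && cd /mnt/scratch/restore-test
    duplicacy init my-backup-id /mnt/scratch/new-local-storage
    duplicacy restore -r 123        # substitute the latest revision number

    # 3. compare the restored tree against the live data (or use hashes / Beyond Compare)
    diff -r /mnt/scratch/restore-test /path/to/live/data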

Automating, say, monthly restores (with the CLI) on your local backups may also not be worth it compared to a scheduled check -chunks or -files, although unfortunately doing -files in the GUI is completely infeasible due to there not being a -latest (revision) option, which Duplicacy sorely needs - for both copy and check jobs.

Anyway, Test Test Test.

Sorry for the long ramble, though I honestly think it’s important to think long about backups if you care about your data. :wink:

2 Likes

The point is that it was allowed to happen. Meaning, there is no proper QA process in place, and if it could happen in one place in the pipeline it can happen in another. Judging by the design of their desktop software products – there is a bunch of interns doing all the work. It’s amateurish and laughable. I will never trust them to keep my data long term.

Regarded by who? From my perspective, Backblaze is in the same vicinity as Hubic, pCloud and iDrive.
Backblaze is cheap, and fast if you are near the datacenter, and if you have a web application with low uptime requirements – you can utilize them for storage. Hot storage. For stuff you don’t mind losing.
But long term backups? No way.

I will try my best to elaborate on both points below.

Indeed. But in real world it’s a continuous spectrum.

And then, even if you do a full check and download the entire archive, you only know that it was possible to restore one minute ago. You don’t know if it is still possible to restore again. This is very important. This qualitatively distinguishes the “check often, correct as needed” and “rely on integrity by design” approaches.

For example, imagine you wrote a file to a location. Let it be magnetic media for now.

  • Should you read it back immediately to verify it’s there?
  • Should you verify it in one hour?
  • In a month?
  • In a year?
  • Why not in five minutes?

What happens if the file is good in a day but bad in a year? Does it mean checking every hour is the way to go? Or would ditching this horrible storage appliance be more prudent? Shall you build magnetic media correction code into your backup tool, text processor, mail client, etc.?

The answer is neither. Once you start compensating for deficiencies of your storage you must stop and re-assess, as you are on the wrong path.

For example, if that storage was an HDD – it’s a realistic use case (elaborating too much here for the other readers who may not be familiar): disks rot, and a file written today may not be readable in a year. What did the industry do? They did not force everyone to start writing software that assumes written files can be corrupted; instead, they addressed the problem definitively: they created a filesystem that uses redundant media and periodic scans to continuously find and rectify rotten sectors, thus guaranteeing storage consistency. Now your data cannot be corrupted by design, mathematically. That appliance is simply incapable of returning you bad data. Therefore verifying the data integrity of files read from that appliance would be a complete waste of time: the appliance already does it on its own, as part of a periodic scrub.

Every reputable cloud storage provider does it. Or shall do it. (CrashPlan did not, but that’s a whole other story, and they are far from reputable in any way, either way.) Backblaze does it. If you were fine with the B2 API malfunctioning as long as the data was intact – then you should not be verifying the data stored at B2. They do scrubs, and data on their pods is guaranteed to be valid.

A side note: when I say guarantee, I mean the probability of failure is negligible. There is always a theoretical possibility of failure, even if cryptographic hashes were used for parity, but it is too small to be worth worrying about in practice.

How do you find reputable providers? Just look at what banks and other large commercial companies use. You’ll find it’s Azure, Amazon, and Google. That’s it. Nobody uses pcloud, b2, hubic, and other toys. And therefore, neither should you, if data is important to you. Which is the case for backup data.

This I agree with. Hence, check -chunks is a good protection: it helps verify all that, once per chunk. This is as far as it needs to go.

This is another point I agree with: restoring to a brand new machine in a brand new location, having storage credentials, keys, accounts, etc. But the purpose of this is to ensure all accounting and credential management is in place, and it has nothing to do with storage resilience; it is an entirely separate, and very important, task to perform. Unless you have tried to restore from a backup – that backup should not be relied on.

Two notes here. Nothing is 100% reliable and nothing can be made 100% reliable, but mainstream cloud providers and some backup tools, including duplicacy, get pretty close; close enough that the difference is not worth worrying about, and definitely not worth the extra expense of re-doing deep backup verification.

That was because the media was deficient. Or for other risk-mitigating reasons, but today cloud providers do whatever is needed to guarantee data consistency as part of providing you the service. You pay money – they give you reliable storage. If this involves rotating media, running scrubs, restoring from backups – it’s not your concern. You get guaranteed faultless storage.

3-2-1 is a massive overkill. For most home users one replicated copy (i.e. the contents of your Dropbox or iCloud Drive) plus an offsite backup (e.g. duplicacy to AWS) is more than enough. Of course, if you run a business with some specific requirements and data retention policies – it’s a separate story.

I won’t comment on the next paragraph describing the mitigation – while it’s a prudent approach, I disagree with the goal; I believe there is nothing to mitigate and it’s a waste of time.

But I do agree with this:

To summarize, this is what I think:

  • Restore from scratch must be tested on a few files (to verify access to the storage and keys); a sketch of such a test follows below.
  • Verification of duplicacy’s CAS likely should be done (check) (to protect against duplicacy bugs).
  • Verification of the data integrity of new chunks probably should not be done (check -chunks). It would be testing duplicacy’s integrity, and if you don’t trust the tool that much – switch to another. But the overhead is minimal, and it can be used for extreme cases of anxiety.
  • Verification of the full or partial archive contents periodically should not be done, because the probability of failure does not justify the expense of lifting a finger to do it (or paying for egress, or paying for electricity). Instead – use another cloud provider if the data is uber important, with another backup tool. (The all-eggs-in-the-same-basket thing.)
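
For the first bullet, something as small as this is enough (everything here - snapshot ID, storage URL, revision and file pattern - is a placeholder):

    # a from-scratch test: an empty directory, nothing but credentials and the storage URL
    mkdir /tmp/bare-restore && cd /tmp/bare-restore
    duplicacy init -e my-backup-id s3://us-east-1@amazon.com/my-bucket/path
    duplicacy restore -r 123 -- "Documents/some-important-file.txt"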

It’s useful. It’s just that going all-in on optimizing the reliability of just one thing has diminishing returns with respect to the amount of effort. Picking the right provider and backup program is sufficient, and if not – a second destination should be used. But downloading the whole archive monthly is a waste of time, or indicative of a poor choice of providers, to summarize what I tried to express here.

1 Like

“Allowed to happen” heh - as if any entity can possibly guarantee mistakes can’t happen. Neither you nor I can possibly know there wasn’t a “proper QA” process in place - that’s pure supposition, and we don’t have all the facts to make such an arbitrary claim.

Mistakes can happen with any company or system. Your so-called reputable companies - Azure, Amazon, Google, let’s add Cloudflare in there - have outages every other week; do you think they have proper QA processes there, and that it drills all the way down to how your bits are kept?

Absolutely, yes.

The option should be there, as with any backup software, but I’d also settle for remote checksumming.

The reason is simple in Duplicacy’s case; we know for a fact that, due to its design, on some backends it can lose chunks entirely and leave truncated chunks, and that subsequent backups pretend to run fine. To borrow your term, Duplicacy “allows” this. Until a check detects instances of missing (but not truncated) chunks, or a check -chunks detects missing or truncated chunks. If you don’t do the latter, nor a restore, if you don’t have ECC RAM (and even if you do, because bugs), you’re taking a huge risk.

We discussed all this to death before, but IMO Duplicacy lacks early failure detection - check -chunks is an expensive operation on cloud storage, when remote checksumming is doable though sadly not implemented nor on the roadmap.

So yes, immediately.

Yes, why not?

At least partial data - randomised on a schedule similar to how disk scrubbing works - would be nice. SnapRAID, for example, can be told to scrub X% of data each time, so that 100% of it is fully checked within Y time. Why not?
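
For reference, this is the sort of thing I mean (SnapRAID flags quoted from memory, so check its man page):

    # scrub roughly 8% of the array per run, only touching blocks not scrubbed in the last 10 days
    snapraid scrub -p 8 -o 10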

At minimum, IMO.

No. Because the chances of two of 3 copies failing in that space of time are so ludicrously small as to be irrelevant. Besides the fact the off-site copy is usually (as in my and many people’s case) only done once a day. The impact of losing incremental amounts of data in such a short time is also extremely minimal.

Expand the example to 1 hour, 1 day etc. - similar result. The user can judge full well for themselves based on the regularity of their own backups. The option is there.

Yes, why not? Such magnetic media has basic checksumming for detection. Duplicacy can already do ECC - it’s another layer that may already sit on top of a parity-based RAID system if you so choose. Optical discs have ECC, many people use extra layers on top of that, such as par2. Why not?

Agreed, it’s good, but not perfect.

It doesn’t simulate the recovery of a full file - a file that may have been modified over a great many revisions, with many incremental chunks uploaded and even pruned - you have to trust the client logic, and that there isn’t a rare edge-case bug.

Also with check -chunks, it checks chunks once only and never again - unless you manually delete a cache file, which makes the whole procedure rather crude when you consider some chunks get tested more regularly than others instead of at a constant pace. Lots of wasted bandwidth.

Would be more useful to combine -chunks with -files -latest at different intervals.

I completely disagree.

Nobody should be left to judge how reliable a service or tool is or isn’t on reputation alone; they simply don’t have enough information. Even if it’s their own system, it shouldn’t be relied upon. In some cases, it becomes obvious (CrashPlan, anything with WebDAV), but you can’t assume Google is infallible just because it’s a big company. How ‘close’ to perfect is irrelevant - it is fallible, they all are.

Point is, don’t trust any single cog 100% - sure, listen to word of mouth, but unless you yourself run it through its paces (verify) you’ll never know, for example, that Crashplan regularly loses entire databases particularly when they get too big, or that CP restores can become so impossibly slow as to take millennia (I did this very test myself - as did many others). Or that B2 once had a minor API issue. How does any of this get discovered if people don’t test?

In one breath you say nothing can be made 100% reliable and in another assume you’re getting guaranteed storage because you pays good moneys. Hilarious. Do you get an insurance payout with that $5/TB/month?

I’ll just leave this here.

To summarise, your approach is: Trust, don’t verify. My approach: Don’t trust, verify. I leave it to the reader to decide which is the safest approach.

Precisely. You shouldn’t trust the tool, you can never trust it. Hence the 3-2-1 strategy; it isn’t just about the number of copies (3), but also about the methods (2), and physical separation (1) in case of natural or human disaster.

The ‘2’ normally represents media but in the case of Duplicacy, where data is repacked, its integrity compared to raw bytes should be carefully evaluated. The ‘2’ can also mean two methods, or two tools (Veeam Agent for me), which mitigates against bugs in any single one.

No finger would have to be lifted if Duplicacy supported remote checksumming, but egress is usually free - even with B2, if set up right - and electricity cost is almost negligible if you happen to be one of the many users who runs 24/7, runs a server, or owns a cheap VPS.

The rewards of verification outweigh the extra cost if, one day, you actually have to rely on your backups. A once-a-year test is a no-brainer, but even a one-off test with a new backup strategy would be wise.

But we can, and we do, and we have!

Facts: Whatever release and QA process they have in place allowed for the code to be deployed that resulted in bad data being returned to the customer.

This means their release and QA process is inadequate.

Yes, of course. It’s how the company handles the mistakes that matters.

Yes, and yes. An outage is much, much preferable to bad data returned to the customer. There should have been an outage at B2 when their internal monitoring systems detected the discrepancy. Ah, they did not have a monitoring system in place; interesting, isn’t it?

Ok, so we are in agreement here. duplicacy backup && duplicacy check -chunks is a good middle ground here. It verifies that the data is correctly uploaded, is downloadable, and is still valid.
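
E.g. as a nightly cron entry (the time and path are placeholders):

    # back up, then download-verify any chunks that haven't been verified before
    0 2 * * * cd /path/to/repository && duplicacy backup && duplicacy check -chunks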

I remember that chat, yes, and I agree, but this is already an area where you have to trust the storage: trust them to compute the checksum on demand, and not just return pre-computed ones, the ones that you provided along with the data at the time of upload, you know, to save compute resources. That would defeat the purpose. So, you have to trust them to do the right thing here. But then you can also trust them to do the checksum verification internally anyway (because they have to maintain data viability), and therefore you don’t have to. (This last sentence is the logic bridge that we seem to disagree on.)

No, because the disk subsystem already does it, including on SnapRAID, and on every single one of the other storage appliances! Why would you want to keep re-doing that work in the application?

Ah, so we are indeed talking here about bit rot? This is addressed by a periodic scrub by the provider. So the time-duration variable due to this is out of the picture. Which leaves us with a single check -chunks being sufficient when done once per chunk.

Because separation of church and state, division of labour, etc. Otherwise duplicacy will soon become its own OS, with disk drivers and a network stack, because where would you draw the line? What if RAM is bad? Let’s duplicate every buffer allocation and compare contents before every fetch, etc., merge the zfs source into Duplicacy and give it access to the raw storage, because what if the filesystem cannot be trusted, etc.

And I still maintain that this should not have been implemented, and should be backed out. It’s an ill-conceived half-measure. It adds unnecessary code, increases maintenance costs, and decreases stability as a result. It’s not the job of a user program to work around storage deficiencies. It only slightly benefits the few users who for some reason still store backups on a single HDD, but it’s just delaying the inevitable and is harmful. Let users lose data and migrate to proper storage, as opposed to masking data loss and prolonging the agony.

I have explained in the side note what I mean by 100% reliability and guarantees.

No. My approach is to balance trust and verification.

  • Duplicacy internal consistency if destructive operations are used: definitely verify and often (duplicacy check with no other flags)
  • AWS data integrity: definitely don’t. Through the magic of CAS data integrity translates to duplicacy datastore integrity. Verification is therefore 100% waste of time (and money), for the reasons described above.
1 Like

All of this is interesting information to consume, but it’s ultimately a LOT for the average person to deal with and even more so, a lot of at home sys-admining that I’d like to avoid. I have struggled in the past with trying to weigh the pros and cons of a storage where it repacks the data vs just storing the files as they are (client side encrypted of course). I’m going to keep thinking on how to balance complexity of backups and ease of use.

1 Like

This is a very important point. One of the important goals of user-facing backup software that targets a wide audience should be ease of use, and this can only be accomplished by hiding all the complexity inside. The backup tool developer can make reasonable choices that should be used as defaults — because you are right, users cannot be expected to become experts on data management. They need a solution with one button called “safeguard my data”. If power users want to tinker with internals — sure, add another button called Advanced Configuration, but it shall not be required for 99% of users.

The duplicacy cli is a great tool that can be incorporated into a successful backup strategy.

The web UI in my opinion should be more than just a thin shell and a scheduler. It shall become the driving force for the set-it-and-forget-it approach to backup.

Yes, this will come with a lot of hidden complexity and ugly, hard-to-support design decisions, but that’s the place where it needs to be done. For example, the web UI needs to know about OS-specific locations that contain transient data, to exclude them. It also shall know about important locations, to include them. On macOS duplicacy web already does this by supporting exclusion by the extended attribute that is one of the signals Time Machine uses to exclude files. It’s not the only signal, so there is room for improvement.

Speaking of Time Machine — this is an example of good design. Step one - pick a target. Step two — … there is no step two. It all works automagically. The right stuff is backed up, the right stuff is excluded.

Another good example is Arq — default configuration is good enough on each supported OS for most users.

Backblaze Personal also took that approach but botched the implementation and maintenance, negating all the benefits. It also has other issues I could talk about for hours. But the intentions were correct.

So in summary you are absolutely right. Duplicacy Web as a product must “just work” with no configuration for most users on most OSes. This requires a lot of effort from the developer — but otherwise it’s pointless. Who is the target audience? People who know all the nitty-gritty details about their OS to configure exclusions, and are aware of the various flavors of storage to pick one correctly and configure validation appropriate for that storage, but for the life of them can’t figure out how to configure the scheduler on their OS, and so need a custom scheduler?

Because that’s what duplicacy web is right now: a scheduler. And instead, it shall be a grandma-usable end-user product. Even having a configuration wizard would be a massive step forward — ask the user about everything pertinent and configure exclusions and schedules. And then make the next leap of not requiring user input at all, providing a highly customized and polished Default experience (not just a set of fixed settings!) and an Advanced configuration for those who seek it.

2 Likes

Rather than pick apart every comment I’ll just address the most important one (to this topic) and spit out some thoughts near the end…

Without intimate knowledge of the provider, it has to be assumed nothing is “adequate” enough - including the QA process. Even your own server. In security, adopting ‘zero trust’ is a similar thought process around implementing mitigations, because it forces you to accept that all failure modes occur.

Verification, which is what this thread is about, is absolutely critical to any backup plan. By not testing, you’re making a potentially catastrophic mistake with your backups.

Likewise with 3-2-1, the bare minimum. This ain’t even about the tech; disasters are a fact of life.

Doing neither is foolhardy.

I need to trust more than bits on a platter.

Personally, I’m much more concerned about Duplicacy doing its own job properly - not leaving its db in a corrupt state due to truncated chunks during transfer. I can mitigate against bit-rot, bad sectors or HDD failure locally; what I can’t mitigate against is my cloud backups claiming all’s fine 'n dandy.

Our only real option is egress. Duplicacy, quite frankly, needs to do better than that.

So here’s my wishlist…

check -chunks: the first stage of each run does an expensive ListAllFiles() - redundant if you just want to test recently-uploaded chunks. It could log when a chunk was last checked and allow re-checks under a threshold, allowing the cache to be cyclical without having to manually delete once in a while.

Built-in ECC: Very happy with the implementation. IMO it should be the job of any backup software - particularly when it repacks your data - to implement as much data-integrity protection as possible. What’s missing is rudimentary remote checksumming…

Server-side checksumming, post-upload: covers communication errors immediately after transfer. Avoids expensive egress. Pre-computed checksums are not a critical factor here since the most common failure modes (transfers) are tested against. Once the transfer is complete, I leave it up to the storage and the occasional verify to tell me otherwise.

Server-side checksumming, stored in metadata: Duplicacy already stores file and (decoded) chunk checksums - why not add a structure for (encoded) chunk checksums? Most backends support remote hashing. This can save a LOT of bandwidth, important as backups increasingly move to the cloud. You rightly point out pre-computed sums might be an issue (and it’d be nice to know in each case), but it would regardless be helpful for local and sftp-based storages where that doesn’t happen, and it’d act as an extra safeguard in case the backend has another ‘bug’ nobody is yet privy to.

Snapshot fossilisation: rename revision files to .fsl before deleting chunks. This should be a no-brainer - snapshot is gonna be deleted anyway and, if not atomically, -exclusive can clean up the mess. Currently, a failed prune can totally mess up a storage and while no data-loss happens, you might change your mind if it resulted in ‘disk full’, where it does. GUI users must have this.

Metadata isolation: mentioning this one again because /metadata can be duplicated on storage pools for added redundancy. Maybe pop a copy of the config file in there, too.

This is patently and demonstrably false, as I’ve already covered in previous threads.

Please, just verify your backups from time to time. :slight_smile:

Wish I could tell you Duplicacy could do better than check -chunks - without egress - but it doesn’t, yet. So please, test.

Strongly consider 3-2-1 - one local backup, one off-site. Then, your local backup could be tested more thoroughly (restore). Just remember, datacentres go poof, as do external HDDs.

Agreeing with a lot in your second post here.

I bought 3 GUI licences for personal use, mainly to support development. I use them but don’t regard it as fully fledged at all. I might go back to the CLI.

My view has always been a GUI should be more than just a wrapper to schedule individual jobs. CrashPlan’s storage was abysmal but at least the GUI was feature-full for most users.

I chose Duplicacy because, as a backup engine (CLI), it’s pretty robust. But to have a “safeguard my data” or a “heal” button, its functionality has to be expanded under the hood.

Duplicacy’s GUI looks nice - the dashboard is clunky but setup is relatively easy. It’s just missing a lot - e.g. selection checkboxes to populate the filter. From a development perspective, it should probably use a more modern framework, instead of an off-the-shelf thing. Quite frankly, open-sourcing it would help immensely, although I understand gchen may not want to go down that road.

The number 1 lacking GUI feature for me is the absence of a -latest switch for copy and check. In the CLI this is scriptable (albeit wasteful anyway, as you have to iterate through each ID, doing a ListAllFiles() each time - i.e. not atomically), but in the GUI it’s sadly impossible.

There’s much much more to be said about ease of use, but I’ll stop here coz this post is long enough. :wink: I’m enjoying the discussion.

1 Like

Right. Hence, “duplicacy check”. I trust Amazon to do integrity verification correctly, simply because guarding against the risk of them botching it is not worth the egress cost.

Yes. I forgot to clarify – what I call storage is not bits on a platter. It’s the whole storage solution. In the case of Amazon, the S3 API. That’s storage.

Which brings us to this:

This was a storage failure: the upload “succeeded” when it actually failed.

Check should verify the consistency of the entire datastore. It has to check everything, as long as you use prune or otherwise have delete access to the datastore.

Yes. Protection has to be there. Not recovery. It’s pure overhead. If it was infallible – ok. But duplicacy’s implementation only tolerates corruption in some specific places, not randomly. It’s a bad half-measure, because it promises more than it delivers. Yes, it slightly helps some users in some ill-conceived scenarios, so I guess it’s OK… I still disagree with the approach of having apps implement storage consistency.

Agreed.

YES, PLEASE. I stopped pruning just because of the current nonsensical behavior. Half of the forum posts on check failures are because of this. At least half.

Me too. It’s good to review and re-assess implicit assumptions. Many times I started writing a paragraph expressing a thought, then deleted it and wrote the opposite thought. Forcing thoughts into written text is extremely useful!

If the technical end is practically-faultless, wonderful. People tend to get wrapped around that axle and overlook administrative failures, which I think have a higher probability of happening and can’t be averted through QA. A storage provider could decide they no longer want to have me as a customer and terminate my account. Most have terms of service allowing them to do that at any time for any reason with no avenue of appeal. That would be as much a loss as failure to correctly serve up a stored object.

3-2-1 is a massive overkill. For most home users one replicated copy (i.e. the contents of your Dropbox or iCloud Drive) plus an offsite backup (e.g. duplicacy to AWS) is more than enough.

That is 3-2-1: three copies (production and two backups), two on different media (Duplicacy+AWS and DropBox/iCloud) with at least one offsite (AWS and DropBox/iCloud). That arrangement is, in some ways, better than two backups made with Duplicacy because it eliminates Duplicacy as a single point of failure. There are trade-offs. One is having to administer two backup systems. Another is that services like DropBox and iCloud have limits on versioning and deleted file recovery that may not be up to the same level as what can be done with Duplicacy. I’m also unsure how difficult a full-volume, point-in-time recovery would be with either of those services.

Anyway, the bottom line with any of this is that how close you get to perfect depends a lot on how much time, effort and money you’re willing to put into it which, in turn, depends on how much you value your data. Thorough testing of backups requires full reads, and that takes time, effort and money just like anything else.

3 Likes

TL;DR: The point about verification being just a point-in-time test is valid; it guarantees nothing on a going-forward basis. I wouldn’t ever do full restore/check tests; for large backup sets it is not realistic. You’d still want to do test restores just to make sure all the tools/credentials are in place, but my protection against different modes of failure of a single storage (or any other component) would be increased redundancy.

Interesting discussion; some of the statements are a bit strange to see on a backup tool forum. Trusting storage - any storage - to be reliable? Come on :wink: Now, you might deem the chance of failure to be low enough and accept it, that’s fine, but often people simply do not account for different modes of failure. Just because banks and large organizations use something doesn’t mean it is infallible, even for reasonable assurance. I don’t need to go far for examples - literally a few days ago the largest internet and telecom provider in Canada experienced a total failure of all services that lasted for almost a day. Meaning no internet or mobile for half the country, including electronic payment systems, passenger screening systems etc. If something like that can happen, do you think it cannot happen to your data in a more catastrophic way, no matter who your storage provider is?

The way to deal with that is usually to increase redundancy, and go up the meta-levels in eliminating single points of failure. Just like the way of dealing with hard drive failures is not to increase the reliability of disks (this is nice, but way too expensive and has a fairly low ceiling), but to add them into RAIDs or similar systems. But does using ZFS or btrfs guarantee 100% even with sufficient media redundancy? No. I won’t even mention RAM bit-rot or other fairly low-chance events, but what about software bugs or misconfigurations that could be out there at any stage of the process? I mean, e.g. btrfs is notorious for that. Wasabi issues were traced to some software defects etc. Google and Amazon are not invulnerable against that.

So what if you have concerns about your cloud storage (and you should)? Simple - if you care enough about your data, just add another one or more to your backup strategy. The chances of failure of multiple storages at the same time can be made extremely low (assuming you have enough different vendors). You’d need to examine what is your most likely failure point after that and eliminate it if needed. E.g. if you use :d: copy from primary cloud storage to another, you are exposed to problems on your primary storage, so you may want to do independent backups to two different storages from your primary data source. But if you do both of these backups with :d:, you’re exposed to potential bugs in :d:, so you may want to use an entirely different backup toolchain for your secondary (or tertiary) backup etc.

2 Likes

Agree with everything you said apart from this bit.

Independent backups, with Duplicacy, are probably less safe from a resiliency POV, compared with doing a copy - since the act of copying verifies the integrity of the source chunks, as they have to be read, unpacked (uncompressed, decrypted etc.), and repacked on the destination.

While not 100% perfect compared to a file-level restore, this is a very good way to run continuous checks alongside backups.

Furthermore, having two copy-compatible storages allows you to use one to heal the other(s) in case of corruption, such as in this example. Which leads onto your next point…

Absolutely.

I use Veeam Agent to do an image-based backup of the OS and important data volume, as well as raw copies on an external HDD and Rclone-encrypted copies for big media files.

1 Like

I’ll put it this way: I delegate verification of storage consistency to the storage provider in exchange for money. It is still being validated; it’s just not me who is doing the validation. And I trust them to do it properly. It’s all about where to draw the line between what to trust and what to verify. You have to draw it somewhere.

Precisely. Nothing is indeed infallible, but how the thing fails determines my attitude towards it. Returning bad data – bad. Outage (no data) – good.

Yes, absolutely this, the whole paragraph. Instead of micromanaging and double-checking the bit handling at a specific storage provider – just add another whole redundant backup solution, just like you describe in the next paragraph. This is the approach that is widely used across many industries – instead of trying to improve the reliability of one thing, use a cluster of less reliable things. This achieves better reliability at lower cost. It applies to storage arrays (an array of inexpensive disks, later renamed to independent devices) and to compute clusters, where some well-known companies use(d) clusters of consumer-grade hardware in production.

Completely in agreement here.

1 Like

As usual, different scenarios have different failure modes; it really depends on what you believe has higher chances of happening. The S1->S2 copy approach detects failures within S1 (e.g. a wrong S1 write), but does nothing for failures before the data gets to S1 (e.g. a wrong read of your primary data, in which case both S1 and S2 would have a duplicate of the wrong data).

Independent writes to S1 and S2 can alleviate wrong reads of your primary data (one of the two reads have better chances of being good) but you won’t immediately see failed writes on either S1 or S2, though you may not care as one copy would still be good. S1 and S2 would not be completely coherent to each other though, which may or may not be important.

You will likely be able to heal :d: storages even with independent backups as long as these are temporally close to each other (though it is not a guarantee). Could be worse if you have a lot of small files due to packing differences.