Backing up to a NAS: scrub and data consistency

If targeting NAS – it must have a checksumming filesystem and periodic scrub; if targeting cloud

I have a Synology with BTRFS (using SHR-1) and have never scrubbed (1.5 years). How often should I scrub?

But is data checksumming enabled on the share? If not, create a new share with checksumming enabled and copy the data over. Then delete the old share and rename the new one.

Synology only says “We recommend performing data scrubbing regularly to ensure data consistency and avoid data loss in the event of drive failures”.

The purpose of a scrub is to read every sector and verify, by comparing checksums, that the data is still intact. If you don’t do that, you have no way of knowing whether it is.

Drives deteriorate gradually over time, developing bad sectors and bit rot. Running a scrub quarterly is the usual ballpark; I ran it monthly myself. You can configure it to pause during the day and throttle it at night to minimize interference with your work. You can even run it continuously every night if you want.
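
For reference, here is a minimal sketch of what that scheduled scrub amounts to on a generic Btrfs system. On a Synology you would normally just use Storage Manager’s scheduled Data Scrubbing task; the mountpoint below is only an example, and the script assumes btrfs-progs is installed and it is run as root.

```python
# Sketch: start a throttled Btrfs scrub and print the result.
# Assumes a generic Linux host with btrfs-progs and a Btrfs volume
# mounted at /volume1 (example path); run as root.
import subprocess

MOUNTPOINT = "/volume1"  # adjust to your pool's mountpoint

def run_scrub(mountpoint: str) -> None:
    # -B: stay in the foreground and print statistics when finished
    # -c 3: idle I/O priority class, so the scrub yields to normal workloads
    subprocess.run(["btrfs", "scrub", "start", "-B", "-c", "3", mountpoint], check=True)

def show_status(mountpoint: str) -> None:
    # Reports how much was scrubbed and whether any errors were found/corrected.
    subprocess.run(["btrfs", "scrub", "status", mountpoint], check=True)

if __name__ == "__main__":
    run_scrub(MOUNTPOINT)
    show_status(MOUNTPOINT)
```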

But is data checksumming enabled on the share?

Looks like I did set that up initially: “Enable data checksum for advanced data integrity” is checked for the shared folder I’m backing up.

The purpose of a scrub is to read every sector and verify, by comparing checksums, that the data is still intact. If you don’t do that, you have no way of knowing whether it is.

What happens when a scrub detects bad sectors or bit rot? Does it automatically repair them as part of the “scrub”? Or do I need to restore from a prior backup?

Have I likely hosed myself for not scrubbing in 1.5 years?

Yes, it will try to repair it silently. It will log a successful-repair message, so you can check later whether it healed anything :).

Your SHR-1 arrangement creates a RAID 5 pool, so there is redundancy that can be tapped into. If a checksum mismatch is detected, the software will compute the redundant copy and check its checksum; if that is valid, it will repair the corruption silently. For more information see page 4 of the whitepaper: https://global.download.synology.com/download/Document/Software/WhitePaper/Firmware/DSM/All/enu/Synology_Data_Protection_White_Paper.pdf
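
As a toy illustration (not Synology’s actual code), the repair logic boils down to something like this: if a block’s checksum no longer matches, rebuild the block from the remaining stripe members plus parity, and accept the rebuilt copy only if its checksum verifies. Block names, sizes, and the CRC32 checksum below are purely illustrative.

```python
# Toy model of scrub self-healing on a checksummed RAID 5 volume.
import zlib

def xor_blocks(blocks):
    # RAID 5 parity (and reconstruction) is a byte-wise XOR of the stripe members.
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def scrub_block(stored, expected_crc, other_data_blocks, parity):
    if zlib.crc32(stored) == expected_crc:
        return stored, "ok"                                   # checksum matches, nothing to do
    rebuilt = xor_blocks(other_data_blocks + [parity])        # reconstruct from redundancy
    if zlib.crc32(rebuilt) == expected_crc:
        return rebuilt, "repaired"                            # redundant copy is good: silent repair
    return stored, "unrecoverable"                            # both copies bad: restore from backup

# Example: a three-data-disk stripe where one block rotted on disk.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])
rotted_d1 = b"BxBB"                                           # simulated bit rot
print(scrub_block(rotted_d1, zlib.crc32(d1), [d0, d2], parity))  # -> (b'BBBB', 'repaired')
```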

Very unlikely, but not impossible. For data loss to occur, two drives have to fail for the same block. A single failure is not a problem: you have redundancy, so the data is safe. However, if something else then fails – say a second drive dies, or you replace a drive to grow the pool – you no longer have redundancy and will lose data.

This is also the reason why it is recommended to scrub before making any changes to the array.

As a side effect of attempting to read every sector, a scrub also updates the disk’s reallocated sector count as reported by SMART. Without a scrub (or the drive’s own extended SMART test, which is pointless to run since a scrub is a superset of it), the disk can’t self-report an increase in its bad sector count and can simply fail later, seemingly out of the blue. In practical terms: check that count after the scrub is done, and if it is non-zero – RMA the disk.
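
If you prefer to check that count from a script instead of the DSM UI, a minimal sketch using smartmontools might look like the following – assuming smartctl is available, /dev/sda is the drive in question, and it is run as root, all of which you would adjust for your setup:

```python
# Sketch: read a drive's reallocated-sector count via smartctl.
# Assumes smartmontools is installed; /dev/sda is an example device.
import subprocess

def reallocated_sectors(device: str = "/dev/sda") -> int:
    out = subprocess.run(
        ["smartctl", "-A", device],              # print the SMART attribute table
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        if "Reallocated_Sector_Ct" in line:      # SMART attribute ID 5
            return int(line.split()[-1])         # RAW_VALUE is the last column on most drives
    raise RuntimeError("Reallocated_Sector_Ct not reported by this drive")

if __name__ == "__main__":
    count = reallocated_sectors()
    print(f"Reallocated sectors: {count}")
    if count > 0:
        print("Non-zero count after a scrub: consider RMAing the disk.")
```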

Synology used to let you configure the bad-sector threshold at which the alarm is raised, but IIRC it is now hard-coded to 50.

This all exemplifies the hidden complexities of maintaining data viability on premises. It’s also very expensive considering your time investment. Commercial cloud storage is always the better deal, except in a very narrow set of circumstances. And for backup specifically – you still need an offsite backup…

(I moved the scrub discussion to a new topic to keep things focused)


Does this mean it’s better to back up once to an external HDD, and then separately (not copy) to cloud storage?

My original line of thinking was that having an additional backup on an external HDD is better than not having it, but if copy -bit-identical ends up copying bad data (and not just new revisions/chunks) I’ll have to rethink my approach.

Even better – skip the external HDD.

Not necessarily. It takes time and effort to maintain a backup on an external HDD – pretty much solely by constantly verifying data integrity until the drive fails, then starting over, copying back from the cloud, or messing with the Btrfs DUP profile on a single disk. The amount of work needed to support that vastly exceeds any benefit of having a flaky backup that demands attention between failures.

It’s important to realize that hard drives used to be rather solid. Now, with skyrocketing write densities, they are – intentionally and by necessity – not. Today’s drives are meant to be used in redundant clusters: that way it’s possible to achieve superior reliability at a lower overall cost. A drive is expected to fail within its warranty period (anywhere from a single bad sector to total failure) with a probability in the low single-digit percents – hard to ignore in a long-term data storage scenario. These drives are useful as scratch storage, for moving a bunch of stuff from one place to another and other short-term needs. When stores sell an “Archival External HDD” – an SMR archive-grade Seagate in a plastic USB box – I cringe. But people buy them, actually store photos on them, and get disappointed years later.

Oh yes, if you copy from the external HDD there is a chance the copied data will be bad. If you don’t prune the cloud storage and don’t overwrite existing data, you will still be able to restore everything up to the point the corruption occurred. But it’s indeed rather counterproductive to introduce this point of failure into an otherwise solid workflow.

Considering all this, the solution is simple: have a reliable cloud backup and drop the HDD. If you worry about a volcano spilling onto a datacenter, use a multi-region storage tier. It’s expensive, yes, but you can create two backups – one for uber-important data and another for merely important data – and send them to different buckets.


Thanks for the explanation; the white paper was an especially good read. Great to hear. I’ll schedule a regular scrub.

This all exemplifies the hidden complexities of maintaining data viability on premises. It’s also very expensive considering your time investment. Commercial cloud storage is always the better deal, except in a very narrow set of circumstances. And for backup specifically – you still need an offsite backup…

You hit the nail on the head – I do think I’ve purchased my last NAS. It’s true, it really takes a lot of diligence to keep juggling all this stuff, and the time investment is not insignificant. I bought a few NASes because I wanted a project, and a challenge; it has been fun learning the technology, but long term…

Commercial cloud storage is always the better deal, except in a very narrow set of circumstances.

Curious, what circumstances do you think qualify?

For me it was a mix of a desire to try out local shared unattended storage (hence the NAS) and a hope to get a “just works” appliance that someone else would take care of maintaining (hence Synology, and not a custom build with TrueNAS or something similar). It turned out the opposite way – I became an expert at working around Synology bugs and generally walking on eggshells around it, as opposed to it serving me.

By necessity: if you need concurrent access to massive amounts of data that would not fit into a local cache, with an average turnover rate that exceeds your ISP bandwidth. For example, scientific simulations or specific data-analysis workflows. But even for those tasks I would first consider the feasibility of using S3 storage and running the processing in a compute cloud before sinking money into on-premises equipment – depending on the project.

A lot of use cases that seem incompatible with that approach are in fact the best suited for it: for example, importing a bunch of photos or videos, processing them, and managing the results. As long as the working set fits into the cache, you get to work with your data at SSD speeds while it is slowly and automatically replicated to the cloud in the background.
