Erasure Coding - Explain it like I'm 5 :D

I’ve read this post and it’s over my head. What’s a simple explanation of why I may or may not want to use it? I notice it’s not on by default in the WebUI.

Cheers :)

If your target storage may allow data rot (i.e. you are backing up to a single hard drive, or to a non-checksumming and/or non-redundant file system), then enabling erasure coding can reduce (not eliminate!) the risk of data loss: chunk data is written with redundancy, which allows recovery from a limited amount of corruption.
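For intuition, here is a toy sketch of "writing chunk data with redundancy". It is not the actual on-disk format: it assumes a single XOR parity shard, whereas real implementations typically use Reed-Solomon codes that support several parity shards. The function names and the shard count are made up for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(chunk: bytes, data_shards: int) -> List[bytes]:
    """Split a chunk into equal data shards and append one XOR parity shard."""
    shard_len = -(-len(chunk) // data_shards)  # ceiling division
    padded = chunk.ljust(shard_len * data_shards, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(data_shards)]
    return shards + [reduce(xor_bytes, shards)]


def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one lost shard (marked None) by XOR-ing the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("a single parity shard can only rebuild one lost shard")
    repaired = list(shards)
    if missing:
        survivors = [s for s in shards if s is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


# Hypothetical example: 4 data shards + 1 parity shard, then lose shard 2.
chunk = b"backup chunk data"
shards = encode(chunk, data_shards=4)
damaged = list(shards)
damaged[2] = None
restored = recover(damaged)
assert b"".join(restored[:4]).rstrip(b"\0") == chunk
```

The cost is the extra parity shard, and the limit is visible in `recover`: lose more shards than there is parity for and the chunk is gone, which is why this only reduces the risk.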

It’s a band-aid, because maintaining data integrity is the job of the storage solution, not of the application writing the data (how far shall we go: doubt the media, doubt the filesystem, doubt the RAM, doubt the CPU?). An application must assume storage is reliable.

Indeed, most storage services already guarantee data integrity, so this feature is only of limited use: when, in a pinch, you have to back up to storage that does not. But again, it provides no guarantees; it just improves the chances of recovery.


Yes, @saspus has much more experience and thought in this area, but FWIW I have a local NAS (RAID1) which I do use EC on, and a cloud copy that I don’t use EC on. I assume my cloud provider can store my stuff without bitrot; I’m not 100% convinced the NAS can.

It’s a dangerous assumption.

It strongly depends on the cloud storage tier (not all cloud providers guarantee data integrity, or not on all tiers) and on the filesystem used on the NAS (not all RAID arrays guarantee data integrity either, and those that do require a periodic scrub).

A famous example with conventional RAID1: imagine one disk develops a bad sector and returns a read error. The RAID controller software can then easily recover the data by copying the same sector from the other disk, the one that did not report a read error.

But if the data rotted in an unfortunate way that fools the disk’s own CRC checks, then you have two disks returning different data and neither of them reporting any errors. Which data is correct? There is no way to tell. So conventional RAID will overwrite good data with bad in 50% of cases. Here is a good old video on the topic: https://www.youtube.com/watch?v=yAuEgepZG_8

Checksumming filesystems (like ZFS or btrfs) can tell bad data from good by the checksum. Storage appliances run a periodic scrub, where all data is read, checksums are recomputed and validated, and all mismatches are fixed on the spot.

That’s the only way to keep data viable.
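The difference between a plain mirror and a checksumming filesystem can be sketched roughly like this. It is a toy model, not how ZFS or btrfs are actually implemented: it assumes a SHA-256 checksum kept separately from the two copies, and `scrub` is a made-up helper name.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Checksum recorded when the block was written (SHA-256 chosen for illustration)."""
    return hashlib.sha256(block).hexdigest()


def scrub(copy_a: bytes, copy_b: bytes, stored_checksum: str) -> bytes:
    """Return whichever mirror copy still matches the stored checksum."""
    for candidate in (copy_a, copy_b):
        if checksum(candidate) == stored_checksum:
            return candidate
    raise IOError("both copies are corrupt; restore from backup")


# Two mirrored copies disagree and neither disk reported a read error.
original = b"family photos"
stored = checksum(original)
rotted = b"family phot0s"

# A plain mirror sees only "copy A != copy B" and has to guess.
assert rotted != original
# With the stored checksum, the good copy is identifiable and can be used for repair.
assert scrub(rotted, original, stored) == original
```

Without `stored_checksum` the function would have nothing to compare against, which is exactly the conventional RAID1 situation described above.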

Thanks. That was extremely helpful.

It sounds like it would be wise to enable it for my backup target, which is a PC at my parents’ place with no bit-rot protection of any note. Not even ECC. I understand it might help recover from some minor degree of corruption, and that it’s not a guarantee.

I’m a home user with a limited budget.

Additionally, I was going to leave the default numbers, but again I don’t follow what they mean or why I might change them.