Preventing bit rot and ensuring data integrity in my Duplicacy + Unraid NAS + offsite storage setup

Hi everyone,

I’m new to the world of backups, so I hope you can bear with me as I navigate through this learning process.

I own a large amount of critical data, including business documents and Lightroom photos, and it’s absolutely essential that I can never lose this data. To achieve this, I’m working on implementing a robust backup solution that protects against hardware failures and even bit rot.

Currently, I’m using Duplicacy on an Unraid NAS with an XFS file system. Below is an overview of my setup (I’ve included a drawing for reference):

My Backup Flow:

  1. Primary Backup Location:
  • Duplicacy storage resides on my Unraid NAS.
  • I back up data from various sources, including other Unraid shares (~directories) and external hard drives.
  2. Offsite Backups:
  • I copy the Duplicacy storage on my NAS to two cloud storages: Backblaze B2 and Amazon S3.
  • For this, I just run the copy command (see the sketch after this list). Note that I haven’t enabled the “bit-identical” option.
  3. Backup Maintenance:
  • I run check -chunks daily on the NAS storage and a plain check daily on the offsite backups.
  • I haven’t used the prune command yet, as I’m concerned it might inadvertently compromise my backups.
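
Concretely, the flow looks roughly like this on the command line (a minimal sketch; the storage names, snapshot ID, and bucket URLs below are placeholders, not my actual configuration):

    # One-time setup: register the offsite storages as copy-compatible
    # with the local storage (this is where -bit-identical could be added):
    duplicacy add -copy default b2-offsite my-docs b2://my-duplicacy-bucket
    duplicacy add -copy default s3-offsite my-docs s3://us-east-1@amazon.com/my-duplicacy-bucket

    # Daily: replicate the local storage to both offsite copies...
    duplicacy copy -from default -to b2-offsite
    duplicacy copy -from default -to s3-offsite

    # ...then verify: full chunk verification locally, plain check offsite.
    duplicacy check -chunks
    duplicacy check -storage b2-offsite
    duplicacy check -storage s3-offsite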

In addition to this, I sync photos to the NAS for immediate access, documents to Google Drive, and back up local machines using Backblaze Personal Backup. However, these flows are less relevant for this discussion.

My Concerns:

This setup has been running smoothly for a few months, but as I’ve started learning more, I’m beginning to question whether it’s truly resilient against all potential disasters.

Here’s one scenario I’m particularly concerned about:
If my NAS hard drives suffer from bit rot, a corrupted chunk in my NAS Duplicacy storage could render the backup unusable.

My Questions:

  1. Propagation of Corrupted Chunks:
    If a corrupted chunk appears in my NAS storage due to bit rot, will this corruption propagate to my offsite backups during the copy process? Does Duplicacy perform any checks before copying to prevent corrupted data from being uploaded?

  2. Recovery Options:
    If my NAS storage is corrupted, is there any way to repair it using my offsite backups? I understand the chunks between my NAS and offsite storages differ since I haven’t enabled “bit-identical.” Would enabling this option allow me to repair corrupted chunks on my NAS by copying them back from the offsite storage?

  3. Erasure Coding:
    Would enabling erasure coding help mitigate the effects of bit rot or prevent corrupted backups in this scenario? In what scenarios would this option be useful?

  4. Integrity Checks:
    Does running check -chunks identify bit rot within the chunk files themselves?

  5. Storage Reliability:
    I understand Duplicacy assumes the storage provider ensures data integrity, but that’s not the case with an Unraid NAS using XFS. Would it make more sense to:
    a. Transition to a ZFS file system in a separate pool on my NAS server for Duplicacy storage?
    b. Simplify things by removing the NAS storage altogether and backing up directly to offsite repositories?

Final Thoughts:

I know this is a lengthy post, but I’d really appreciate any insights or suggestions on how to improve my setup. Data integrity is my top priority, and I want to ensure my backup solution is as foolproof as possible.

Thank you in advance for your help!

Cheers,
Philippe

Regarding XFS on the Unraid NAS: this does not guarantee data integrity. Switch to zfs.

Therefore, having the primary backup location on media that does not guarantee data integrity is a problem.

There is no reason to do even that (the daily check -chunks) if your storage can’t rot. (And if it can, check -chunks won’t help; it checks every chunk only once.)

There are no known issues with prune corrupting backups. You can always run check after prune, and if something went wrong, restore the whole Duplicacy repository from a snapshot (see the advice on zfs above).

Right (Duplicacy does assume the storage ensures integrity). Hence, switch to zfs.

My answers:

  1. Likely not. Duplicacy unpacks and repacks chunks during copy, so a corrupted chunk would be caught rather than silently uploaded. But this won’t help you; the data loss would have already occurred.
  2. Yes, you could copy snapshots from the remote storage back to Unraid the same way you copy from Unraid to the remote storage (see the sketch after this list).
  3. Not really. Erasure coding would improve the chances of recovery, but it won’t prevent corruption. Switch to zfs instead.
  4. Yes, but only once per chunk. If you want to re-check all chunks, you would need to delete the verified_chunks file (see the sketch after this list). But if you use zfs, use its scrub functionality instead.
  5. I would switch to zfs. I would also forgo B2: they have proven in the past that their systems lack proper checks to prevent returning bad data. You may want to consider Storj; it can’t return bad data by design, since it’s end-to-end encrypted.
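
For 2 and 4, a minimal sketch, assuming the local storage is named default and the offsite one is already registered as b2-offsite (both names are placeholders):

    # (2) copy works in either direction, so snapshots can be pulled
    # back from the offsite storage. Note that copy skips chunks that
    # already exist at the destination, so corrupted local chunk files
    # would need to be deleted first to get re-downloaded:
    duplicacy copy -from b2-offsite -to default

    # (4) force check -chunks to verify everything again by deleting
    # the cache of already-verified chunks (the path is an assumption;
    # it lives in the repository's .duplicacy cache for that storage):
    rm .duplicacy/cache/default/verified_chunks
    duplicacy check -chunks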

In summary:

  • back up everything to a local zfs pool, with a monthly scrub.
  • enable periodic snapshots in case anything happens. It’s free, so why not?
  • run duplicacy copy to Amazon and Storj.
  • run check monthly (just check) to ensure all referenced chunks still exist.
  • once a month, attempt to restore a random subset of files (see the sketch after this list).
  • once a year, attempt to restore a few random files onto a new machine to ensure you have all the necessary credentials available.
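
A sketch of that routine, assuming a pool named tank with the Duplicacy storage in a dataset tank/duplicacy (the names, revision number, and file pattern are placeholders):

    # zfs side: monthly scrub, plus a snapshot of the storage dataset.
    zpool scrub tank
    zfs snapshot tank/duplicacy@monthly-$(date +%Y%m)

    # duplicacy side: make sure every referenced chunk still exists...
    duplicacy check

    # ...and spot-check by actually restoring a subset of files.
    duplicacy restore -r 42 -overwrite "Photos/2024/*"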

Thank you for your detailed reply. It makes a lot of sense and confirms my concerns and doubts.

I’ve decided to add a ZFS pool alongside my XFS array. This way, I get the best of both worlds: the flexibility of the Unraid XFS array for my replaceable media collection and the anti-bit-rot, self-healing, and snapshot capabilities of the ZFS pool for my critical data and Duplicacy backups.

I also wasn’t aware of the issues with B2. I’ll definitely look into Storj!

Thanks again!


This is the thread:

They, of course, already fixed that, but while bugs are expected in any system, allowing mishaps of such magnitude at this stage of product maturity is simply not excusable. It’s not unexpected, though: seeing the abysmal quality of their user-facing software, I can only imagine what horrors are lurking on the backend.


Offtopic: I’m not an Unraid user, and Google failed me. What are those flexibilities of XFS on Unraid? It seems that the only benefit would be modest memory requirements for a given level of performance, but since you are going to have to have zfs anyway, that should not be a problem. Did Unraid add some proprietary features on top of XFS that are appealing to their users?

I’m a fan of Unraid, though I’m still learning the ropes. :slight_smile: Here are what I consider the strong points of Unraid:

  • Unraid allows you to mix and match disks of different sizes in the same array. They don’t need to be identical, making it very easy to expand storage over time by just adding more disks.

  • You can use one parity disk (single parity) to protect against failure of one disk in the array, or two parity disks (double parity) to protect against failure of two disks in the array.

  • Unlike traditional RAID, data is not striped across disks, so if one drive fails, the others are still accessible.

  • Power consumption is also lower because only the disk being accessed needs to spin for reads, plus the parity disk(s) for writes.

  • Beyond the array, Unraid allows you to use additional disks to create separate pools for extra storage. These pools can use different file systems, such as ZFS, making it possible to maintain an easily expandable array for less critical data (e.g., Linux ISOs lol) alongside a ZFS pool for highly critical data.

  • Unraid also supports cache disks, which can be SSDs, to improve data access speed for the array. These cache disks can be mirrored for redundancy, configured to use ZFS for enhanced reliability and performance, or dedicated specifically to apps like Plex or other Docker containers that require higher performance.

Thank you!

So they did bolt some functionality on top (not unlike Synology SHR) to use a zoo of disk sizes, and my vague understanding was in the right direction. Anything proprietary in the data path is a huge downside.

My (very opinionated!) commentary below.

I find that questionable at best, with a lot of drawbacks:

  • Much smaller scope of testing. Unlike the base filesystem, the Unraid- (or Synology-) specific features only get exposure from their own users, a much smaller subset of the base filesystem’s users.
  • Mixing drive sizes usually results in multiple raid configurations scattered over the pool, with unpredictable performance. It’s always best to use same-sized disks and to retire/sell/recycle disks that are too small: they are poor value but consume energy just like large ones.

If this is Unraid bolted-on as well, it’s just as problematic. ZFS can do that too (see raidz1 vs raidz2 vdevs), as does BTRFS. In addition, you can have multiple vdevs in the pool, providing more performance by load-balancing the IO.

A very dubious “benefit”. How would you know (and why would you care) which file is on which disk? What if your project consists of multiple files? If one disk fails, all files must still be accessible, not just half :slight_smile:

This also means you don’t get any gains in read performance.

So I’ll take it as a “drawback”.

It’s best to keep all disks spinning at all times, for performance reasons (and, some argue, disk longevity: power surges and temperature fluctuations).

Why differentiate between critical and less critical data? All data is critical. I don’t want data loss. If I lose “linux ISOs”, I’ll have to re-acquire them; if I lose my documents, I need to restore them from backup. The amount of time wasted is pretty much the same; I’d argue restoring documents will even be faster.

Same objection about performance. I want a fast array. Why would I purposely have a slower configuration when the system is clearly capable of faster performance, as evidenced by the array right next to it?

This is what I ended up doing for my home server: a single ZFS pool, consisting of three raidz1 vdevs of 4 disks each.

This started as a pool with a single data vdev; since then I either add a new 4-disk vdev when there is a good deal on used drives on eBay, or replace disks in an existing vdev with higher-capacity ones.

I.e., within a vdev, disks are of equal capacity, but not necessarily the same across vdevs.

In addition, there is a special vdev of two SSDs (mirrored; it’s a waste, but “best practice”). The special device makes everything fast: metadata access is what hinders most performance, and it’s on SSD now. Some databases live entirely on the special device, since configuring the small-block size on that dataset equal to the record size forces them there. There is a SLOG device on an old 16GB Optane drive (huge overkill, but it was $10 on eBay) to offload some IO from synchronous writes (from Time Machine), and an old crappy 2TB SSD serves as a read cache for frequently re-used content, to further reduce IO on the disks (also overkill; I salvaged it from an old USB SSD from eBay for $20).

As a result, the disks are almost idle (I’m seeing 20-30 IOPS on them on average) and everything works in a steady state. It’s also simple: single pool, single filesystem, one thing to manage. If I had multiple pools, I would need multiple SSDs to speed up access, etc.
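
In zpool terms, the layout is roughly this (a sketch; device and dataset names are placeholders):

    # Three 4-disk raidz1 data vdevs (equal-sized disks within each vdev):
    zpool create tank \
        raidz1 sda sdb sdc sdd \
        raidz1 sde sdf sdg sdh \
        raidz1 sdi sdj sdk sdl

    # Mirrored special vdev for metadata and small blocks:
    zpool add tank special mirror nvme0n1 nvme1n1

    # SLOG on the Optane drive, read cache on the salvaged SSD:
    zpool add tank log nvme2n1
    zpool add tank cache sdm

    # Pin a database dataset entirely onto the special vdev by setting
    # the small-block threshold equal to the recordsize:
    zfs set recordsize=64K tank/db
    zfs set special_small_blocks=64K tank/db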

The key is that it’s OpenZFS pretty much as-is: iXSystems does not bolt their proprietary changes on top like Unraid does; instead, they submit bug fixes and contribute features directly to OpenZFS, and hence it’s stable as a rock. TrueNAS is used by enterprises, including in critical applications, so it’s definitely good enough for general home users.

Edit: this turned out to be a zfs-versus-the-world rant, but I did say the commentary would be very opinionated.


I appreciate your (opinionated!) commentary and genuinely enjoy these kinds of discussions—they make for a great exchange of ideas. :grinning:

Unraid isn’t RAID, so this doesn’t really apply. Unraid doesn’t stripe data across drives; each file resides entirely on a single disk. This means that each data disk is an independent filesystem and can be read individually on any Linux system. Performance is limited to the speed of a single disk, which is inherent to its design and is of course a drawback compared to ZFS.

The flexibility to mix and match drive sizes is intentional and aligns with Unraid’s use case: expanding storage incrementally without requiring matched drives.

While smaller disks may be less energy-efficient, they still provide utility in an Unraid setup where storage needs vary.

The core principle of Unraid is quite different from that of ZFS. Unraid is less focused on I/O performance and more on providing flexibility in how you build and expand your array. The use of single or double parity isn’t a downside—it’s a clever way to achieve redundancy without relying on traditional RAID striping. The main limitation is that the usable storage of any data disk in your array is constrained by the size of your parity disk(s), so starting with a large parity drive is essential. Once that’s in place, it just works: if a drive starts to fail or fails completely, you simply replace it, and Unraid rebuilds the data from parity, no complicated configurations required.

If no disaster occurs, you really don’t need to care which disk a file is on. Personally, I use the “high-water allocation” method, which fills up each disk to half its capacity before moving on to the next one. This approach helps balance data across the array while minimizing unnecessary disk spin-ups and wear. And if you’re curious, you can always check which files are stored on which disks directly in the GUI.

Where this design really shines is during a disaster. Imagine you have a faulty drive and only single parity protection. You replace the faulty drive, and Unraid begins reconstructing the data using parity. But during that rebuild, another drive fails. In this scenario, you lose the ability to recover the failed drives’ data—but critically, all the remaining drives with intact data remain fully accessible. This reduces downtime significantly, as you only need to restore or redownload the lost data for the failed drives, not the entire array.

For me, any redundancy system—whether RAID or Unraid’s parity—is about minimizing downtime and disruption when things go wrong. In that context, I see this as a definite plus.

I agree that not having any gains in read performance is a downside. However, for many homelab users, this doesn’t matter much—most setups are bottlenecked by other factors like Ethernet speed rather than disk read performance. In those cases, the practical impact is negligible.

I probably worded it a bit strangely earlier. What I mean is that not all data on my NAS requires the same level of backup or protection. Some of my data—like Linux ISOs—doesn’t need to be stored in expensive off-site locations like S3 or B2 because it’s easily replaceable. If I lose those ISOs, I can re-download them with one click (yay for unlimited 1Gbps internet!). This data resides on my parity-protected array, and if a drive fails, I can simply rebuild it from parity. In the rare worst-case scenario where more drives fail than I have parity for, I might need to re-download some of that data, which isn’t a huge deal.

On the other hand, my “critical” data—like personal photos and business documents—is irreplaceable, so I ensure it’s as secure as possible, including off-site backups and bit rot protection.

I get your point about performance—it might seem odd to use a less performant array with less protection against bit rot. That’s definitely a trade-off. But the upside of a parity-protected array is the flexibility to easily swap smaller drives for larger ones, add new drives, and have redundancy handled automatically by the parity system. For most home users, including myself, this is more than sufficient. Bit rot is a theoretical risk, but it’s extremely rare, and the I/O performance of a parity-protected array is good enough for typical home use.

In an ideal world, I’d have a ZFS pool where I could easily add more disks to the pool as needed. For now, I’ve split my NAS into a “flexible” array for easily replaceable data and a ZFS pool with four drives for critical data. Who knows—maybe in the future I’ll transition everything to ZFS on Unraid. It’s good to have options. :slight_smile:

Interesting discussion. :slight_smile: Unraid and ZFS-based systems are designed with different users in mind, and they each have their advantages and trade-offs.