I’d like to describe how I wanted to set up our home server and handle backups, and then some big dilemmas - it would be nice to get feedback and hear how others are approaching this.
Context: a server at home, 20% for public-facing sites, 80% for personal storage by family members (not all in the same place). The sites are simple, they just need to be up and running. Personal storage is all within Nextcloud (OSS Dropbox equivalent with calendar, address book, photo galleries, and more). Lots of photos, also for semi-professional work (i.e. big high-res “shoots”).
Our data demands are modest. Total storage right now 1 TB, expected growth perhaps 50 GB/mo average.
Now the setup I’m working with: Intel i3 NUC, 256 GB root SSD, and 2x 2 TB external USB3 HDs.
The system runs Ubuntu 16.04 LTS (I’m comfy with Linux), and the external HDs are set up as BTRFS with the raid1 profile, as the data area for Nextcloud.
And to round it off: I’ve set up a remote SFTP server on a Raspberry Pi, as off-site backup for Duplicacy. Which is working splendidly, btw.
To summarise: the data lives on 3 disks, one of which is off-site with history. There are more copies when using Nextcloud in Dropbox mode, with (some of) the data synced to each person’s own laptop, tablet, etc.
The local setup has been running for nearly a year, but the off-site backup is recent.
And yet … I’ve run into serious trouble, with days needed to get things back on track. One of the two BTRFS drives failed when I started adding a 3rd drive (I suspect a USB glitch while live-inserting the new drive). And from there it went downhill (I also suspect that BTRFS on Ubuntu 16.04 is not really ready for major failure scenarios).
The trouble with this sort of thing is not the metadata (file listings) but the data itself. It turns out that some files (about 1000 so far) have become unreadable (caught by BTRFS’s checksums), and I’m restoring from the off-site Duplicacy backup. I do not want to find out a year from now that this failure has led to bit rot somewhere - it needs to be resolved 100% (and then I’d like to get my life back, please).
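For anyone wanting to build a similar list of damaged files, here is a minimal Python sketch, not the procedure used above. It assumes that a file whose data fails a BTRFS checksum shows up as an I/O error when read, so it simply reads every file in full and collects the ones that raise an error. The function name `find_unreadable` and the example path are made up for illustration.

```python
import os


def find_unreadable(root, chunk_size=1 << 20):
    """Return paths of regular files under root that cannot be read in full."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only regular files hold data.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    # Read the whole file in chunks so every block is touched.
                    while f.read(chunk_size):
                        pass
            except OSError:
                # Includes I/O errors from the filesystem, but also
                # permission problems, so review the list by hand.
                bad.append(path)
    return bad


# Hypothetical usage (the path is an assumption, not taken from the post):
# for path in find_unreadable("/path/to/nextcloud/data"):
#     print(path)
```

The resulting list can then be used to pick which files to restore from the off-site backup. It only proves that a file is readable, not that its content is what it was a year ago.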
This whole recovery is taking a long time (even longer because a 2nd attempt messed things up again, and I’m now pretty sure it’s a software issue in BTRFS). So I’m ready to ditch BTRFS. Its features are fantastic, but it looks like some (basic) failure scenarios are just not getting the attention they need (lots of info on the web is stale, bugs unsolved, wikis outdated - as usual in the fast-moving OSS world).
Which brings me to the dilemma: how to best set up this system for long-term peace of mind?
I’m now considering a single 2 TB EXT4 disk for Nextcloud, a 2nd disk locally managed for easy and fast redundancy, and the 3rd disk off-site (leaving it just as it is right now).
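As a rough headroom check for a single 2 TB disk, the figures from the start of this post (1 TB used, about 50 GB/mo growth) can be plugged into a few lines of Python. This is only a back-of-the-envelope sketch: it assumes linear growth, decimal units (2 TB = 2000 GB), and ignores filesystem overhead. The function name is made up for illustration.

```python
def months_until_full(capacity_gb, used_gb, growth_gb_per_month):
    """Estimate how many months remain before the disk is full, assuming linear growth."""
    if growth_gb_per_month <= 0:
        raise ValueError("growth must be positive")
    free_gb = max(capacity_gb - used_gb, 0)
    return free_gb / growth_gb_per_month


# 2 TB disk, 1 TB in use, 50 GB/mo growth
print(months_until_full(2000, 1000, 50))  # prints 20.0
```

So under these assumptions a 2 TB disk leaves a bit under two years before it needs replacing or extending.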
The question is how to set up that local redundancy: LVM-based RAID1? Periodic rsync in combination with local mount/unmount? Periodic rsync over the LAN to an independent little setup, e.g. another Raspberry Pi? Duplicacy to a direct-mounted 2nd drive? Duplicacy over the LAN to again a Raspberry Pi?
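To make the "periodic rsync with local mount/unmount" option concrete, here is a minimal Python sketch of what such a job could look like. It is an illustration under assumptions, not a tested setup: the device name, mount point, and source path are placeholders, the function names are made up, and it would need to run as root (e.g. from cron). It defaults to rsync's dry-run mode so nothing is changed until that is switched off.

```python
import subprocess


def mirror_commands(source, device, mountpoint, dry_run=True):
    """Build the mount / rsync / umount command sequence for one mirror run."""
    rsync = ["rsync", "-a", "--delete"]
    if dry_run:
        rsync.append("--dry-run")  # only report what would change
    # Trailing slash on the source: copy its contents, not the directory itself.
    rsync += [source.rstrip("/") + "/", mountpoint.rstrip("/") + "/"]
    return [["mount", device, mountpoint], rsync, ["umount", mountpoint]]


def run_mirror(commands):
    """Run the sequence, always unmounting even if rsync fails."""
    mount_cmd, rsync_cmd, umount_cmd = commands
    subprocess.run(mount_cmd, check=True)
    try:
        subprocess.run(rsync_cmd, check=True)
    finally:
        subprocess.run(umount_cmd, check=True)


# Hypothetical usage; all three values are placeholders:
# run_mirror(mirror_commands("/path/to/nextcloud/data", "/dev/sdX1", "/mnt/mirror"))
```

One trade-off to keep in mind: a plain mirror with `--delete` has no history, so a file that gets deleted or damaged on the main disk will be deleted or overwritten on the mirror at the next run. The Duplicacy options in the list above keep history, at the cost of needing a restore step.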
I don’t want to start an endless discussion; after all, everyone has different trade-offs. But perhaps some suggestions, tips, critiques? I’m ok with this mishap, since it was my decision to not have our data on some cloud service. But it’d be nice if it never happens again.
Cheers,
-jcw