How should I use the check command in the WebGUI?

I’ve read the documentation on the check command in the Guide, and I’m unsure how to use it to run a check that will actually find something wrong with the repository, except by using the -files option.

I would like to run a check monthly using the scheduling feature.

Some questions:

  1. I note that the instructions say that doing a -files check presumably takes a lot of space and a long time. The information about what -chunks does isn’t clear to me: would that be an effective check operation to identify issues with the repo?
  2. Can the check operation be performed on the backup server, instead of from the client?
  3. Is it safe to run the check operation in parallel with backups?
  4. I’ve read in the forum that checking just the latest snapshot is a good approach, but apart from manually specifying it in a command, I couldn’t see how to ask the WebGUI to do that.
  5. I can’t see how to schedule a check separately from the rest of the backup parameters. I plan to back up daily but check monthly. Is that possible?

Thanks :slight_smile:

Yes, -chunks is more efficient because it only downloads each chunk once. -files may download one chunk multiple times, especially when you check multiple revisions at once.
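
The difference can be sketched with the CLI (a sketch only, using the standard duplicacy check options; run from inside an initialized repository):

```shell
# Default check: verifies that every chunk referenced by the snapshots
# exists in the storage, without downloading chunk contents.
duplicacy check

# -chunks: additionally downloads each chunk once and verifies its hash,
# so each chunk is transferred at most one time.
duplicacy check -chunks

# -files: verifies every file in full, so chunks shared between files or
# revisions may be downloaded repeatedly; the slowest, most bandwidth-heavy mode.
duplicacy check -files
```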

Yes, you can run the CLI or the web GUI directly on the backup server.

Yes.

Currently there is no way to specify the latest snapshot in a check command. I’ll add this option later.

You can create a new schedule and then in the new schedule create a new check job.

Thanks for the info.

Re scheduling, I forgot about the little + down the bottom left, which can be used to create a new schedule. (That placement and purpose could do with some improvement; there’s little to suggest it does anything different from the other + at the top right.) It took me quite a while to find, and I only persisted because I didn’t dare come back without figuring it out :slight_smile:

So that’s sorted. However, the least frequent option is Daily. I guess that by deselecting all the days except one, it can effectively be set to Weekly?

Weekly might be OK. How long could I expect a check (-chunks mode) to take on, say, 1TB of backups? For the sake of ease, let’s pretend those backups are on a USB3 external HDD, or even a second HDD in the computer. I may end up using systemd on the backup server to schedule the check, but it would be good to have an idea.
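
For the systemd route, a monthly check could be wired up with a timer along these lines (a sketch only; the unit names, paths, and duplicacy binary location are all placeholders, not a tested setup):

```ini
# /etc/systemd/system/duplicacy-check.service  (hypothetical path)
[Unit]
Description=Monthly duplicacy chunk check

[Service]
Type=oneshot
# Run from the initialized repository directory:
WorkingDirectory=/path/to/repository
ExecStart=/usr/local/bin/duplicacy check -chunks

# /etc/systemd/system/duplicacy-check.timer  (hypothetical path)
[Unit]
Description=Run duplicacy check monthly

[Timer]
OnCalendar=monthly
# Catch up after a missed run (e.g. if the machine was off):
Persistent=true

[Install]
WantedBy=timers.target
```

Enabling it would then be `systemctl enable --now duplicacy-check.timer`.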

Thanks.

Yes, weekly is the longest interval it can be set to automatically. Alternatively, you could turn all days off and just use it as a schedule to start manually.

First let me say that backups on a single HDD are a bad idea, since you don’t get bit rot protection. A backup to a cloud storage provider, or to a local device that does provide bit rot protection, is safer. My local device is a Synology NAS with the BTRFS file system, data checksums enabled, and disk redundancy, which does provide bit rot protection.

Re the speed, it should be very fast. I just did a test run on my setup (local NAS connected by 10G LAN) and it took less than 2 minutes for 6 GB of data. However, this setup is probably faster than a single USB HDD, so yours could take 2-3 times longer. My guess is that yours would complete within 1-2 hours.

For some reference, I have 4 repos in the same storage (B2), totalling about 3 TB. When I run a check on each repo with -id <repo name> (no -chunks arg) and the latest revision specified, it takes about 7 minutes each (most of that time is spent listing all the chunks, which is the same for each). Running a check the way the web UI does, for all revisions with the stats and table, takes about 29 minutes total.
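
In command form, that kind of per-repo check looks roughly like this (a sketch; “my-repo” and revision 42 are placeholders, flags as in the duplicacy CLI docs):

```shell
# List revisions for one repository id, to find the latest revision number:
duplicacy list -id my-repo

# Check only that repository at one specific revision. Without -chunks this
# verifies that the referenced chunks exist, rather than downloading them all.
duplicacy check -id my-repo -r 42
```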

I don’t plan to use a single HDD; it was just an example for consistency, removing variables like slow networks and such.

Re bit rot, is this an overstated problem? I was given a PC by a friend who had left it in the shed for probably close to 15 years. The case was rusted, the inside was filthy, and it was in appalling condition. I got the HDD out and recovered the data up until the HDD died - it would not receive power anymore. I recovered several dozen GB of data - photos mostly, some videos too. Not a single image we checked was corrupt or damaged, and all the videos we tested were fine. We looked at a lot of stuff, because it was from when he was a kid, so it brought back memories.

So yeah, despite being in moist conditions for over a decade, there didn’t appear to be any bit rot in what we recovered. The conditions were so bad that the HDD couldn’t survive recovering the whole lot before it just died.

I have recovered data from my own 3.5" floppy disks from the late 80s, and from the first HDDs I ever owned in the early 00s. Granted, some of the 3.5s are busted - but the Amiga disk drive was notorious, and I had so many r/w errors on disks that were just heavily used back in the day.

That said, I don’t see any reason why I wouldn’t choose BTRFS on my backup server, which I haven’t built yet - it’ll just be a Pi with a JBOD case running software ‘RAID’ (or parity - is that RAID?).

On photos or videos you likely won’t detect bit rot, since a flipped bit here and there isn’t really noticeable, and I wouldn’t hesitate to back up media files to a single HDD. It’s for “important” data that it really matters. I recently did a “deep” compare (content comparison) between my main data drive and an old backup drive and found that two XLS files from 2012 and 2015 each had a bit flipped. That backup drive was from the early to mid 2000s, and it seems I used it far too long. Bit rot doesn’t necessarily mean that there are surface errors; it could also mean that there is some error in the HDD controller (which of course doesn’t apply to floppy disks). I tend to use the term for any kind of wrong data returned by the drive and not detected as faulty.
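
A minimal sketch of such a “deep” (content-level) compare, using plain shell tools; “main” and “backup” are placeholder directories, and the corrupted byte is simulated:

```shell
# Set up two copies of the same file (stand-ins for a data drive and a backup):
mkdir -p main backup
printf 'quarterly figures' > main/report-2012.xls
printf 'quarterly figures' > backup/report-2012.xls

# Simulate silent corruption of a single byte in the backup copy:
printf 'quarterly figurez' > backup/report-2012.xls

# diff -rq walks both trees and compares file contents byte for byte,
# so even a one-bit difference is reported, whether or not the file "looks" fine.
diff -rq main backup || echo "content mismatch detected"
```

A checksum-based variant (e.g. `sha256sum` over both trees) scales better for large drives, since hashes can be stored and re-checked later.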

I have no experience with the Pi, but on my Synology NAS JBOD does not have disk redundancy, so it isn’t suitable for bit rot protection. There you need to use RAID-1, 5 or 6, or the “Synology Hybrid RAID” variants of those standard RAIDs.

I don’t know how parity works, or which RAID levels it’s used in. I currently use it in ‘unRAID’ (NAS software available for purchase, based on Slackware). The way they implement it is pretty cool - it splits data at the file level at a minimum; you can do a directory or a whole share if you like. So there are no files half in one place, half in another. You do sacrifice some speed, but meh, it’s fine for me.

I don’t know if this is how it always works, but I can take the (non-parity) disks out of my NAS, pop them into any dock, mount them, and there’s my data. No need to rebuild the RAID or, worse, be stuck with Synology’s proprietary hybrid - which I presume means that if my old Synology were to die, I would have to buy another Synology to rebuild the array (my old NAS is a Synology).

We’ve gone rather off topic here, but thanks for the reminder to look at using BTRFS :slight_smile: The Pi will do software RAID on the JBOD using the Debian-based OS (mdadm), or if I feel like it I might put Open Media Vault (OMV) on it to make that process easier - and potentially give my parents some other uses for the space. I’ve never used a Pi, OMV, mdadm or BTRFS - so lots of fun to be had :slight_smile:
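
For bit rot protection specifically, BTRFS native RAID-1 is worth a look instead of BTRFS on top of mdadm: the former can repair a bad copy from the good one on a checksum failure, while the latter (outside Synology’s patched setup) can only detect it. A rough sketch, with placeholder device names - these commands are destructive:

```shell
# BTRFS native RAID-1: two copies of both data (-d) and metadata (-m),
# with checksums on by default.
mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb
mount /dev/sda /mnt/backup

# Run periodically (e.g. from a timer): verifies all checksums and
# rewrites any bad copy from the good mirror.
btrfs scrub start -B /mnt/backup
```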

But first - I gotta get a backup solution going from my wife’s PC to my backup server (she uses Windows, I use Linux). All my testing has been good with Duplicacy, and I like how the developer is active here and the product keeps getting updated. I just need to decide on my approach now: whether to use Duplicacy pushing over SSH (preferred) to the backup server, or Borg (which I already use and am familiar with) backing up via a mount. The license fee for Duplicacy doesn’t bother me at all. Initially I’ll just have 2 Windows PCs to back up, with 2 more likely in a year or two. And I’ve paid for plenty of free software through donations, so this is fine. I just gotta choose :slight_smile:

As the name unRAID suggests, this isn’t standard RAID by any means. For an overview of how RAID works, see here. RAID-0 stripes without any redundancy and RAID-1 mirrors without parity; it’s RAID-5 and RAID-6 that use parity. And in RAID, as soon as you remove a disk, your array is considered degraded and you have to repair it.

The way I understand unRAID, it sits on top of the file system, so if that were BTRFS, it might be able to detect bit rot (if configured to use checksums), but it wouldn’t know where the good copy of the data is, and thus it wouldn’t be able to repair the error. (I’m not sure if unRAID would be able to fix this at a higher level.) In Synology’s implementation, BTRFS sits on top of the mdadm RAID and - in case of a checksum failure - can query mdadm for information that allows it to read the good copy and repair the bad one.

If the goal is to get a solid and reliable backup, then we haven’t strayed all that far. :wink:

Thanks, good info (e.g. I had no idea about what sits on top of what). To save effort, I was thinking of using OMV to handle that, so I’ll look at that first; then, if I have to, I’ll look into doing it manually in Raspbian or such. I’m not terribly concerned about the ‘overhead’ of running OMV :slight_smile:

But yes, I’ll use BTRFS given the choice.