I’m using Duplicacy to back up around 1.5 TB to Backblaze B2 and am considering backing up roughly 10 TB to GSuite. B2 offers 1 GB of free egress bandwidth per day, and GSuite offers a lot more but also has limits.
I can run `duplicacy check -all` to make sure the chunks exist, but if I want to actually validate the integrity of the chunks in the backup, I think my only built-in option is to run `duplicacy check -all -files`, which would download everything and far exceed the free daily egress limits (or the hard limits in the case of GSuite).
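For clarity, these are the two built-in checks I mean, run from the repository root (the comments reflect my understanding of what each one downloads):

```sh
# Confirms that every chunk referenced by the snapshots exists in the storage;
# this mostly lists and fetches metadata, so egress stays small.
duplicacy check -all

# Verifies file contents by downloading chunks, which is what blows through
# the free daily egress allowance.
duplicacy check -all -files
```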
I’ve tried limiting the bandwidth using `trickle -s -d 10 duplicacy check -all -files`, but `trickle` doesn’t seem to work with duplicacy, and there’s no built-in option to limit bandwidth (like there is for `backup`).
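For comparison, here’s what I tried versus what I’d like; the `trickle` line is the actual command, `-limit-rate` is the backup-side flag I’m referring to (if I have its name right), and the last line is purely hypothetical:

```sh
# What I tried: cap downloads at 10 KB/s with trickle
# (this doesn't seem to take effect for duplicacy).
trickle -s -d 10 duplicacy check -all -files

# What backup already has: a built-in rate limit in kB/s.
duplicacy backup -limit-rate 10240

# What I'd want, but (as far as I can tell) doesn't exist today:
# duplicacy check -all -files -limit-rate 10240
```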
The ideal solution, in my mind, would be something that continuously and slowly (at a specified bandwidth limit) performs full integrity checks of random files from the storage, and then sends an email and/or throws an error if it encounters a file with corrupted chunks.
To work around this, I’ve written a script that copies the `.duplicacy` directory to a temporary directory, selects a random snapshot and random files that fit within a specified daily egress limit, and then restores them to prove integrity (at least, of that random selection of files). Disclaimer: I wrote this for my specific setup and it likely has unhandled edge cases, so it won’t work for every use case and shouldn’t be blindly trusted. Depending on your files, the overhead of downloading the snapshot file and chunks might also be significant compared to the specified limit.
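Roughly, the script boils down to something like this. Treat it as a simplified sketch: the paths are placeholders, the parsing of duplicacy’s `list` output is approximate, and it assumes file paths contain no spaces.

```sh
#!/usr/bin/env bash
# Sketch: restore a random sample of files from a random revision, staying
# under a daily egress budget, to spot-check chunk integrity.
set -euo pipefail

REPO=/path/to/repo               # repository containing the .duplicacy directory
LIMIT_BYTES=$((1 * 1024 ** 3))   # daily egress budget (1 GB here)
WORKDIR=$(mktemp -d)
trap 'rm -rf "$WORKDIR"' EXIT

# Work in a temporary copy so the real repository is left untouched.
cp -a "$REPO/.duplicacy" "$WORKDIR/.duplicacy"
cd "$WORKDIR"

# Pick a random revision ("Snapshot <id> revision <n> created at ...").
revision=$(duplicacy list | awk '$1 == "Snapshot" {print $4}' | shuf -n 1)

# Accumulate random files from that revision until the budget is used up
# (assumes size is the first column and the path is the last, with no spaces).
used=0
files=()
while read -r size path; do
    if (( used + size <= LIMIT_BYTES )); then
        used=$(( used + size ))
        files+=("$path")
    fi
done < <(duplicacy list -r "$revision" -files \
         | awk '$1 ~ /^[0-9]+$/ {print $1, $NF}' | shuf)

if [ "${#files[@]}" -eq 0 ]; then
    echo "No files fit within the limit; nothing to check." >&2
    exit 0
fi

# Restore the sample; a failure here suggests corrupted chunks.
if duplicacy restore -r "$revision" -overwrite -- "${files[@]}"; then
    echo "Restored ${#files[@]} files ($used bytes) from revision $revision OK"
else
    echo "Integrity check FAILED for revision $revision" >&2
    exit 1
fi
```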
Is this kind of throttled, full integrity checking worth submitting as a feature request for Duplicacy? If it’s outside the scope of Duplicacy, then maybe someone else can find some use for the script I wrote.