How long should check -chunks take?

New user here so forgive me if this is normal behaviour for the program but I just wanted to double check it is before moving forward.

I have a 1.54TB backup on local storage, just one snapshot so far. I wanted to try the ‘Check’ command (via the web GUI), so I scheduled it and manually started it with the options -chunks -stats.

After 8 hours it hadn’t finished, so I aborted the operation. Checking the log files, it appears to have been about 65% of the way through (if I’m reading it correctly). This is the last line:

-11 05:42:44.089 INFO VERIFY_PROGRESS Verified chunk a7ce205b025ea705a4b49324cc14a328cd10f7954d3c990957ba047452028947 (199611/309514), 36.09MB/s 04:12:03 64.5%

So I’m guessing that if left to run, it would have taken a total of around 12 hours to complete. Is this about right for a 1.54TB snapshot?

If you specify the -chunks flag, Duplicacy will validate each chunk, which requires fully reading it and recomputing its hash. How long will it take to read 1.2TB worth of data split into small files of slightly varying sizes? There are throughput and seek-latency limits, so I’d say the pace you are seeing is not unexpected.
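As a rough back-of-envelope check, using the 1.54TB figure from your post and the ~36MB/s read rate shown in your log:

1.54TB ≈ 1,540,000MB
1,540,000MB ÷ 36MB/s ≈ 42,800s ≈ 11.9 hours

which lines up with your ~12 hour estimate.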

As a side note, if your storage is bit-rot aware (a redundant Btrfs or ZFS array with data checksumming enabled), then it already guarantees data consistency, so you only need to check for chunk presence, not content. That check is fast, because it only requires reading metadata.
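In CLI terms that is just a check without the -chunks flag, something like:

duplicacy check -stats

which only verifies that every chunk referenced by your snapshots exists on the storage, without reading chunk contents.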

If your storage does not guarantee anything (a single hard drive, for example), then I would enable erasure coding on the storage. That gives bit rot less of a chance to destroy the data, whereas merely knowing that rot has happened doesn’t help you much: you could do some trickery to have the affected chunk regenerated and re-uploaded, but it’s extra work. So I would not check chunk contents in that case either. In other words, it’s better to use storage with integrity guarantees than a flaky one that you check periodically, because when a check fails, what then?


Thank you for the information. As I understand it, I cannot enable Erasure Coding on an existing backup. So I will have to delete it and re-upload?

You will need to create a new storage with erasure coding enabled (using duplicacy add; see the sketch after the next paragraph), which you can then copy your existing backups to:

duplicacy copy -from default -to NEW_STORAGE

This will be quicker than re-running backups to the new storage, as it saves all the chunking, hashing, compression etc.
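For reference, creating that new storage in the first place might look something like this, where the snapshot ID and path are placeholders, -copy default makes the new storage copy-compatible with your existing one, and the exact erasure coding option is worth confirming with duplicacy add -help for your CLI version:

duplicacy add -copy default -erasure-coding 5:2 NEW_STORAGE my-backup-id /path/to/new/storage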

You can then point your backups to this new storage:

duplicacy backup -storage NEW_STORAGE

You could also edit your .duplicacy/preferences file so the new storage becomes your “default” and the initial one is removed, or completely remove the .duplicacy folder from the repository and re-do the duplicacy init process.

If you re-do the init, Duplicacy will see that the storage is already configured, so you won’t lose any backups, and you don’t have to worry about messing up the JSON preferences file. But this is not great if you have a lot of storages to re-create, or keys/passwords saved in Duplicacy’s keyring, as the keyring is also stored in the .duplicacy folder.
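For reference, the preferences file is just a JSON list with one entry per storage, and the entry named “default” is the default one. The exact fields vary between CLI versions, but it looks roughly like this (the ID and path below are made up):

[
    {
        "name": "default",
        "id": "my-backup-id",
        "storage": "/path/to/new/storage",
        "encrypted": false,
        "no_backup": false,
        "no_restore": false,
        "no_save_password": false,
        "keys": null
    }
]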


Thanks for your help, I’ll give it a go.