Slow check on local SSDs

I’ve got a single computer with a bunch of backups, roughly 1.3TB of data. I want to check that all the backups are OK.

Running duplicacy check -all -files seems very slow.

Listing all chunks
19 snapshots and 817 revisions
Total chunk size is 1387G in 311396 chunks

After 10 hours it has only got through about one and a half of the 19 snapshots.

Overall the CPU looks mostly idle, although duplicacy itself is using about one and a half cores:

Tasks: 939 total, 1 running, 938 sleep, 0 d-sleep, 0 stopped, 0 zombie
%Cpu(s):  3.0 us,  0.5 sy,  0.0 ni, 96.2 id,  0.1 wa,  0.2 hi,  0.0 si,  0.0 st
MiB Mem : 257598.0 total, 231542.0 free,  22012.5 used,   6507.7 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 235585.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  21258 root      20   0 5505376 350824  20256 S 156.8   0.1     16,10 duplicacy

iotop shows extremely low I/O:

  21293 be/4 root        6.35 M/s    0.00 B/s  ?unavailable?  duplicacy check -all -files -stats
  21297 be/4 root        9.46 M/s    0.00 B/s  ?unavailable?  duplicacy check -all -files -stats
  21303 be/4 root     1667.52 K/s    0.00 B/s  ?unavailable?  duplicacy check -all -files -stats
  21306 be/4 root        5.41 M/s    0.00 B/s  ?unavailable?  duplicacy check -all -files -stats
  21307 be/4 root        4.84 M/s    0.00 B/s  ?unavailable?  duplicacy check -all -files -stats
  21311 be/4 root       20.73 M/s    0.00 B/s  ?unavailable?  duplicacy check -all -files -stats
  21315 be/4 root       13.41 M/s    0.00 B/s  ?unavailable?  duplicacy check -all -files -stats

The SSDs probably have another 300MB/sec of read bandwidth to spare.
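As a rough sanity check on those numbers, here is a small Python sketch that adds up the per-thread read rates from the iotop output and estimates how long one full pass over the storage would take at that pace. It assumes the iotop snapshot is representative of the whole run, that "K/s" and "M/s" are KiB/s and MiB/s, and that "1387G" means GiB; none of that is confirmed by the output itself.

```python
# Per-thread read rates copied from the iotop output above, in MiB/s.
# The one value reported in K/s (1667.52) is converted by dividing by 1024.
read_rates_mib_s = [6.35, 9.46, 1667.52 / 1024, 5.41, 4.84, 20.73, 13.41]

# "Total chunk size is 1387G" from the check output; assumed to mean GiB.
total_chunk_size_gib = 1387


def aggregate_rate(rates):
    """Sum the per-thread rates into one aggregate read rate in MiB/s."""
    return sum(rates)


def hours_to_read_once(size_gib, rate_mib_s):
    """Hours needed to read size_gib exactly once at a steady rate_mib_s."""
    return size_gib * 1024 / rate_mib_s / 3600


rate = aggregate_rate(read_rates_mib_s)
hours = hours_to_read_once(total_chunk_size_gib, rate)
print(f"aggregate read rate: {rate:.1f} MiB/s")
print(f"one full pass over the storage: {hours:.1f} hours")
```

That works out to roughly 62 MiB/s in total and a bit over 6 hours to read every chunk once. If the rate really was steady, 10 hours of reading is more data than the storage holds, which would be consistent with chunks shared between revisions being read more than once. It is a single snapshot of iotop though, so treat it as a hint rather than proof.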

So I’ve got low CPU, low read speeds. I guess I could manually parallelise - is that the best option?

But also is this efficient? Is it re-reading the same chunks multiple times?

Having reread the forums, -files is the wrong option for this. I'm rerunning with -chunks -threads 8 and it is much quicker.

-files checks the integrity of the actual files; -chunks checks the integrity of the chunks. If you suspect storage rot, then -chunks is indeed more appropriate. If the goal is to verify that duplicacy's internals didn’t mess up your files, then -files is the closest proxy to actually restoring and validating every file.

Still, you should be able to use -threads with -files. Did it not work?

Side notes

  1. -chunks records which chunks have been verified and skips them next time. If you want to verify them all again, delete the verified_chunks file.
  2. If you suspect the storage is prone to rot, enable erasure coding.

One thing I would check before manually splitting it: run the same check on one snapshot with -files -threads 8 and compare throughput. If that still sits at very low I/O, it is probably spending time on metadata / lots of small file reads rather than raw SSD bandwidth. For the storage-rot check you already found the better path: check -chunks -threads 8, then keep that as the regular scheduled check and only use -files occasionally or for a smaller set of snapshots.
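A minimal Python sketch of that comparison, in case it helps. It only builds and prints the two commands by default; set dry_run=False to actually run and time them from inside the repository directory. The snapshot ID is a made-up placeholder, and selecting a single snapshot with -id is my assumption about the check command, so confirm it against duplicacy check -help for your version.

```python
import shlex
import subprocess
import time


def build_check_command(snapshot_id, mode, threads=8):
    """Build a duplicacy check command for a single snapshot ID.

    mode is "files" or "chunks", the two options discussed in this thread.
    The -id option is assumed to restrict the check to one snapshot ID.
    """
    if mode not in ("files", "chunks"):
        raise ValueError(f"unknown mode: {mode}")
    return ["duplicacy", "check", "-id", snapshot_id, f"-{mode}",
            "-threads", str(threads), "-stats"]


def timed_run(command, dry_run=True):
    """Print the command; unless dry_run is set, run it and return the seconds taken."""
    print(shlex.join(command))
    if dry_run:
        return None
    start = time.monotonic()
    subprocess.run(command, check=True)
    return time.monotonic() - start


# Hypothetical snapshot ID; replace with one of your own 19 snapshot IDs.
SNAPSHOT_ID = "my-snapshot-id"

for mode in ("files", "chunks"):
    elapsed = timed_run(build_check_command(SNAPSHOT_ID, mode), dry_run=True)
    if elapsed is not None:
        print(f"-{mode} took {elapsed / 60:.1f} minutes")
```

Keep in mind that -chunks skips chunks it has already verified, so a second -chunks run will look much faster than the first, and the comparison is only fair on a fresh run.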