I have been doing some informal testing of Duplicacy to figure out how to speed up my backups to a local 4-drive RAID 5 array. At one point, for a 2TB backup, the -stats output estimated it would take 8 days.
Specs:
- Backup source: 3TB “hybrid” drive (spinning disk plus 128GB cache)
- Backup destination: 4-bay USB 3 enclosure with 4x8TB drives in RAID 5, running SoftRAID on a Mac
I tried -threads values ranging from 64 all the way down to 1.
Here are informal results. By informal I mean that I would kill the ongoing backup, resume it at the same spot with a different thread count, wait until the rate stabilized on a long stretch with minimal duplicates, and then record the MB/s. It is not rigorous, but it is at least an indicator.
- -threads 1 ~12MB/s - Not great, but at least it will finish the initial backup in about a day or two.
- -threads 2 ~1.93MB/s - Very slow; it was slated to take well over a week.
- -threads 64 ~1.1MB/s - Extremely slow; early on, before it stabilized, it was running at only 100KB/s.
- -threads 16 ~619KB/s - Even slower, but I did not let this one run very long, so it may not have reached full speed.
- -threads 32 ~2.0MB/s - A longer test than the 16-thread one; it converged to a somewhat higher rate, though not significantly different from 2 threads.
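To put these rates in perspective, here is a minimal Python sketch that converts each measured steady-state rate into the time a 2TB initial backup would take at that rate. It assumes decimal units (1TB = 1,000,000MB, so 619KB/s = 0.619MB/s) and a constant rate for the whole run, which a real backup won't hold exactly; the names `BACKUP_SIZE_MB`, `rates_mb_s`, and `eta_days` are just illustrative.

```python
# Rough ETA for a 2TB initial backup at each measured steady-state rate.
# Assumption: decimal units (1TB = 1,000,000MB) and a constant rate for
# the entire run, so these are ballpark figures only.

BACKUP_SIZE_MB = 2_000_000  # 2TB

# Observed steady-state rates in MB/s, keyed by the -threads value used
rates_mb_s = {
    1: 12.0,
    2: 1.93,
    16: 0.619,  # 619KB/s
    32: 2.0,
    64: 1.1,
}


def eta_days(size_mb: float, rate_mb_s: float) -> float:
    """Days needed to transfer size_mb at a constant rate of rate_mb_s."""
    seconds = size_mb / rate_mb_s
    return seconds / 86_400  # seconds per day


for threads, rate in sorted(rates_mb_s.items()):
    days = eta_days(BACKUP_SIZE_MB, rate)
    print(f"-threads {threads:>2}: {rate:6.3f} MB/s -> {days:5.1f} days")
```

At these rates, 1 thread works out to roughly 2 days, while 2 or 32 threads both come to roughly 12 days, which lines up with the estimates above.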
The bottom line is that only one clear winner emerged: a single backup thread.
On reflection, this may make more sense for a spinning hard drive than it first seems. Writing each chunk requires a head seek, and with multiple threads writing chunks at once, the heads have to jump between locations a LOT more often. In fact, I pinpointed what was going on by running a separate speed test on the array while the multi-threaded backups were running. The array normally writes at 200MB/s, but with more than 4 backup threads running, it slowed to less than 9MB/s.
With only 1 backup thread running, the array still writes at over 130MB/s for other processes.
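The post doesn't say which tool was used for that separate speed test, so as a hypothetical stand-in, here is a minimal Python sketch of a sequential write test: it writes incompressible data to a temporary file on the target volume, forces it to disk, and reports MB/s. The function name and its defaults are illustrative assumptions. Note that on macOS `os.fsync` may not flush the drive's own write cache, so results can come out somewhat optimistic.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory: str, total_mb: int = 1024, block_mb: int = 4) -> float:
    """Sequentially write about total_mb of random data into a temp file in
    `directory`, fsync it, delete it, and return the rate in MB/s (decimal MB).

    Hypothetical helper for illustration; run it while a backup is active
    to see how much write bandwidth the array has left."""
    block = os.urandom(block_mb * 1_000_000)  # incompressible data
    blocks = max(1, total_mb // block_mb)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            # Push the data to the device so we time the disk, not the page
            # cache. On macOS this may not flush the drive's internal cache.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the test file
    return blocks * block_mb / elapsed
```

For example, `write_throughput_mb_s("/Volumes/BackupRAID")` (a hypothetical mount point) could be run once with the backup idle and once with it running at a given -threads value, and the two numbers compared.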
Based on this, my guess is that an SSD would benefit much more from multiple threads, since it has no seek penalty.
For local spinning disks, though, I’ll be running all my backups with -threads 1, unless I’ve missed something here.