I have a fairly large new chunk of data (new files) that I am trying to add to my backup. The backup is going to take about 5 days. I had to interrupt it a couple of times (PC restarts), so now it spends most of its time in “packing” and then shows “skipped”, because a lot is already uploaded. The process is not using much CPU, and it’s not uploading anything. Is there a way to speed this up? Couldn’t there be some sort of cache so that packing doesn’t need to be redone?
I have threads set to 32, maybe I actually need a lower number?
I believe my HDD is the bottleneck, but it would be nice if it were doing uploading and packing in parallel (my actual bottleneck should be my internet connection).
If the CPU is not utilized, it’s not the bottleneck.
IIRC -threads only sets the number of upload threads.
What does Activity Monitor/Performance Monitor/whatever other performance analysis tool your OS has report? Specifically, you are interested in the disk queue (or related metrics, like Disk Response Time on Windows).
Is this actually a hard drive? How much free RAM do you have on the machine? What is the OS?
If the HDD is the bottleneck, then parallel packing will only make it significantly worse. HDDs are very bad at random IO. Even with one thread, performance may be horrendous if you have a lot of small files. There is nothing you can do apart from adding an SSD cache in front of your volume (OS specific) or switching to an SSD entirely.
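If you’d rather sample disk activity from a script than watch the Resource Monitor graphs, here is a rough Python sketch using the third-party psutil package (my choice of tool, nothing Duplicacy-specific). It only approximates utilization from the cumulative read/write time counters; for the actual Current Disk Queue Length counter, use Performance Monitor.

```python
import time
import psutil  # third-party: pip install psutil

def sample_disk_busy(interval=5.0):
    """Print a rough busy estimate for all physical disks over one interval."""
    before = psutil.disk_io_counters()
    time.sleep(interval)
    after = psutil.disk_io_counters()
    # read_time/write_time are cumulative milliseconds spent servicing requests.
    busy_ms = (after.read_time - before.read_time) + (after.write_time - before.write_time)
    reads = after.read_count - before.read_count
    writes = after.write_count - before.write_count
    # Values above 100% mean requests overlapped, i.e. a queue was building up.
    print(f"approx busy: {busy_ms / (interval * 1000):.0%}  "
          f"reads/s: {reads / interval:.0f}  writes/s: {writes / interval:.0f}")

if __name__ == "__main__":
    while True:
        sample_disk_busy()
```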
Why do you think caching for packing would not help? My understanding of what’s happening is: it reads a file, packs it, and then checks and realizes it already exists. Of course, a valid point would be that caching is difficult, because it would need to read all files on the disk in order to find out whether something has changed… Is duplicacy indeed reading the whole hard disk every time?
Windows 10, RAM is not the bottleneck (12GB free)
Fairly certain it’s the HDD; the graph is maxed out (unfortunately Windows Resource Monitor does not put any legends on its graphs -.-)
Disk queue is between 7 and 12.
Yes, it’s actually an HDD. Would an SSD cache help with sequential reads?
That’s not a sequential read if your HDD is maxed out. This sounds like a lot of small files, which cause a lot of head movements: go read metadata, go read the file, go read metadata, go read the file, etc. Most of the time it is repositioning the head and waiting for the sector to fly by, as opposed to reading data.
An SSD cache would cache the metadata and offload some of the random IO from the HDD, thus improving performance. Consider PrimoCache as a decent piece of software to accomplish that.
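If you want to see the small-file effect for yourself, here is a small benchmark sketch (paths are hypothetical placeholders): it times a metadata lookup plus read for every small file under a directory versus one sequential read of a single large file. On a cold cache the small-file case is dominated by seek time rather than transfer rate.

```python
import os
import time

def read_small_files(root):
    """Stat and read every file under root; returns (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            os.stat(path)                    # metadata lookup: one head movement
            with open(path, "rb") as f:      # file contents: another head movement
                total += len(f.read())
    return total, time.perf_counter() - start

def read_large_file(path, chunk=8 * 1024 * 1024):
    """Read one big file sequentially; returns (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while data := f.read(chunk):         # sequential read, minimal seeking
            total += len(data)
    return total, time.perf_counter() - start

# Hypothetical paths, for illustration only:
# print(read_small_files(r"D:\photos"))
# print(read_large_file(r"D:\videos\big_file.mkv"))
```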
It scans the filesystem and reads metadata, compares it with what was stored in the last backup (or the interrupted, incomplete one), and then picks up the differences or continues from where it left off, if the incomplete snapshot was saved.
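To make the “scan metadata, compare, pick up differences” idea concrete, here is a minimal sketch of the general approach (this is not Duplicacy’s actual code; it just compares size and modification time against whatever the last snapshot recorded):

```python
import os

def changed_files(root, previous):
    """Yield paths that look new or modified compared to the last snapshot.

    previous: {relative_path: (size, mtime)} loaded from the previous
    (or incomplete) snapshot; a stand-in for whatever Duplicacy stores.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            st = os.stat(path)
            if previous.get(rel) != (st.st_size, int(st.st_mtime)):
                yield rel  # only these need packing and uploading
```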
Have a look at this thread though, it might explain the symptoms you are seeing: Resuming interrupted backup takes hours (incomplete snapshots should be saved) - #4 by gchen
That is because subsequent backups are not fast resumable. You can force an initial backup by switching to a new backup id. That is, for the web GUI, create a new backup with a different backup id but still the same source directory. If you’re using the CLI, modify the id in .duplicacy/preferences.
After you start the new backup you’ll have more files to pack, since now all files are treated as new, but each file will need to be packed once – if the backup is interrupted, previously packed files will not be packed again.
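If you go the CLI route, the edit itself is just a change to the id field in the .duplicacy/preferences JSON file. A hedged sketch follows; the path and id values are placeholders, and it’s worth keeping a copy of the original file in case your preferences layout differs:

```python
import json
import shutil

PREFS = r"C:\path\to\repo\.duplicacy\preferences"  # placeholder: path to your repository

shutil.copy2(PREFS, PREFS + ".bak")                # keep a backup of the original file
with open(PREFS, "r", encoding="utf-8") as f:
    prefs = json.load(f)                           # a list of storage/backup entries

for entry in prefs:
    if entry.get("id") == "old-backup-id":         # placeholder: your current backup id
        entry["id"] = "new-backup-id"              # a new id forces a fresh initial backup

with open(PREFS, "w", encoding="utf-8") as f:
    json.dump(prefs, f, indent=4)
```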