Duplicacy performance: 100% Single CPU Thread

I am in the process of migrating a few months' worth of virtual machine backups to Duplicacy, using the 1M min/max/c init options as suggested in the forums … I am currently backing up to the local hard drive on our backup server and then plan to copy to B2 for offsite backups. Since I always intend to back up primarily to local storage and then copy offsite, I am looking mainly at local backup performance, and I have noticed that Duplicacy is limited to one CPU thread. The server has 48 cores with an extremely fast PCI-E RAID, so I am currently pegging a single thread at 100% and not getting the best performance possible because there is no multithreading.
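For reference, this is roughly what the setup looks like (the snapshot ID, paths, and bucket name below are placeholders, not the real ones):

```
# Local storage with fixed 1M chunks (min = max = average)
cd /path/to/vm-exports
duplicacy init -c 1M -min 1M -max 1M vm-backups /mnt/backup-raid/duplicacy

# B2 added as a copy-compatible second storage
duplicacy add -copy default b2 vm-backups b2://example-bucket

# Back up locally first, then copy the snapshots offsite
duplicacy backup -stats
duplicacy copy -from default -to b2 -threads 10
```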

How feasible would it be to implement CPU multithreading support? It would be beneficial to have multiple CPU threads compressing/encrypting rather than a single thread, especially in my case.

Have you tried using the -threads option to increase the number of threads used for the backup?
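Something like this (the thread count is just an example):

```
duplicacy backup -stats -threads 8
```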

Yes, and there is no effect. From what I read, -threads only affects uploading to cloud storage (such as B2), not the number of CPU threads actually used. There is no change in performance.

Reads from the local repository are still single threaded, but if the bottleneck were on the upload side, multiple threads could help. I thought it might at least be worth trying since the details of your local backup storage are unclear.

In your case it sounds like the storage is fast enough to bottleneck the reader thread.

You explained it better than I did … yes, I believe the single read thread is the bottleneck rather than the speed of the local drive. The local RAID can read at over 1 GB/s (I haven't benchmarked it recently, so I don't know the exact numbers at the moment) and the current speed is 344 MB/s (currently just skipping chunks at that speed).

This doesn’t address your feature request to use multiple reader threads, but you might be able to speed up VM backups by disabling file hash computation using the DUPLICACY_SKIP_FILE_HASH environment variable.
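Something along these lines (the thread count is just an example):

```
DUPLICACY_SKIP_FILE_HASH=1 duplicacy backup -stats -threads 10
```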

DUPLICACY_SKIP_FILE_HASH=1 seems to help; it increased the speed from 344 MB/s to 550 MB/s while still maxing the CPU out at 100%. Any performance improvement I can get would be fantastic … so I'm hoping that multithreaded reads can be implemented sometime in the future.

Duplicacy uses one thread to read and split files, but the compression/encryption is done by uploading threads, so setting -threads to more than 1 should definitely help.

I would not recommend setting DUPLICACY_SKIP_FILE_HASH to 1, as that would disable file hashes which I think are useful in verifying the integrity of backups.

According to this comment it sounds like Vertical Backup does this. Or am I misreading the linked discussion?

Ok, so putting aside the compression/encryption (I have it set to 10 threads at the moment), the single thread for read/split is my bottleneck … since I am maxing out that single thread, I cannot get any faster speeds even though I am nowhere near maxing out my storage reads.

Vertical Backup actually implements a special form of file hashes (basically the hash of all the chunk hashes). For Duplicacy, once you enable DUPLICACY_SKIP_FILE_HASH, you're left without any file hashes.


I might be pedantic here, but is the read speed a real bottleneck? (e.g. are you trying to find a solution for something that isn't broken?)

The server could be used for other things while it is doing the backups. Backups should be lower priority than real work, and going to full 100% HDD usage is not a good way to achieve that.

Subsequent backups should be much faster since they only take the delta from the last backup. Even though you will still have the CPU bottleneck, the backups should be smaller, so the overall backup time should also be smaller compared to the initial backup.

This server was built for the sole purpose of being a backup server; there is absolutely no other usage on it, which is why I want the best possible throughput. We use DRBD for real-time replication of the data from the primary server, and once a backup starts we pause the replication so we can run the backup. Since we would like the backup process to finish as fast as possible so we can reconnect DRBD, performance is key. This is also not the first backup I'm looking at: it's skipping over 90% of the chunks because only about 1 GB has changed between the last backup and the current one, yet it's still hitting 100% CPU at 344 MB/s (550 MB/s while DUPLICACY_SKIP_FILE_HASH was set). So clearly the read could go faster if it weren't limited to a single read-and-split thread.
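Roughly, the backup window looks like this (a sketch only; r0 is a placeholder resource name and the exact drbdadm invocations depend on the setup):

```
# Pause DRBD replication for the backup window (r0 is a placeholder)
drbdadm disconnect r0

# Run the backup against the now-quiescent data
cd /path/to/vm-exports
duplicacy backup -stats -threads 10

# Resume replication once the backup finishes
drbdadm connect r0
```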


Since the server is solely for backups, I understand why speed is all that matters.

I don’t know if :d: can be sped up here though. I remember @gchen saying (somewhere) that the single threaded read/split was pretty important in the backup process.

What you could do to alleviate this problem is, instead of having one single repository which backs up EVERYTHING, create multiple smaller repos (partition the data) and have all of those backups run at the same time, to the same storage. (Think of the way Python's multiprocessing module works.)

This way the backups will still be limited by the single-threaded read/split speed, but running them in parallel will make use of more of the otherwise idle resources.

How many backups and repos? idk! -> that is for you to decide, taking into account the growth that may happen in the future.
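A rough sketch of how the parallel backups could be launched (the repo directory names are made up):

```
#!/bin/sh
# One duplicacy backup per repository, all running concurrently,
# so several read/split threads are active at the same time.
for repo in /backups/repos/vm1 /backups/repos/vm2 /backups/repos/vm3; do
    (cd "$repo" && duplicacy backup -stats -threads 4) &
done
wait   # block until every parallel backup has finished
```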

totally offtopic: that brook picture makes me giggle :blush:

Glad you get the reference :yum:

Actually, that might be a good idea. I currently have 5 repositories; backing all 5 up simultaneously may be a good way to use the additional resources until a proper solution can be worked out. Although, thinking about it, if I want to maximize all possible system resources, it may be worth cutting the virtual machine images into parts when I export them to the repository, making a repo for each part, and then backing up each repo in parallel…

I'll have to benchmark my RAID again and see whether just backing up each VM on its own will max out the resources, or whether splitting is the way to go.
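Probably something as simple as this for a quick sequential-read number (the image path is a placeholder; dropping caches needs root):

```
# Rough sequential-read benchmark of one VM image on the RAID
sync && echo 3 > /proc/sys/vm/drop_caches   # run as root; clears the page cache
dd if=/mnt/backup-raid/vm-exports/vm1.img of=/dev/null bs=1M status=progress
```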

Thanks for the input, I'll let you know how it goes.
