Duplicacy being impossibly slow when using custom chunk size

I am trying to back up about 100TB of large video files and project files (a combination of large video files of 15GB+ each and thousands of small 1-100MB project files and cache).

When I initially set up the storage via the web GUI, backing up about 50TB resulted in 8+ million chunks, which caused the destination NAS to become unstable when I had to perform some troubleshooting later on.
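
For rough scale (back-of-the-envelope numbers, assuming the default 4 MiB average chunk size and ignoring deduplication):

50 TB / 4 MiB ≈ 13 million chunks with the default average

50 TB / 32 MiB ≈ 1.6 million chunks with a 32 MiB average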

So I decided to start fresh, but this time with a larger chunk size, in hopes of improving performance:

cd \\NAS1\test1

C:\Users\user\.duplicacy-web\bin\duplicacy_win_x64_2.7.2.exe init Test1 \\NAS2\Duplicacy\test1 -c 32 -min 8 -max 128 -storage-name test1-storage

After the above, I went into the web GUI and added the newly initialised storage.

However, when I run a backup to this new storage, the speed is about 6kb/s - it used to be about 50-100MB/s when using the default settings!

Both source and destination are 12x HDD RAID6 pools with 10GbE connectivity all the way through. The issue only occurs when I use chunk size values larger than the default.

Am I doing something wrong somewhere? Any ideas?

These sizes are in bytes - as written, -c 32 is asking for 32-byte chunks, which is why the backup crawls. You’ll need to add m to them:

C:\Users\user\.duplicacy-web\bin\duplicacy_win_x64_2.7.2.exe init Test1 \\NAS2\Duplicacy\test1 -c 32m -min 8m -max 128m -storage-name test1-storage
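
For what it’s worth (my assumption based on the sizes being in bytes, not something I’ve verified), the same sizes should also work spelled out as raw byte counts:

C:\Users\user\.duplicacy-web\bin\duplicacy_win_x64_2.7.2.exe init Test1 \\NAS2\Duplicacy\test1 -c 33554432 -min 8388608 -max 134217728 -storage-name test1-storage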

Ay, it just had to be something simple, didn’t it :smiley:

Thank you for pointing it out.

Have to ask - why are you using Duplicacy to back up such large media files? Duplicacy packs, compresses and encrypts files into chunks, making them inaccessible until you restore them.

I understand it’d be useful to have a version history of project files etc., but if there’s any way to separate out the file structure (or write filters) such that Rclone would handle the multi-GB video, and Duplicacy the project files, you’d probably have an easier time when it comes to pruning and checking.

From experience, having millions of chunks in a storage seriously impacts the time it takes to complete those operations. (Would be nice if Duplicacy could parallelise ListAllFiles() for local/ssh as it does for GCD, but it still wouldn’t make those videos easily accessible compared to an Rclone mount.)
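
Purely as an illustration (the folder name, remote name and paths here are made up - adjust to your real layout), if the raw footage lived under its own top-level folder you could keep it out of Duplicacy with a .duplicacy/filters entry:

-Footage/

and let Rclone mirror just that folder to the second NAS, e.g.:

rclone copy \\NAS1\test1\Footage nas2:test1/Footage --progress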

I look after 15 different asset management services at work and don’t want to complicate my life any further. When it works, Duplicacy does provide a very efficient and convenient way to manage backups.

It just so happens that our needs are a bit extreme. We’re a small team operating with very large data sets, so the storage allocation per person is very high, as opposed to a typical corporate office where many employees each have a relatively small allocation (i.e. 100GB/person x 1000 employees vs 20TB/person x 5 employees). Enterprise solutions like Veeam gear their pricing on the assumption that if a company holds 100TB of data, it must have thousands of employees and can therefore afford to spend a lot of cash on a backup solution. For our small team, however, it would be astronomically expensive.

I tried Rclone at first, but without proper snapshotting, recalling projects becomes nearly impossible.