Adjust chunk size / investigate performance via Web UI

ephekt · 3 December 2019 07:18

Hi,

I’m giving Duplicacy a trial run for the next 3-4 weeks. Been on Arq for a while but as I build out a NAS, I need NAS support and to run within Linux. Duplicacy has a lot of features that seem great on the surface.

While this may entirely be Google Drive, I’m seeing Duplicacy struggle past 2-3MB/s while Arq seems to roll 10-15MB/s, all things equal, same machine, same drives being backed up, etc. I’m on gbit fiber and have not experienced these speeds before. Again, it could be Drive so I will see how things go over the coming weeks.

But it got me thinking, and looking at logs, at how small the chunk sizes are. Can I increase those to 10 or 20MB via Web UI? They’re quite small and I’m not at all concerned with dedupe. I know my files are almost all unique (big media/video files).

Thanks!

gchen · 3 December 2019 18:30

You can’t set the chunk size via Web UI. You’ll have to use the CLI to initialize the storage and pass the -chunk-size option.

However, I don’t think increasing the chunk size would improve the performance. Did you try adding a -threads 4 option to the backup job? The other thing is to check the logs for any rate-limiting messages.

ephekt · 3 December 2019 18:47

The logs do not show any errors, but I also can’t seem to tail them properly. I just see a spinner on the URL bar and the page never loads (when I click the progress bar of a backup upload). I am not backing up much, either: around 30GB for this test run. I know GDrive has a 750GB/day upload limit and some other nuances, but they don’t appear to be affecting this.

I will try to up the threads, I had not tried that. I will let you know!


ephekt · 3 December 2019 19:08

Alright, the extra threads made things better. I set to 6 threads and I’m now getting closer to 14MB/s. That is OK for now. If I’m on a 6c/12+ thread CPU, can I move things further up? I have around 20TB of data…
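
For a rough sense of what these speeds mean for 20TB, here is a back-of-the-envelope sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), a perfectly sustained rate, and takes the 750GB/day Google Drive figure quoted earlier in the thread at face value; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400
MB, GB, TB = 10**6, 10**9, 10**12  # decimal units (an assumption)


def transfer_days(total_bytes: float, bytes_per_second: float) -> float:
    """Days needed to move total_bytes at a constant rate."""
    return total_bytes / bytes_per_second / SECONDS_PER_DAY


data = 20 * TB

# Time at the 14 MB/s reported with 6 threads.
print(round(transfer_days(data, 14 * MB), 1))  # 16.5

# Average rate that a 750 GB/day cap works out to, in MB/s.
print(round(750 * GB / SECONDS_PER_DAY / MB, 2))  # 8.68

# Minimum days for 20 TB if the daily cap is the only limit.
print(round(data / (750 * GB), 1))  # 26.7
```

If the daily cap applies as quoted, anything faster than roughly 8.7 MB/s sustained would reach it before the day is over, so for the full 20TB the cap could matter more than the thread count.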

ephekt · 3 December 2019 19:11

What is the default chunk size and how would I increase it? I’d prefer closer to 10MB chunk sizes personally to get the better sustained transfer speeds.

leerspace · 4 December 2019 02:08

You cannot change the chunk size on an existing storage. The chunk size parameters are set when the storage is initialized – which can be done with the CLI if you don’t want to use the default parameters.

By default, Duplicacy uses a variable chunk size algorithm with an average chunk size of 4 MB, max chunk size of 16 MB (4 * average), and min chunk size of 1 MB (average / 4).
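
As a quick illustration of that arithmetic, a minimal sketch (the helper name is made up, and it treats "MB" as MiB, which is an assumption):

```python
MIB = 1024 * 1024


def default_chunk_bounds(average_bytes: int, ratio: int = 4) -> tuple:
    """Return (min, max) chunk sizes using the average/4 and 4*average rule."""
    return average_bytes // ratio, average_bytes * ratio


min_size, max_size = default_chunk_bounds(4 * MIB)
print(min_size // MIB, max_size // MIB)  # prints "1 16"
```

This only mirrors the ratios described above; it is not taken from Duplicacy's source.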

You can tell Duplicacy to use a fixed chunk size for a storage (see further reading below), but keep in mind that this also disables the pack-and-split method for chunking. That means files smaller than the chunk size will be uploaded in their own chunks instead of being packed together, which could be significant if you have a lot of small files.

Further reading:

ephekt · 4 December 2019 02:28

Thank you. So I can delete my storage in the Web UI and then make it via the CLI. Will it show up in the Web UI once I have made it via CLI? My files average around 5-10GB per file. I have less than 10 (no joke) smaller than 100MB.

leerspace · 4 December 2019 03:37

I just realized that the average chunk size has to be a power of 2 (not sure exactly why), so 10M exactly doesn’t work. So your closest options to 10 MiB are 8 MiB, then 16 MiB.
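
To see where 8 and 16 come from, here is a small sketch that finds the powers of two on either side of a target size. The function name is made up for illustration.

```python
def neighbouring_powers_of_two(target: int) -> tuple:
    """Return the largest power of two <= target and the smallest one >= target."""
    if target < 1:
        raise ValueError("target must be a positive integer")
    lower = 1 << (target.bit_length() - 1)
    upper = lower if lower == target else lower * 2
    return lower, upper


print(neighbouring_powers_of_two(10))  # (8, 16)
print(neighbouring_powers_of_two(16))  # (16, 16)
```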

Removing the storage from the web UI doesn’t delete anything from the actual storage location. If you want to just start fresh and delete your existing storage, you could

1. Remove the storage from the web UI
2. Delete all of the files in the existing storage directory on Google Drive (i.e., chunks directory, config file, snapshots directory)
3. Initialize the storage using the CLI; e.g., for fixed 16 MiB chunks, duplicacy init -c 16M -max 16M -min 16M ...
4. In the web UI, add the same storage you created in step 3
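
The init step could be sketched like this: a helper that only assembles the command line and does not run anything. It passes the same size to -c, -max and -min to get fixed-size chunks. The helper names, the snapshot ID "nas-media" and the storage URL "gcd://backups/media" are placeholders made up for illustration, so substitute your own.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, 16, ..."""
    return n > 0 and n & (n - 1) == 0


def build_init_command(chunk_mib: int, snapshot_id: str, storage_url: str) -> list:
    """Assemble (but do not run) a duplicacy init command with fixed-size chunks."""
    if not is_power_of_two(chunk_mib):
        raise ValueError(f"{chunk_mib} MiB is not a power of 2")
    size = f"{chunk_mib}M"
    # Same value for average, max and min requests a fixed chunk size.
    return ["duplicacy", "init", "-c", size, "-max", size, "-min", size,
            snapshot_id, storage_url]


# Hypothetical snapshot ID and storage URL; replace with your own.
print(" ".join(build_init_command(16, "nas-media", "gcd://backups/media")))
# duplicacy init -c 16M -max 16M -min 16M nas-media gcd://backups/media
```

Printing the command first makes it easy to double-check the sizes before running it from the repository directory, since the chunk size cannot be changed once the storage exists.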

TheBestPessimist · 4 December 2019 06:03

Go as many threads as you need to saturate the network, as long as you don’t get rate limited by your storage provider. Try it and see what’s the best solution for you.

The number of threads is the number of uploading threads, so all they will consume is more memory, but not more CPU.

Droolio · 4 December 2019 14:42

Not to dissuade you from employing Duplicacy in your backup strategy, but are you sure this is the right tool to protect large media files? Such data won’t de-duplicate, won’t need a revision history, and may be a lot of unnecessary overhead compared to a simpler tool like Rclone.

(I personally use Rclone for large media, and Duplicacy for everything else.)

If you decide to stick with Duplicacy, maybe fixed size chunks for media would be more efficient…

ephekt · 4 December 2019 17:14

I tested with 14 threads and boom, over 100MB/s. Happy kid now.

ephekt · 4 December 2019 17:18

Thank you for chiming in. Means a lot that you spent a moment to share your thoughts!

I, now and then, replace a file with the same name (upgrade quality, etc) and for this very small but not that uncommon reason, version history is nice. That being said, the biggest reason I am looking at Duplicacy/Arq (parallel testing both) is that I’d like a simple GUI for backup/restore, scheduling, etc.

Simplicity. I will have another look at rclone but I just checked and still no web ui (their React-based UI looks pretty bad still / underdeveloped).

ephekt · 28 December 2019 17:56

Thanks. Coming back to this, can I re-init a repo to have a higher chunk size? And then re-add it back to the UI?

leerspace · 28 December 2019 20:12

Yes; one option is deleting the contents of the old storage and initializing the storage again, but with your desired chunk size. Another is leaving your old storage as is for historical reasons and initializing the new storage with the desired chunk size in a new bucket or directory.

ephekt · 29 December 2019 02:36

Thank you. Unfortunately, I installed Duplicacy via the docker for Unraid and now realize it does not come with the Duplicacy binary, just this duplicacy_web binary to start the web server. Will need to dig around to see where they put the bin for duplicacy. I think it’s sitting in a bin folder titled ‘duplicacy_linux_x64_2.3.0’ – will explore and report back.

madison437 · 7 July 2021 10:47

Hi, did you ever decide on an ideal chunk size / number of threads for large media files when used with Google Drive?

Any suggestions appreciated.