Restore speed with Duplicacy is quite slow, can that be improved?

TyTro · 5 January 2021 21:29

I’m testing restoring a file for the first time with Duplicacy (Web GUI) now, from a backup on Google Drive. My test is restoring one 100 GB file. The backup storage is using all-default values with regard to chunk size etc. My restore target directory is on an SSD.

I am surprised that it’s actually quite slow. I have a 500 MBit/s connection, but Duplicacy is only using a download bandwidth of 70 MBit/s.

I have tested with different -threads values.

1 thread: Slow and extremely inconsistent, jumping around between 0 and 30 MBit/s, with the average probably being around 7 MBit/s or so.
5 threads: Faster, but still inconsistent. Jumping around between 0 MBit/s and 70 MBit/s, on average probably around 30 MBit/s.
10 threads: Still jumping around between 0 MBit/s and 70 MBit/s, on average probably around 50 MBit/s.
20 threads: Somewhat consistent 70 MBit/s
50 threads: Somewhat consistent 70 MBit/s

So, it seems no matter how many threads I throw at the problem, the speed won’t go above 70 MBit/s.

For comparison, if I manually download a file from my Google Drive, I get these speeds, depending on software used for downloading it (different software uses different amounts of connections etc):

Firefox: Consistent 130 MBit/s
Google Chrome: Consistent 300 MBit/s
JDownloader, set to 10 connections per download: Consistent 500 MBit/s

So it’s definitely not a limit of my internet connection: other software that’s designed for quickly downloading files, like JDownloader, can make use of my full 500 MBit/s connection. Can Duplicacy be made faster in this regard? What exactly does the -threads option do currently? If I set it to 50 threads, does that mean it’s downloading 50 chunks in parallel? Or does it mean it downloads chunks one after another, with 50 threads per chunk?

My system:
Windows 10
Ryzen 3950X (16 Cores)
128 GB RAM @ 3600 MHz

gchen · 6 January 2021 17:30

It could be that Google Drive is rate limiting your downloads. If you add -d as the global option you’ll see those rate limiting messages in the log.

You can also try running the CLI benchmark command to figure out what the bottleneck is. This is how you run the CLI in a DOS window:

cd C:\Users\username\.duplicacy-web\repositories\localhost\all
C:\Users\username\.duplicacy-web\bin\duplicacy_web_win_x64_2.7.2 benchmark -threads 4

saspus · 6 January 2021 19:25

Could it be related to the rate limits of the shared Google project that the credentials are issued from?

@TyTro, can you try creating your own Google project to issue credentials for Duplicacy, as described in “Supported storage backends”?

TyTro · 7 January 2021 06:06

Where can I add that -d option? If I just add it to the “options” box in the “Restore” menu, so “-d -threads 20”, then Duplicacy complains that -d is not a valid option. But I don’t see any other text box for entering a “global” option?

Where does the benchmark tell me what the bottleneck is? I have run the benchmark often by now, but it has never shown me where any bottleneck is. It’s always just uploads/downloads being slow, and everything else being fast.

C:\ProgramData\.duplicacy-web\repositories\localhost\all>C:\ProgramData\.duplicacy-web\bin\duplicacy_win_x64_2.7.2.exe benchmark -storage BackupGoogle -upload-threads 5 -download-threads 5
Storage set to gcd://Duplicacy
Generating 256.00M byte random data in memory
Writing random data to local disk
Wrote 256.00M bytes in 0.21s: 1246.55M/s
Reading the random data from local disk
Read 256.00M bytes in 0.04s: 6912.29M/s
Split 256.00M bytes into 52 chunks without compression/encryption in 1.07s: 238.66M/s
Split 256.00M bytes into 52 chunks with compression but without encryption in 1.41s: 181.16M/s
Split 256.00M bytes into 52 chunks with compression and encryption in 1.43s: 178.45M/s
Generating 64 chunks
Uploaded 256.00M bytes in 114.91s: 2.23M/s
Downloaded 256.00M bytes in 69.72s: 3.67M/s
Deleted 64 temporary files from the storage

C:\ProgramData\.duplicacy-web\repositories\localhost\all>C:\ProgramData\.duplicacy-web\bin\duplicacy_win_x64_2.7.2.exe benchmark -storage BackupGoogle -upload-threads 10 -download-threads 10
Storage set to gcd://Duplicacy
Generating 256.00M byte random data in memory
Writing random data to local disk
Wrote 256.00M bytes in 0.21s: 1227.41M/s
Reading the random data from local disk
Read 256.00M bytes in 0.04s: 7099.17M/s
Split 256.00M bytes into 52 chunks without compression/encryption in 1.08s: 236.01M/s
Split 256.00M bytes into 52 chunks with compression but without encryption in 1.41s: 181.28M/s
Split 256.00M bytes into 52 chunks with compression and encryption in 1.44s: 177.50M/s
Generating 64 chunks
Uploaded 256.00M bytes in 80.52s: 3.18M/s
Downloaded 256.00M bytes in 61.04s: 4.19M/s
Deleted 64 temporary files from the storage

@saspus I assume you mean this?

I don’t understand how that works. If I follow that link, I get to things related to “Google Cloud Platform”, which is not “Google Drive”. I see a list of my “Google Cloud Platform” projects in there, but nothing related to “Google Drive”.

saspus · 7 January 2021 06:26

That’s ridiculously slow for 10 threads. I’ve just done a test restore using a service account with 10 threads, and it completely saturated my downstream, which is 50 Mbps, downloading at about 5 MBps (there were other people using the internet at the same time).

Downloaded chunk 49 size 2788322, 5.12MB/s 00:00:07 86.5%
Downloaded chunk 50 size 6713021, 5.14MB/s 00:00:06 89.0%
Downloaded chunk 53 size 4390016, 5.12MB/s 00:00:05 90.6%
Downloaded chunk 54 size 8618448, 5.08MB/s 00:00:04 93.8%
Downloaded chunk 52 size 10131913, 5.18MB/s 00:00:02 97.6%
Downloaded chunk 48 size 1852315, 5.21MB/s 00:00:01 98.3%
Downloaded chunk 51 size 4284032, 5.30MB/s 00:00:01 100.0%
Downloaded alex/Pictures/Photos Library.photoslibrary/resources/renders/8/805F8DCA-7FED-421C-A198-3E5559BC94D4_2_0_a.mov (256983567)
Restored /Users/alex/TestRestore to revision 686

Granted, there could be some other differences like locality (I’m in Northern California), but that would not explain your connection saturating at 3-4 MBps, especially given Google’s ubiquity. How is your speed test to other hosts?
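
To compare the benchmark totals with the megabit figures quoted earlier in the thread, the upload/download lines can be converted like this. This is only a rough sketch in Python: it assumes the “M” in the benchmark output means 2^20 bytes (I have not verified that; if it means 10^6 bytes the results come out about 5% lower), and the function name is made up for illustration.

```python
import re

# Assumption: "M" in the benchmark output means 2**20 bytes.
BYTES_PER_M = 2**20

# Matches lines such as "Downloaded 256.00M bytes in 61.04s: 4.19M/s".
LINE = re.compile(r"(Uploaded|Downloaded) ([\d.]+)M bytes in ([\d.]+)s")


def mbit_per_s(line):
    """Return (direction, megabits per second) for a benchmark line, or None."""
    match = LINE.search(line)
    if match is None:
        return None
    direction = match.group(1)
    size_m = float(match.group(2))
    seconds = float(match.group(3))
    bits = size_m * BYTES_PER_M * 8
    return direction, bits / seconds / 1e6


# The two transfer lines from the 10-thread benchmark run posted above.
log = [
    "Uploaded 256.00M bytes in 80.52s: 3.18M/s",
    "Downloaded 256.00M bytes in 61.04s: 4.19M/s",
]
for entry in log:
    result = mbit_per_s(entry)
    if result is not None:
        print(f"{result[0]}: {result[1]:.1f} Mbit/s")
```

Under that assumption the 10-thread download works out to roughly 35 Mbit/s, so the benchmark is well below both the 70 MBit/s seen during the restore and the 500 MBit/s line speed.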

If your speed test is good, and downloading data using Google Drive or GFS is fast, I would definitely try a service account.

The idea is that you create your own “project”, e.g. “My Duplicacy Project”, create a service account in it, and download the json file to use with Duplicacy. The service account has its own email address/ID. You would share your Google Drive Duplicacy folder with that “user”. That way Duplicacy, using that service account, will be able to write to your Google Drive home (you would probably need to adjust the path to the folder).
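
If it helps, the address to share the folder with can be read straight out of the downloaded json file. A minimal sketch in Python, assuming the file is a standard Google service account key (those carry a "type" of "service_account" and a "client_email" field); the function name and the file name in the comment are hypothetical.

```python
import json


def service_account_email(key_path):
    """Return the client_email stored in a Google service account key file."""
    with open(key_path, encoding="utf-8") as handle:
        key = json.load(handle)
    # Guard against passing some other json file by mistake.
    if key.get("type") != "service_account":
        raise ValueError("this does not look like a service account key file")
    return key["client_email"]


# Example call (hypothetical file name):
# print(service_account_email("my-duplicacy-project-key.json"))
```

The address it returns is the one you would enter in the Google Drive sharing dialog for the Duplicacy folder.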

If you have G-Suite or Workspace, I wrote a detailed walkthrough of how to accomplish that, and you may leverage the pieces related to the service account from there: Duplicacy backup to Google Drive with Service Account | Trinkets, Odds, and Ends

TyTro · 7 January 2021 07:10

Good to hear that you agree the speeds I am getting are too slow, and good to hear it seems to be faster for you. That makes me optimistic that at some point I’ll get it to work with faster speeds too.

Thanks! I’ve now tried following all the steps from the guide you wrote, but at the end it mentions I’d have to compile Duplicacy myself, and I don’t know how to do that (on Windows; your tutorial only mentions how to do it on Mac). Is it even possible to compile the Duplicacy GUI yourself? Why has that PR not been merged yet, if it’s a PR by gchen anyway? What is blocking it?

saspus · 7 January 2021 07:41

If you only use it with a service account and share your Google Drive with that service account, you don’t need to compile it. It’s only needed for now to let the service account write to a domain user’s drive without the user expressly granting it permission.

TyTro · 7 January 2021 08:46

Ok, can you tell me how I can change my current existing Duplicacy backup config from using the shared google project credentials to using the service account I configured? Your guide only mentions how to do it when setting up a new backup, but I want to use my existing one of course, just with that different way of accessing the files.

saspus · 7 January 2021 08:54

I have never tried doing that (backing up via sharing to a service account), but I will now and see how it goes, and will summarize it for you. I’m also curious.

saspus · 7 January 2021 09:24

BTW, are you using a consumer Google account or one that is part of Google Workspace/G-Suite? There might be a difference in configuration…

TyTro · 7 January 2021 09:40

I am using Google Workspace currently, the Enterprise plan for $20 a month with unlimited storage.

saspus · 7 January 2021 10:36

Hmm. I could not make it work. I found a similar thread, “GCD Using Own Credentials”, without a definitive answer.

Basically, when I share a folder with another Google account (be that a service account or a normal user account), that other account cannot enumerate the shared folder, neither with duplicacy nor with rclone (which explicitly supports that scenario and even has the rclone ls --drive-shared-with-me mygoogledrive: syntax to filter shared folders). Not sure what’s the deal there.

saspus · 7 January 2021 11:10

Perhaps it will be easier for you to build it on Windows and do the rest according to my post. It would be more straightforward anyway, especially since it’s unclear when that PR will make it into a release.

I’ve just tried to build on Windows. It’s fairly easy:

Download and install Git: Git - Downloads (Next, Next, Next, agree to all the defaults)
Download and install Go: Downloads - The Go Programming Language
Open CMD and run go get github.com/gilbertchen/duplicacy/duplicacy. It will take a while and download about 1.5GB of data.
Go to the source folder: cd %GOPATH%\src\github.com\gilbertchen\duplicacy
Apply the patch: curl https://blog.arrogantrabbit.com/assets/duplicacy_gcd_subject_scope.patch | "C:\Program Files\Git\usr\bin\patch.exe" -p1
To see what has been patched, run git diff (k/j to scroll up/down, q to exit the viewer).
Build: go install github.com/gilbertchen/duplicacy/duplicacy
Pick up the binary from C:\Users\me\go\bin\duplicacy.exe and use it instead of what is shipped.

Then you can properly impersonate the user with the service account and it is working well.

I’ll probably update the blogpost with windows instructions… apparently some people still use that OS

TyTro · 7 January 2021 11:36

Thanks, but in your post you only describe how to make it work when setting up a new backup. I want to keep using the backup I already have on Google Drive (which I have in gcd://Duplicacy). And currently my Duplicacy is configured to use that shared project thing to access Google Drive. So I’d somehow have to remove the current shared project access token from Duplicacy, and replace it with the new way explained in your post. How do I do that?

Also, doesn’t that only build Duplicacy CLI? I am using Duplicacy Web GUI, not the CLI. Is it possible to build the GUI myself with that PR added?

saspus · 7 January 2021 11:42

In the duplicacy.json file, you would just replace the reference to the json file that you use now with a reference to the new one, modified according to the article. I think the CLI preference files are generated based on that json file on every backup run.

You don’t need to build the GUI, and can’t. It simply downloads and executes the CLI version to do the actual backup. You would rename the duplicacy.exe you built to whatever CLI executable the Web GUI is currently using (likely duplicacy_win_x64_2.7.2.exe) and replace the current one (it will be in the bin folder under the .duplicacy-web folder, next to duplicacy.json). Maybe even fix the version in the settings so that it does not get overwritten once a new one is released.
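
The swap itself is just a file copy, but I would keep the shipped binary around in case the patched one misbehaves. A rough sketch in Python; the function name and the “.orig” backup suffix are my own choices for illustration, and the paths in the comment are simply the ones mentioned in this thread, so adjust them to your install.

```python
import shutil
from pathlib import Path


def swap_cli_binary(built_exe, bin_dir, cli_name):
    """Back up bin_dir/cli_name once (if present), then copy built_exe over it."""
    target = Path(bin_dir) / cli_name
    backup = target.with_name(target.name + ".orig")
    # Keep the shipped binary; never overwrite an existing backup.
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)
    shutil.copy2(built_exe, target)
    return target, backup


# Example call, using paths mentioned in this thread (adjust as needed):
# swap_cli_binary(
#     r"C:\Users\me\go\bin\duplicacy.exe",
#     r"C:\ProgramData\.duplicacy-web\bin",
#     "duplicacy_win_x64_2.7.2.exe",
# )
```

Stop the Web GUI before doing this so the executable is not in use while it is being replaced.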

TyTro · 7 January 2021 11:48

Ah nice, that makes sense.

But where would I do that? Where is the reference to the json file? There is no reference to the json file in duplicacy.json. In duplicacy.json, there is a “gcd_token” and a “password”. And that would need to be replaced with something else, I assume?

saspus · 7 January 2021 11:50

This refers to the json file that you downloaded before, no? Or is the token inserted there in its entirety?

TyTro · 7 January 2021 11:51

The “gcd_token” is a long string of numbers and letters. Not a reference to any file.

saspus · 7 January 2021 11:56

Ok. Maybe it’s the entire token, base64-encoded. Not sure.

You can add another backup target, this time using the new token json file, and then edit duplicacy.json to replace the value of gcd_token in the old backup target with that of the new one and then delete the new backup target.

gchen · 7 January 2021 16:47

gcd_token in duplicacy.json is the encrypted path of the token file, so you can’t just manually edit it there.

The easiest way is to delete the old storage and then create a new one with the same name using the new token file. This should not affect the existing backups.