Completely new: where to start

I’ve tried what you proposed: I used duplicacy benchmark -chunk-size 64 -chunk-count 10 -upload-threads 20 to create the 640MB file, interrupted it, and then downloaded it with rclone.
I tried to look up --transfers in the rclone documentation but could not completely determine its purpose; from what I’ve read, I assume it works similarly to duplicacy’s threads.

The result I obtained is as follows:

PS D:\Documentos> .\rclone.exe copy -P --transfers 20 storj:backup D:\rclonetry\
Transferred:          640 MiB / 640 MiB, 100%, 76.873 MiB/s, ETA 0s
Transferred:           10 / 10, 100%
Elapsed time:         8.5s

Download speed seems to have improved a lot, which I think points to duplicacy not working properly for some reason.

Whoa! This is an extremely interesting result!

In my tests duplicacy was much faster, so I did not include the rclone results in the post. (But I only tested with 4 threads, and egress from Amazon is quite expensive; I’ve already spent $20 :)).

It would be interesting to understand what duplicacy does differently from rclone.

And to confirm: you created a native storj remote in rclone, not S3, correct?

I’m wondering whether, when you specified 20 transfers with rclone but only had 10 files, it started downloading each file in two threads (from the beginning and from the middle), or whether it simply used 10 threads.

Either way, this is a massive improvement.

But I think we are very close. It is possible that rclone is using a newer version of uplink (the storj API library) that has a bunch of optimizations, while duplicacy is using an older and slower one.

Actually, rclone can be used to serve any remote as a few other kinds of remotes: rclone serve

You could technically use rclone to serve STORJ via its native remote over SFTP and have duplicacy connect to the local rclone instance via SFTP. I’m not saying that this is what you should do permanently, just pointing out a possibility.

Actually, it would be a good experiment. If duplicacy → rclone serve → storj is fast but duplicacy → storj is slow, then it’s definitely the old storj library in duplicacy.
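For example, something along these lines (a sketch only; the address, user, password, and snapshot id are made-up placeholders, and the exact duplicacy SFTP URL syntax should be double-checked against its docs):

# terminal 1: expose the native storj remote as a local SFTP server
rclone serve sftp storj: --addr 127.0.0.1:2022 --user test --pass test

# terminal 2: point duplicacy at the bucket through that local SFTP endpoint
duplicacy init -e test-repo sftp://test@127.0.0.1:2022/duplicacy
duplicacy benchmark -chunk-count 10 -chunk-size 64 -download-threads 10 -upload-threads 10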

It’s hard to imagine anything else. Ultimately, both apps just fetch files from the interwebs.

Alternatively, I can try updating the storj dependency in the duplicacy source and building a binary for Windows for you, but you should not trust binaries from random people on the internet, so maybe it’s a non-starter.


Send me a direct message with your PayPal or similar, I’d be more than happy to cover the cost. Your assistance has been incredibly valuable, and it’s only fair that I contribute.

The only other possibility I can imagine is that at some point I’ve set some config that is limiting duplicacy performance, but I cannot think of anything, especially taking into account the first results obtained with STORJ:

That is correct.

I actually did not completely understand the use of --transfers, so I’m a bit lost here.

But wouldn’t you also have obtained worse results with duplicacy than with rclone? Isn’t there a way to confirm that? I’m unsure whether the “patch notes” mention which library they are using.

This is above my current knowledge; I would have to read a bit and come back. I guess it would mean that duplicacy splits and encrypts the data, then rclone uploads it to STORJ; for a restore, rclone would download it and duplicacy would do the opposite process. How would it affect versioning? Would it still be possible? Anyway, it seems a bit of a complex solution.

To be honest, I simply have no clue what that would mean :rofl:

It is simply that I’m not in any hurry; in the end, I’ve been doing the wrong thing for years and (luckily) so far I haven’t lost any data. I’d rather wait, understand what I’m doing (at least a bit), and have a setup that works properly.


No worries. I derive much pleasure from tinkering with this. I could have gone to the movies instead and paid more :slight_smile:

I don’t think there is any hidden config that would affect performance, and indeed, your earlier result is much better. If nothing else changed, maybe the performance of your ISP is inconsistent and all the issues are red herrings; or, if that was via a multi-hop VPN, maybe the routing happened to be favorable.

After reading the documentation: --transfers is how many files to transfer in parallel. So if you only have 10 files and set the parameter to 20, it will transfer all the files in parallel, i.e. 10.

For in-file parallel transfer there is indeed a different parameter. We don’t need that; our files are already small enough.
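(For reference, a sketch of the two knobs as I understand them; the in-file parallelism flag is, if I recall correctly, --multi-thread-streams:)

# per-file parallelism: up to 10 files are copied at once
rclone copy -P --transfers 10 storj:duplicacy /tmp/rclone

# in-file parallelism: additionally split each large file into parallel streams
rclone copy -P --transfers 10 --multi-thread-streams 4 storj:duplicacy /tmp/rclone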

So, duplicacy and rclone must get the same results.

I’m going to download stuff with rclone from my home connection tonight, varying the number of threads, and see how it compares with what duplicacy benchmark reports.

Good point. I’ll try to retry the test, maybe I’ve screwed something up.

Nothing changes from duplicacy’s perspective; the transport to the destination just takes two hops. Duplicacy thinks it backs up to SFTP; rclone pretends to be an SFTP server but in reality uploads and downloads files from storj. But I agree, it’s a bit overkill. Actually, I’ll try that tonight as well: duplicacy benchmark directly and over rclone. I expect exactly the same numbers. But we’ll see.


Yep. Of course I did. I forgot to specify chunk sizes when uploading and these were 4MB ones. Dangit.


Rclone, downloading 10 64MB Duplicacy chunks from storj, using latest rclone:

rclone, storj         1      2      4      10 threads
Download, MB/s        17.1   25.6   52.4   64.0
CPU, %                75     118    236    291

Looks like with rclone at 10 threads we get to some kind of saturation, either of my internet or my CPU; I have 4 CPU cores on that machine, and it’s probably busy with other stuff like SMB. Download reached 64 MB/s.

rclone download logs, 1/2/4/10 threads
alex@truenas ~/tests-duplicacy-storj/rclone-v1.64.0-freebsd-amd64 % rm -rf /tmp/rclone && mkdir /tmp/rclone && time ./rclone copy -P --transfers 1 storj:duplicacy /tmp/rclone
Transferred:   	      640 MiB / 640 MiB, 100%, 17.060 MiB/s, ETA 0s
Transferred:           10 / 10, 100%
Elapsed time:        40.0s
./rclone copy -P --transfers 1 storj:duplicacy /tmp/rclone  24.90s user 5.37s system 75% cpu 40.211 total
alex@truenas ~/tests-duplicacy-storj/rclone-v1.64.0-freebsd-amd64 % rm -rf /tmp/rclone && mkdir /tmp/rclone && time ./rclone copy -P --transfers 2 storj:duplicacy /tmp/rclone
Transferred:   	      640 MiB / 640 MiB, 100%, 25.638 MiB/s, ETA 0s
Transferred:           10 / 10, 100%
Elapsed time:        24.7s
./rclone copy -P --transfers 2 storj:duplicacy /tmp/rclone  24.27s user 5.14s system 118% cpu 24.844 total
alex@truenas ~/tests-duplicacy-storj/rclone-v1.64.0-freebsd-amd64 % rm -rf /tmp/rclone && mkdir /tmp/rclone && time ./rclone copy -P --transfers 4 storj:duplicacy /tmp/rclone
Transferred:   	      640 MiB / 640 MiB, 100%, 52.416 MiB/s, ETA 0s
Transferred:           10 / 10, 100%
Elapsed time:        12.6s
./rclone copy -P --transfers 4 storj:duplicacy /tmp/rclone  25.79s user 4.45s system 236% cpu 12.776 total
alex@truenas ~/tests-duplicacy-storj/rclone-v1.64.0-freebsd-amd64 % rm -rf /tmp/rclone && mkdir /tmp/rclone && time ./rclone copy -P --transfers 10 storj:duplicacy /tmp/rclone
Transferred:   	      640 MiB / 640 MiB, 100%, 63.899 MiB/s, ETA 0s
Transferred:           10 / 10, 100%
Elapsed time:        10.5s
./rclone copy -P --transfers 10 storj:duplicacy /tmp/rclone  27.42s user 3.65s system 291% cpu 10.654 total

I’ve re-run the duplicacy benchmark with 10 threads and got 46.6MB/s download performance.

duplicacy benchmark logs, 10 threads
alex@truenas ~/tests-duplicacy-storj % ./duplicacy benchmark -chunk-count 10 -chunk-size 64 -download-threads 10 -upload-threads 10 -storage storj
Storage set to storj://12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S@us1.storj.io:7777/duplicacy/duplicacy
Generating 256.00M byte random data in memory
Writing random data to local disk
Wrote 256.00M bytes in 0.04s: 6408.22M/s
Reading the random data from local disk
Read 256.00M bytes in 0.02s: 10416.08M/s
Split 256.00M bytes into 4 chunks without compression/encryption in 17.09s: 14.98M/s
Split 256.00M bytes into 4 chunks with compression but without encryption in 17.47s: 14.65M/s
Split 256.00M bytes into 4 chunks with compression and encryption in 17.88s: 14.32M/s
Deleting 10 temporary files from previous benchmark runs
Generating 10 chunks
Uploaded 640.00M bytes in 793.42s: 826K/s
Downloaded 640.00M bytes in 13.73s: 46.63M/s
Deleted 10 temporary files from the storage

That’s a pretty big discrepancy right there.

I’m going to try to rebuild duplicacy with the updated storj library and retry. Duplicacy uses 1.9.0 and the current version is 1.12.0. The changelog does mention the word “performance”. I’ll just rebuild and retry.

Rebuilt, with storj/uplink 1.12.0
git clone https://github.com/gilbertchen/duplicacy && cd duplicacy

# go.mod
# -       storj.io/uplink v1.9.0
# +       storj.io/uplink v1.12.0

go mod tidy
go tool dist list | grep -i free | grep 64
GOOS=freebsd GOARCH=amd64 go build -o duplicacy_storj_1.12.0 duplicacy/duplicacy_main.go
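# (presumably the same recipe, with GOOS=windows, produces the .exe mentioned above; untested guess)
# GOOS=windows GOARCH=amd64 go build -o duplicacy_storj_1.12.0.exe duplicacy/duplicacy_main.go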

Download performance improved to 59M/s
alex@truenas ~/tests-duplicacy-storj % ~/duplicacy_storj_1.12.0 benchmark -chunk-count 10 -chunk-size 64 -download-threads 10 -upload-threads 10 -storage storj
Storage set to storj://12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S@us1.storj.io:7777/duplicacy/duplicacy
Generating 256.00M byte random data in memory
Writing random data to local disk
Wrote 256.00M bytes in 0.08s: 3190.96M/s
Reading the random data from local disk
Read 256.00M bytes in 0.02s: 10409.92M/s
Split 256.00M bytes into 6 chunks without compression/encryption in 16.83s: 15.21M/s
Split 256.00M bytes into 6 chunks with compression but without encryption in 17.23s: 14.86M/s
Split 256.00M bytes into 6 chunks with compression and encryption in 17.44s: 14.68M/s
Generating 10 chunks
Uploaded 640.00M bytes in 796.58s: 823K/s
Downloaded 640.00M bytes in 10.80s: 59.25M/s
Deleted 10 temporary files from the storage

I’m re-running with stock and rebuilt duplicacy a few more times to ensure it’s not a fluke. This will have to wait a few hours, as there is quite a lot of activity on my network.


@saspus is once again a hero to the community trying to help us solve these issues. Following with interest as it seems related to my storj problems, too (linked in #15).

I’ve set up a series of benchmark runs alternating stock duplicacy with the one with the updated storj library, and alternating their order, with slightly more data (size 64, count 32) to make the download take longer, and with the number of threads reduced to 4 to eliminate any saturation. 10 runs total.
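In other words, each run is roughly this invocation (same flags as before, adjusted per the above):

./duplicacy benchmark -chunk-count 32 -chunk-size 64 -download-threads 4 -upload-threads 4 -storage storj
~/duplicacy_storj_1.12.0 benchmark -chunk-count 32 -chunk-size 64 -download-threads 4 -upload-threads 4 -storage storj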

I did not get as clear a picture. The download performance was all over the place, between 38 and 60 MB/s (yes, on just 4 threads, which is interesting):

  • the updated version showed values anywhere between 38 and 60,
  • and the stock version from 38 to 45.

It seems it could have been an improvement, but it could also have been random chance due to changing network conditions.


I’ve shared the binaries with the updated storj library here, if you want to try:

After moving from one part of the Bay Area that had symmetric AT&T fiber to one that only offers Comcast, I’m stuck with the same.

I see that you wrote this is the maximum available for you. I was also on 1000/30 until I saw that they offered a 1200 tier. I initially considered this overkill for my uses, but what wasn’t immediately clear was that it also came with an upgraded 40 Mbps upstream. I figured you had already checked this, but just in case…

Thank you. They actually silently upgraded my connection to 40Mbps upstream literally yesterday, without telling anyone, after a few nights of outages. Apparently, they are upgrading the equipment, and soon it will be possible to get up to 200Mbps upstream with a new modem. I’m looking forward to that :slight_smile:


So it wasn’t possible to get a conclusive answer; I can try running a test with the updated version you posted.
Is it enough to run the .exe as with regular duplicacy, or does something have to be done with go.mod?

Glad to hear that!

Yes, you can just run the .exe. I’ve uploaded the .mod just as a reference for which versions of which modules it was built with.


I’ve just repeated the benchmark and got 27.79 upload and 70.29 download for THE STOCK VERSION; I really don’t know why. Maybe something network-related?
Yours reported 27.21 and 79.68, and this has been consistent across multiple tries, so it definitely seems to be an improvement!

Download speed is the same as with rclone; I assume the upload figure is correct? Shouldn’t both be similar, since it is a symmetric connection?

Internet weather :slight_smile: maybe some bottleneck somewhere in the network… who knows.

Oh, this is nice!

@gchen, could you please bump the storj/uplink version in the next duplicacy release? It seems to perform slightly better, and the changelog does mention some performance improvements, so it’s worthwhile.


I’m not sure I understand – do you mean the download matches rclone’s but the upload does not?

I worded that poorly: when I measured speeds with rclone, I only measured download speed, which appears to be consistent with what I’m currently seeing with duplicacy. Is it normal to have such a significant difference between upload and download speeds when using a symmetric fiber connection?

I might explore how to benchmark upload speed with rclone and give it a try.
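(For reference, a sketch of what that might look like: just the earlier copy with the direction reversed, reusing the same paths and flags as in your first test.)

.\rclone.exe copy -P --transfers 20 D:\rclonetry\ storj:backup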

Tomorrow, I’ll finally begin the backup process, and I’m hoping it will be much smoother :joy:

I want to express my gratitude once again for all the time you’ve taken to assist me. This has been an incredibly interesting and educational experience.


Based on this:

Much effort has gone into optimizing this process, for instance when you download a file we attempt to grab 39 pieces when only 29 are required eliminating slow nodes (Long Tail Elimination). This is our base parallelism and allows up to 10 nodes to respond slowly without affecting your download speeds.

When uploading we start with sending 110 erasure-coded pieces per segment in parallel out to the world but stop at 80. This has the same effect as above in eliminating slow nodes (Long Tail Elimination).

I’d expect that upload should be slower — it needs to upload 2.7 times more data

Since with a gigabit connection you can upload at most about 120 MB/s, the storj upload will therefore be capped at about 44 MB/s.
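Rough arithmetic behind that: 80 uploaded pieces per 29 pieces’ worth of data is an expansion factor of 80 / 29 ≈ 2.76, so ~120 MB/s of raw upstream yields roughly 120 / 2.76 ≈ 44 MB/s of effective upload.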

If you want to upload faster, you either need a bigger upstream or need to use a gateway (either your own in the cloud or a storj-provided one), assuming the gateway is capable of sustaining the gigabit upstream from you. The 2.7x multiplication will then happen on the gateway.

With download, you only fetch the minimum number of pieces necessary to reconstruct the files. More transfers than the minimum required are started, but once enough pieces have been downloaded the others are cancelled. There is some overhead associated with clogging your connection with extra transfers that will get cancelled, but the net benefit is positive because you avoid waiting for slow nodes and the fast ones are effectively self-selected.

You are welcome, and thank you too for the opportunity to learn something new: how to cross-compile Go programs, how to update dependencies and manage modules, and how to start an instance on AWS (I had been using Oracle Cloud before; AWS is much nicer to deal with, with much less waiting and clicking around and a much more streamlined and intuitive process. Oracle, however, provides 10TB of free egress monthly…).


I see, it is a direct consequence of how it is designed.

What I’m thinking about is that we’ve always set the chunk size to 64MB to get optimum performance out of the native protocol. However, I’m not sure if it is possible to set an exact chunk size outside of the benchmark tool. As you mentioned before, -c 32 would control the average chunk size, but this leaves a lot of room for variations far from 64. I assume it is not possible to force an exact chunk size, since file sizes would not always be multiples of 64, so I’m thinking of using -c 64 -max 64 so it tries to create 64MB chunks but leaves room to create smaller ones when needed. Does this make any sense?
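To make that concrete, roughly what I mean at init time (flag spellings from memory; the snapshot id and storage URL are placeholders, so this should be double-checked against duplicacy init -help):

duplicacy init -e -c 64M -max 64M mydocs storj://…/duplicacy/duplicacy
# or, if strictly fixed 64MB chunks were wanted, pin the minimum as well:
duplicacy init -e -c 64M -min 64M -max 64M mydocs storj://…/duplicacy/duplicacy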