Storj upload speed?

I’m in the process of migrating to Storj from Wasabi for some of my data. The upload speed to Storj seems pretty slow, at least compared to what I remember from Wasabi. With Storj, my uploads are proceeding at about 4 MB/s as reported by Duplicacy. Is that typical or are other people getting better speeds with Storj? I get similar speeds when manually uploading files to Storj buckets so the slowness isn’t a Duplicacy issue as far as I can tell.

With Storj you have two options: use the native integration, or use their (or your own) S3 gateway.

The former option can achieve absolutely ridiculous speeds, however:

  • the network must be rock solid
  • most consumer routers won’t do
  • the upstream channel needs to be beefy enough to support upload amplification
  • upstream latency should be low and/or segment sizes should be as close to 64 MB as possible.

The latter option will work better on most residential connections: upstream bandwidth will match exactly what’s being uploaded, and all erasure encoding and distribution to storage nodes will happen on the S3 gateway and not on your machine. It is still worth trying to make the segment size as close to 64 MB as possible, from both the performance and the cost perspective.
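
To put rough numbers on the difference in upstream demand between the two options, here is a small Python sketch. It assumes the roughly 80/29 expansion factor quoted later in this thread for native uploads and no client-side amplification through the S3 gateway; the function name and the sample payload rates are made up for illustration.

```python
# Rough estimate of the upstream bandwidth needed for a given payload rate.
# Assumption: native uploads are amplified by about 80/29 (figure quoted in
# this thread); uploads through the S3 gateway are not amplified on the client.
NATIVE_EXPANSION = 80 / 29
GATEWAY_EXPANSION = 1.0


def upstream_needed(payload_mb_s: float, native: bool) -> float:
    """Approximate upstream MB/s required to push payload_mb_s of data."""
    factor = NATIVE_EXPANSION if native else GATEWAY_EXPANSION
    return payload_mb_s * factor


# Hypothetical payload rates in MB/s, for illustration only.
for rate in (4, 8, 40):
    native = upstream_needed(rate, native=True)
    gateway = upstream_needed(rate, native=False)
    print(f"{rate} MB/s payload: native ~{native:.1f} MB/s, gateway ~{gateway:.1f} MB/s")
```

This ignores protocol overhead and retries, so treat the output as a ballpark figure only.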

I suggest initializing a test repository in an empty folder on your machine for each configuration (native and S3), pointing at Storj, and running duplicacy benchmark with different chunk size parameters.

This should help you pick the best configuration for your circumstances (machine speed, network stability, etc).
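
One way to keep such a test repeatable is to script it. The sketch below only builds and runs `duplicacy benchmark` command lines for a grid of chunk sizes and thread counts. The `-chunk-size` (in MB) and `-upload-threads` options are written as I recall them from the CLI help, so check them against `duplicacy benchmark -help` on your version. The helper names and the folder paths in the usage note are hypothetical.

```python
import shlex
import subprocess
from itertools import product


def benchmark_command(chunk_size_mb: int, upload_threads: int) -> list[str]:
    """Build one `duplicacy benchmark` invocation (flags assumed, see above)."""
    return [
        "duplicacy", "benchmark",
        "-chunk-size", str(chunk_size_mb),
        "-upload-threads", str(upload_threads),
    ]


def benchmark_matrix(chunk_sizes_mb, thread_counts) -> list[list[str]]:
    """All combinations of chunk size and thread count, as command lists."""
    return [benchmark_command(c, t) for c, t in product(chunk_sizes_mb, thread_counts)]


def run_matrix(repo_dir: str, chunk_sizes_mb, thread_counts) -> None:
    """Run every combination inside an already initialized test repository."""
    for cmd in benchmark_matrix(chunk_sizes_mb, thread_counts):
        print("$", shlex.join(cmd))
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

For example, `run_matrix("/tmp/storj-native-test", [4, 32, 64], [1, 4])` would be run once in the folder initialized against the native backend and once more in the folder initialized against the S3 gateway, and the reported speeds compared.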

On the average chunk size: duplicacy uses a 4 MB average chunk size by default. That is arguably way too small, even for other storage backends that don’t incentivize large chunk sizes. Picking 32 MB could yield better results. It can be set at the storage initialization stage.
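
To see how much the average chunk size changes the number of objects the backend has to handle, here is a back-of-the-envelope Python sketch. It treats MB as MiB and ignores deduplication, compression, and the fact that duplicacy chunks are variable-sized, so the counts are only approximate. The 1 TiB data set and the function name are assumptions for illustration.

```python
import math

MIB = 1024 * 1024


def approx_chunk_count(total_bytes: int, avg_chunk_mib: int) -> int:
    """Approximate number of chunks for a data set at a given average chunk size."""
    return math.ceil(total_bytes / (avg_chunk_mib * MIB))


# Hypothetical 1 TiB data set, for illustration only.
one_tib = 1024 ** 4
for size in (4, 32, 64):
    print(f"{size} MiB average -> ~{approx_chunk_count(one_tib, size):,} chunks")
```

Fewer, larger chunks mean fewer round trips per uploaded byte, which is the performance argument made above.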

A (very) long thread on the same topic: Completely new: where to start - #27 by saspus

Thanks @saspus. I guess I didn’t think to try using Duplicacy’s S3 mode to write to Storj. I’m using the native integration. I might try S3 mode.

I’ve got a pretty decent network connection though (symmetric gigabit fiber), so I’m not sure why the native mode is running so slowly.

I was careful to select a larger chunk size following the recommendations on this forum. My average chunk size is right around 33 MB for the data I’ve backed up so far. So, not the 64 MB you recommended, but not the tiny 4 MB default either.

I’ve only been using 1 thread so far because I didn’t want to slam my Synology. But maybe I should try 2 or more threads to see if I can improve the throughput modestly without a horrible perf hit.

That sounds adequate.

Please do try more threads, but monitor CPU utilization at the same time. It’s very likely the CPU will end up being the bottleneck: duplicacy needs to shred, compress, and encrypt the data, and then the Storj uplink library needs to erasure-encode and encrypt it again and send it to multiple targets.

Try the benchmark too, to make the test repeatable and isolate the issue – you can then run the same test on another beefier PC and see if you get any difference in performance.

Also measure actual network utilization to confirm the numbers duplicacy displays (e.g. on a gateway, in Synology Resource Monitor, or on your PC when running the test from there).

I set the thread count to 4 which increased the CPU load on the Synology to around 40%. I’m not sure I want to go any higher than that TBH. But it does seem like it modestly increased the throughput. Duplicacy is reporting closer to 8 MB/s now. Interestingly, the Synology Resource Monitor was reporting about 10 MB/s upload before and about 28 MB/s now.

The lack of throughput is mostly a concern while I get through the initial upload phase to Storj. Once that’s done and it’s only doing incremental uploads then I won’t be too concerned if the throughput is lower. I might even drop back to a single thread at that point. For the record, I’m only doing Duplicacy copy operations to Storj. The backup tasks are running to and from local storage.

I think the default expansion factor is about 2.7 (80/29), so the gap between the upstream utilization reported by Resource Monitor and by duplicacy does sound reasonable.
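
As a quick sanity check, the figures reported earlier in this thread can be compared against that factor. The sketch below uses the approximate readings from the posts above (about 4 and 8 MB/s from duplicacy, about 10 and 28 MB/s from Resource Monitor); the variable and function names are made up for illustration.

```python
# Expansion factor quoted in this thread (80/29).
EXPECTED = 80 / 29

# (payload MB/s reported by duplicacy, wire MB/s reported by Resource Monitor),
# both approximate readings taken from the posts above.
observations = {
    "1 thread": (4, 10),
    "4 threads": (8, 28),
}


def amplification(payload_mb_s: float, wire_mb_s: float) -> float:
    """Ratio of bytes on the wire to payload bytes."""
    return wire_mb_s / payload_mb_s


for label, (payload, wire) in observations.items():
    ratio = amplification(payload, wire)
    print(f"{label}: observed ~{ratio:.2f}x, expected ~{EXPECTED:.2f}x")
```

Both inputs are rounded readings, so this only shows that the observed ratios are in the same ballpark as the quoted factor.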

You can try the S3 endpoint then: it may or may not be faster, but it will definitely use fewer resources.