Diabolical download speeds off OneDrive using Duplicacy

I am a new licensee! Yay! Currently running Duplicacy in an Unraid docker.

My internet rated speed is nominally 140Mb/s Down and 28Mb/s Up (Mbits per sec)

As a new user (to both) I wanted to test the backup and restore process using just one or two files, comparing against uploading/downloading to OneDrive using their WebUI.

Both Duplicacy and the WebUI will upload an 842MB compressed file @ ~2.7MB/s (MBytes per second), with Duplicacy being a tad faster than the browser upload.

However, on download, whereas the WebUI can download the file @ ~13MB/s, Duplicacy manages only about 2MB/s, slower than it can upload. This would make a lengthy restore unworkable.

Is there any other testing/settings I can try that will improve this? Currently running the default 4 threads.

Any help would be welcome.

Don’t use more than 2 threads with OneDrive. Ideally — stick to one.

I tried a single thread and it made no difference to the upload speed @ 2.7MB/s, and only a marginal increase in restore speed @ 2.3MB/s. That’s just unusable for any plan to quickly restore stuff, especially given I have a 900Mb fibre upgrade in two weeks!

On one hand, 28Mb/s is about 3.5MBps, and accounting for API latency overhead, you get about the correct speed.
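As a sanity check on those numbers, here is a trivial Python sketch (the link speed and file size are the figures from the posts above):

```python
UP_MBIT = 28        # rated upload speed from above, megabits per second
FILE_MB = 842       # size of the test file from above, megabytes

up_mbyte = UP_MBIT / 8            # megabits -> megabytes: the theoretical ceiling
seconds = FILE_MB / up_mbyte      # best-case upload time, ignoring API overhead
print(f"{up_mbyte:.1f} MB/s ceiling, ~{seconds:.0f} s for a {FILE_MB} MB file")
```

So 2.7MB/s observed against a 3.5MB/s theoretical ceiling is already most of the available bandwidth.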

On the other hand, OneDrive is not designed for bulk storage: it’s a file sync and collaboration service, and as long as the OneDrive client works (and it does not need to be fast), Microsoft has no incentive to fix performance issues that only matter to customers abusing the platform for bulk storage. You will never get good performance from it, and if you try (e.g. with multiple threads) they will throttle you, up to and including a ban. So to use OneDrive as a target in the first place, you should not be using more than 2 threads; ideally, 1. Asking for good performance on top of that is asking too much.

To verify, connect to OneDrive with the same credentials using CyberDuck or Transmit, and try to transfer a collection of 1-10MB files. See what kind of performance you get.

If you want good restore performance – avoid file sharing services (Dropbox, OneDrive, etc). Use B2, or S3, or Azure, where you can crank up thread count and get unbounded throughput.

Relevant comment here: (Newbie) Backup entire machine? - #6 by saspus


With 28Mbit upload you’d be fine using 4 threads, likely better than 2 or 1 over the long term. You’d be throttled once in a while, but most of the time you’d be maxing out your upload bandwidth (I can certainly cap out 20Mbit upload with OneDrive most of the time). This might be different for 900Mbit upload use cases; you may want to see how much you’re being throttled and potentially limit your bandwidth usage and/or threads. Throttling will also be impacted by any other OneDrive usage you may have in parallel (e.g. syncing with the OD client, direct web client operations, rclone, etc).

You may want to play with the duplicacy benchmark command to see where your bottlenecks are.
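For reference, a benchmark run might look something like this (run from inside an initialized repository; the thread-count flags are per the Duplicacy CLI documentation, so double-check them against your version):

```shell
# Baseline: single-threaded upload and download against the configured storage
duplicacy benchmark -upload-threads 1 -download-threads 1

# Compare against the current 4-thread setting to see where throttling kicks in
duplicacy benchmark -upload-threads 4 -download-threads 4
```

Comparing the two runs should show whether the bottleneck is the storage backend, the thread count, or local disk/CPU.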

Thanks for that. I am beginning to suspect throttling with OD. I don’t have any client syncing with OD as I am using one of the ‘extra’ family subscriptions to try it.

As another test I used my Amazon AWS account to create a ‘free’ S3 repository and configured that. This connection maxed out my upload bandwidth and managed 8.5 MB/s restore speed, so nearly half of my max download.

This was with 4 threads (forgot to change it), so I changed the restore options to -threads 2, and the restore was a bit slower at 6MB/s.

But 8.5 MB/s restore is far better than the 2.3 OneDrive was struggling to deliver. Just wish it could max out my bandwidth!

Cheap, fast and high capacity - you usually can’t have all three at the same time. OneDrive can hit cheap and high capacity, but it ain’t fast. For my backup use case, this is perfect. If you want fast and high capacity, it ain’t going to be cheap.


There is also the question of reliability. OneDrive uses Azure storage, so why not eliminate another layer of complexity that can introduce new issues and use Azure directly?

Same deal with Google Drive (although it’s much more stable) → Google Cloud Storage, and Amazon Drive (when it still worked) → AWS S3.

Penny pinching with the storage service that your disaster recovery plan relies on is hardly worth it. Storage is cheap these days, no reason to take on extra risk.

LOL, when Azure provides unlimited storage and unmetered bandwidth for $50/month, I’m sure I’ll switch. Until then, I don’t think so.

Ok, but storage and bandwidth cost money to provide, and anything “unlimited for a fixed price” cannot be a sustainable business model. Look at what happened to those who tried (CrashPlan, or Backblaze personal for that matter, or even Amazon Drive).

Essentially you are looking for a way for others to subsidize your use. You may be able to keep finding that short term, here and there, accepting throttling and other drawbacks, but ultimately it’s not a reliable way forward.

I, on the contrary, prefer the “pay for what you use” model. I actually like that Amazon charges me per transaction and per byte uploaded, downloaded, and stored, based on tiers: because I pay for it, I can rely on it. It’s an honest and sustainable business model; I don’t depend on other users and don’t have to look for another loophole tomorrow.


You can have successful all-inclusive models and unsuccessful pay-as-you-go models; there is no direct dependency there. All-you-can-eat restaurants? Can be sustainable. Unlimited bandwidth internet connections? Can be sustainable. Unlimited minutes mobile plans? Can be sustainable.

My internet provider provided unlimited bandwidth plans for many, many years, and there is no indication they are going out of business any time soon. In fact, unlimited bandwidth for end users is more of a norm than the exception nowadays. Are heavy users being subsidized by light users? Sure, just like people buying things on sale are being subsidized by people who pay full price.

And if you think that just because you’re paying à la carte your services cannot be terminated or substantially changed, well… let’s just say there is no universal law that prevents it from happening. You may want to check out the Nirvanix story; it’s dated but still quite relevant. Some clients had to extract petabyte+ datasets on two weeks’ notice :wink:

I personally treat all storage as unreliable, and public cloud is no exception. As far as I am concerned, any single storage can disappear in its entirety at any particular moment for whatever reason, and my overall infrastructure should be resilient to such events.

I’m looking at this from the perspective of value to the consumer, not the service provider’s ability to make a profit. And I’m not saying that the pay-as-you-go model guarantees service immortality. All the examples you mentioned are models that are horrible for the consumer, for two main reasons:

  1. These models unfairly penalize many light users, to subsidize few heavy abusers.
  2. Service provider incentives are not aligned with the customer’s: the provider is incentivized to prevent you from using the resources you paid a fixed price for – because the less/slower you use, the more they earn. In contrast, in pay-for-what-you-use approaches, the more/faster you use, the more they earn.

You can wrap it any way you want – but this conflict is fundamental and unavoidable.

I’m beginning to think that Amazon S3 is probably the better option for me. I just want to store a weekly full system backup on a Sunday (~75GB compressed) and daily incremental backups (~2GB), and hold two weeks’ worth, so about 175GB in total, which I hope never to have to download for a full system restore. Trouble is that the backup software I use doesn’t appear to support S3. It supports OneDrive (Personal and Business), Dropbox and Google Drive, but that’s it. Mmmmmm.

I may just think about backing up to Unraid and then letting Duplicacy back that up to S3. But this means that if my backup on the NAS were unavailable, I would potentially need to download a full week’s worth of backups to something else to restore my system, instead of restoring directly from the cloud. Never easy and straightforward, is it?!

Just wish my testing of the Duplicacy process even with S3 was faster for a potential restore than the 8.5MB/s I experienced. Perhaps a bit of throttling on a ‘free’ 5GB account?

In your experience, does the download speed look any faster on a paid-for S3 tier?

There must be a typo there somewhere. So I assume 200GB. Since you only want to store it for a very short time, AWS “S3 Standard - Infrequent Access” tier seems the best fit, which results in 200GB * $0.01/GB/month = $2/month, + api cost. (see Amazon S3 Simple Storage Service Pricing - Amazon Web Services).
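The arithmetic as a tiny Python sketch (the $0.01/GB/month Standard-IA rate is the figure quoted above; verify it against the current AWS pricing page before relying on it):

```python
GB_STORED = 200                 # assumed total, per the estimate above
PRICE_PER_GB_MONTH = 0.01       # S3 Standard-IA rate quoted above; check current AWS pricing

monthly_storage = GB_STORED * PRICE_PER_GB_MONTH
print(f"${monthly_storage:.2f}/month + API and retrieval charges")
```

Note that Standard-IA also bills per-request and per-GB retrieval fees, so the real monthly total will be slightly higher than the storage line alone.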

This is so weird. What software is this?

You can backup with duplicacy to unraid, and replicate with another instance of duplicacy from unraid to the cloud. Then, if you need to restore, and unraid is gone, you can initialize duplicacy with that cloud destination locally and restore directly. No need to download in full.
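A sketch of that setup with the CLI (the storage name "offsite", the repository id, and the bucket URL are placeholders; see the Duplicacy add/copy documentation for the exact URL form of your backend):

```shell
# One-time setup: register the cloud storage as a copy-compatible second destination
duplicacy add -copy default -bit-identical offsite my-repo s3://us-east-1@amazonaws.com/my-bucket/backups

# Replicate existing local snapshots from the default storage to the cloud
duplicacy copy -from default -to offsite -threads 4
```

Because the copy is snapshot-aware, a later restore can be initialized directly against the cloud storage without pulling the NAS copy first.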

How many threads? With S3 you can use 10, 20, 40, etc. threads, and you are only limited by your ISP connection. Duplicacy has a built-in benchmark. What does it report?

Unless the daily changes are new compressed music files, images and/or videos, Duplicacy’s deduplication will reduce the total storage requirement quite a bit.

A traditional 7-day round robin schedule of 1 full + 6 incremental backups isn’t needed with Duplicacy. Your very first full backup will likely be less than the estimated ~75GB. Each daily incremental afterwards will likely be less than the estimated ~2GB, and might even shrink over time depending on the type of data.

One of the huge advantages to a chunk-based deduplicating backup tool like Duplicacy is that only the very first run is a full backup. All successive uploads to the same storage destination – while technically “incremental” backups – are effectively full backups because duplicate chunks are reused. Any snapshot, including the first one, can be pruned at any time.
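The chunk-reuse idea can be illustrated with a toy Python sketch (fixed-size chunks for simplicity; Duplicacy actually uses variable-size, content-defined chunking):

```python
import hashlib

def backup(files, stored_chunks, chunk_size=4):
    """Toy deduplicating backup: split each file into chunks, hash them,
    and 'upload' only chunks the storage has not seen before.
    Returns the number of bytes uploaded for this snapshot."""
    uploaded = 0
    for data in files:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in stored_chunks:
                stored_chunks[digest] = chunk   # simulate the upload
                uploaded += len(chunk)
    return uploaded

storage = {}
first = backup([b"abcdefgh"], storage)    # initial backup: every chunk is new
second = backup([b"abcdefgh"], storage)   # unchanged file: nothing uploaded
third = backup([b"abcdWXYZ"], storage)    # tail changed: only the new chunk uploaded
print(first, second, third)               # 8 0 4
```

Every snapshot references the full chunk set it needs, which is why each one is independently restorable and prunable even though only new chunks ever travel over the wire.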

Nope, unfortunately reliable backups rarely are. :smirk:

I haven’t compared Amazon S3, but it certainly is the case for Google Drive and Microsoft OneDrive (I have business accounts for both at work).

Given that your storage requirements are around 100GB (75GB + 13 * 2GB), as an alternative to OneDrive / Google Drive / S3, consider rsync.net.

Rsync.net offers a special pricing tier for advanced users willing to pay annually https://rsync.net/products/borg.html – 200GB of data costs $36/yr with no ingress/egress charges and/or bandwidth caps (minimum charge is for 100GB = $18/yr, $0.015/GB thereafter).

I helped a friend set up a backup to rsync.net. No “buckets” or other opaque storage format and no special API, just standard SSH + SFTP access to a Linux account that you can store data on however you want.

A good and reliable solution must be simple and straightforward. Otherwise, you can’t trust it. If at any point you feel the arrangement becomes too cumbersome – it’s time to stop, re-assess and likely start over.

I’ve looked at it; rsync.net may still have its uses, but it’s not a good fit for backups in general, or for the OP in particular, for many reasons:

  • You have to pay for storage upfront, regardless of whether you use it or not, so unused space is wasted.
  • There is a 680GB minimum order.
  • The cost of their geo-redundant tier is comparable to the most expensive AWS S3 tier.
  • No egress fees: with backups you rarely, if ever, egress, and yet you are indirectly paying for other people’s egress.
  • No API fees: only a very small number of calls is involved in uploading backup data, and yet you are indirectly paying for other people’s use of the infrastructure.

They provide an interesting solution with interesting features, but those features are wasted if it is only used as a backup target. In fact, if you look closely, their main selling points (“What Makes rsync.net Special”) are not special at all… A lot has changed since they started in 2001; it’s hard to compete with Amazon, Google, and Microsoft.

What? Why is that a downside?


I’ve just explained it right there. “Free” is an illusion: traffic costs money. Not charging a specific customer for it means the aggregate cost is rolled into the storage price. For the backup use case there is very little egress; hence, the customer ends up paying for other users’ egress. Put another way, part of the payment goes to cover other users’ egress, so the customer receives less value for the money, i.e. overpays. The same goes for the infrastructure/API costs.

In other words – “free” things, as always, are the most expensive ones.

LOL, this is getting better and better. But I’ll play ball and extend the argument. Both Google Cloud Storage and AWS charge for storage and bandwidth, so they’re exactly how things should work, right? But wait: Google Cloud is losing a ton of money as a business unit, which means that if you use GC you’re leeching off ad users, who have to pay higher ad rates to support your GC usage. Not fair.

AWS is the opposite: it is quite profitable. But wait, that means you’re supporting other businesses that lose money, like Amazon’s investment in electric trucks/vans at Rivian. Not fair again!

> saspus: There must be a typo there somewhere. So I assume 200GB. Since you only want to store it for a very short time, AWS “S3 Standard - Infrequent Access” tier seems the best fit, which results in 200GB * $0.01/GB/month = $2/month, + api cost. (see Amazon S3 Simple Storage Service Pricing - Amazon Web Services).

I’ll take a look. Cheers!

> saspus: This is so weird. What software is this?

EaseUS Todo Backup, their ‘Enterprise’ version, even though I just use it for my personal PC.

> saspus: You can backup with duplicacy to unraid, and replicate with another instance of duplicacy from unraid to the cloud. Then, if you need to restore, and unraid is gone, you can initialize duplicacy with that cloud destination locally and restore directly. No need to download in full.

I’m afraid I’ve got into the habit of using EaseUS to do a full system/drive image backup, including the UEFI partition. To ‘restore’ the entire system, EaseUS has a pre-OS environment installed that can restore straight to the UEFI and boot partitions in their entirety. Not sure how I could achieve that with Duplicacy if there is a non-functioning system drive to sort out!

> saspus: How many threads? With s3 you can use 10, 20, 40 etc threads, and you are only limited by your ISP connection. Duplicacy has built-in benchmark. What does it report?

It was 4 threads. Interesting, I’ll test with more and see what happens!