You can have successful all-inclusive models and unsuccessful pay-as-you-go models; there is no direct dependency there. All-you-can-eat restaurants? Can be sustainable. Unlimited-bandwidth internet connections? Can be sustainable. Unlimited-minute mobile plans? Can be sustainable.
My internet provider has offered unlimited bandwidth plans for many, many years, and there is no indication they are going out of business any time soon. In fact, unlimited bandwidth for end users is more the norm than the exception nowadays. Are heavy users being subsidized by light users? Sure, just like people buying things on sale are being subsidized by people who pay full price.
And if you think that just because you're paying à la carte your services cannot be terminated or substantially changed, well… let's just say there is no universal law that prevents it from happening. You may want to check out the Nirvanix story; it's dated but still quite relevant. Some clients had to extract petabyte+ datasets on two weeks' notice.
I personally treat all storage as unreliable, and public cloud is no exception. As far as I am concerned, any single storage can disappear in its entirety at any particular moment for whatever reason, and my overall infrastructure should be resilient to such events.
I'm looking at this from a value-to-the-consumer perspective, not the service providers' ability to make a profit. And I'm not saying that a pay-as-you-go model guarantees service immortality. All of the above that you mentioned are examples of models that are horrible for the consumer, for two main reasons:
These models unfairly penalize the many light users to subsidize a few heavy abusers.
Service provider incentives are not aligned with those of the customer: the service provider is incentivized to prevent you from using the resources you paid a fixed price for, because the less (or slower) you use, the more they earn. In contrast, in pay-for-what-you-use approaches, the more (or faster) you use, the more they earn.
You can wrap it any way you want, but this conflict is fundamental and unavoidable.
I'm beginning to think that Amazon S3 is probably the better option for me. I just want to store a weekly full system backup on a Sunday (~75GB compressed) and daily incremental backups (~2GB each), and hold two weeks' worth. So about 175GB in total, which I hope never to have to download for a full system restore. Trouble is that the backup software I use doesn't appear to support S3. It supports OneDrive (Personal and Business), Dropbox and Google Drive, but that's it. Mmmmmm.
I may just think about backing up to Unraid and then letting Duplicacy back that up to S3. But this means that if my backup on the NAS was unavailable, I would potentially need to download a full week's worth of backups to something else to restore my system, instead of just restoring directly from the cloud. Never easy and straightforward, is it?!
Just wish my testing of the Duplicacy process, even with S3, was faster for a potential restore than the 8.5MB/s I experienced. Perhaps a bit of throttling on a "free" 5GB account?
In your experience, does the download speed look any faster on a paid-for S3 tier?
There must be a typo there somewhere, so I assume 200GB. Since you only want to store it for a very short time, the AWS "S3 Standard - Infrequent Access" tier seems the best fit, which results in 200GB * $0.01/GB/month = $2/month, plus API cost (see Amazon S3 Simple Storage Service Pricing - Amazon Web Services).
This is so weird. What software is this?
You can back up with Duplicacy to Unraid, and replicate with another instance of Duplicacy from Unraid to the cloud. Then, if you need to restore and Unraid is gone, you can initialize Duplicacy against that cloud destination locally and restore directly. No need to download everything in full.
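A rough sketch of that setup with the Duplicacy CLI, in case it helps (the snapshot ID, host names, paths and bucket are placeholders, and the exact storage URL formats are worth double-checking against the storage backends guide):

```
# In the repository (e.g. on the PC): the first storage is the Unraid NAS over SFTP
duplicacy init my-pc sftp://backup@unraid-nas/duplicacy-storage
duplicacy backup -stats

# Add a copy-compatible cloud storage and replicate to it (this copy step can
# just as well run on the NAS with its own instance of Duplicacy)
duplicacy add -copy default s3 my-pc s3://us-east-1@amazon.com/my-backup-bucket/duplicacy
duplicacy copy -from default -to s3 -threads 10

# Disaster recovery with the NAS gone: initialize a fresh repository directly
# against the cloud storage and restore a revision from there
duplicacy init my-pc s3://us-east-1@amazon.com/my-backup-bucket/duplicacy
duplicacy restore -r 1 -threads 10
```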
How many threads? With S3 you can use 10, 20, 40, etc. threads, and you are only limited by your ISP connection. Duplicacy has a built-in benchmark. What does it report?
Unless the daily changes are new compressed music files, images and/or videos, Duplicacy's deduplication will reduce the total storage requirement quite a bit.
A traditional 7-day round-robin schedule of 1 full + 6 incremental backups isn't needed with Duplicacy. Your very first full backup will likely be less than the estimated ~75GB. Each daily incremental afterwards will likely be less than the estimated ~2GB, and might even shrink over time depending on the type of data.
One of the huge advantages of a chunk-based deduplicating backup tool like Duplicacy is that only the very first run is a full backup. All successive uploads to the same storage destination, while technically "incremental" backups, are effectively full backups because duplicate chunks are reused. Any snapshot, including the first one, can be pruned at any time.
Nope, unfortunately reliable backups rarely are.
I haven't compared Amazon S3, but it certainly is the case for Google Drive and Microsoft OneDrive (I have business accounts for both at work).
Given that your storage requirements are around 100GB (75GB + 13 * 2GB), as an alternative to OneDrive / Google Drive / S3, consider rsync.net.
Rsync.net offers a special pricing tier for advanced users willing to pay annually (https://rsync.net/products/borg.html): 200GB of data costs $36/yr with no ingress/egress charges or bandwidth caps (the minimum charge is for 100GB = $18/yr, $0.015/GB/month thereafter).
I helped a friend set up a backup to rsync.net. No "buckets" or other opaque storage format and no special API, just standard SSH + SFTP access to a Linux account that you can store data on however you want.
A good and reliable solution must be simple and straightforward. Otherwise, you can't trust it. If at any point you feel the arrangement is becoming too cumbersome, it's time to stop, reassess, and likely start over.
I've looked at it. rsync.net may still have its uses, but it's not a good fit for backup in general, or for the OP in particular, for several reasons:
You have to pay for storage upfront, regardless of whether you use it or not, and therefore pay for wasted, unused space.
There is a 680GB minimum order.
The cost of their geo-redundant tier is comparable to the most expensive AWS S3 tier.
No egress fees: with backup you rarely, if ever, need egress, and yet you are indirectly paying for other people's egress.
No API fees: there is a very small number of calls involved in uploading backup data, and yet you are indirectly paying for other people's use of the infrastructure.
They provide an interesting solution with interesting features, but those features will be wasted if it is only used as a backup target. In fact, if you look closely, their main selling points ("What Makes rsync.net Special") are not special at all… Since they started in 2001, a lot has changed. It's hard to compete with Amazon, Google, and Microsoft.
I've just explained it right there. "Free" is an illusion. Traffic costs money. Not charging a specific customer for it means the aggregate cost is rolled into the storage cost. For the backup use case there is very little egress. Hence, the customer will be paying for other users' egress. Or, to put it another way, part of the payment will go to cover other users' egress, and as a result the customer will receive less value for the services provided, a.k.a. overpaying. The same goes for the infrastructure/API costs.
In other words: "free" things, as always, are the most expensive ones.
LOL, that's getting better and better. But I'll play ball and extend the argument. So both Google Cloud Storage and AWS charge for storage and bandwidth, and are right up there in terms of how things should work, right? But wait, Google Cloud is losing a ton of money as a business unit; this means if you use GC you're leeching off ad customers who have to pay higher ad rates to support your GC usage. Not fair.
AWS is the opposite; it is quite profitable. But wait, that means you're supporting other businesses that lose money, like Amazon's investment in Rivian electric trucks/vans. Not fair again!
saspus:
> There must be a typo there somewhere, so I assume 200GB. Since you only want to store it for a very short time, the AWS "S3 Standard - Infrequent Access" tier seems the best fit, which results in 200GB * $0.01/GB/month = $2/month, plus API cost (see Amazon S3 Simple Storage Service Pricing - Amazon Web Services).
I'll take a look. Cheers!
saspus:
> This is so weird. What software is this?
EaseUS Todo Backup, their "Enterprise" version, even though I just use it for my personal PC.
saspus:
> You can back up with Duplicacy to Unraid, and replicate with another instance of Duplicacy from Unraid to the cloud. Then, if you need to restore and Unraid is gone, you can initialize Duplicacy against that cloud destination locally and restore directly. No need to download everything in full.
I'm afraid I got into the habit of using EaseUS to do a full system/drive image backup, including the UEFI partition. To "restore" the entire system, EaseUS has a Pre-OS environment installed that can restore straight to the UEFI and boot partitions in their entirety. Not sure how I could achieve that with Duplicacy if there is a non-functioning system drive to sort out!
saspus:
> How many threads? With S3 you can use 10, 20, 40, etc. threads, and you are only limited by your ISP connection. Duplicacy has a built-in benchmark. What does it report?
It was 4 threads. Interesting; I'll test with more and see what happens!
That's irrelevant, and misses the point entirely. We can discuss this too, but it's a whole separate topic.
In that comment, I'm talking about the value for money the service provides to its customer. Anything "free" there, and/or rolled into a "fixed" cost, is a very poor value for the majority of users, by design; that's the whole point of doing it. With itemized invoices, it's much harder to screw the user over.
The profitability of a specific company is not the topic of this discussion.
Duplicacy is not designed for full-system bare-metal backup.
Generally, if you need a full system backup, a hybrid approach is advised: infrequent bare-metal backups (say, monthly, after major system updates or changes, or never) and frequent user-data-only backups (say, hourly). That way, system data does not compete with user data for space and bandwidth, and user changes don't sit in the queue behind bulk system backups.
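A minimal sketch of what such a split could look like on a Linux/Unraid box using cron (paths, users and the imaging script are placeholders; on Windows the equivalent would be Task Scheduler plus your imaging tool of choice):

```
# /etc/cron.d/backup-schedule  (hypothetical sketch; adjust paths, users and IDs)

# Frequent user-data-only backup: hourly Duplicacy run from the user-data repository
0 * * * *  backup  cd /mnt/user/documents && /usr/local/bin/duplicacy backup -stats

# Infrequent bare-metal image: 03:00 on the 1st of each month, using whatever imaging
# tool you prefer (placeholder script; EaseUS/Clonezilla/etc. have their own invocations)
0 3 1 * *  root  /usr/local/sbin/make-system-image.sh
```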
It barely costs anything at these scales! Charging for "API" calls is even sillier, though you can understand why AWS et al. do it with their services: big businesses operate at a much, much larger scale, where ingress/egress and transactions ramp up along with storage size and need to be time-critical, so they actually have a tangible cost at those levels.
This isn't necessarily the case at smaller scales, with home users and smaller providers. Quite frankly, it's really up to the provider to juggle that, and for the consumer to stop conning themselves into a worse deal over some questionable principles. This mentality is the reason certain countries don't have internet connections with unlimited bandwidth as the de facto standard.
Anyway, the fact that Google Cloud Storage and Google Drive exist in the same Workspace product proves this isn't a downside.
I agree. Those "principles" are a justification or explanation of what happens, not the ultimate goal in themselves. And yet, experience suggests that the overall quality and value of the service received strongly correlates with the alignment of incentives, as described above. So, as a shortcut, to avoid scrutinizing every service in every respect (which is often impossible to do, and even if you did, it may change tomorrow), you can go by these rules of thumb. As a result, you will end up with a better deal.
For a specific example: people who think they found a "deal" getting 6TB for $5/month through Office 365 discover, after uploading a large chunk of data, that performance sucks, the API has bugs, and Microsoft won't do anything about it. Any savings evaporate at that instant.
Disagree. With Workspace, you pretty much get storage for free: storage is not the product, it's incidental to the SaaS they are offering, the collaboration and management platform. If you recall, they never used to enforce storage quotas there (I don't know whether they do now); it was effectively "unlimited". Would it be wise to use a Google Workspace account as a replacement for GCS or AWS S3? Hell no.
But… but, I was doing it myself! Yes, I was dumb, and learned from my mistakes, so you don't have to.
I agree entirely with the general notion. We're all seeking the Holy Grail of backup solutions.
For the average user, it usually means itās on by default and just happens without any effort (e.g., iOS and Android devices backing up to their respective cloud services).
For more sophisticated users, it's most often a USB drive sitting on a desk or in a drawer. The level of complexity ranges from simple drag-and-drop to some software solution.
And then there's you, me and the other folks on this and similar websites with more advanced requirements…
For us, the journey to backup nirvana begins with deciding on which path to take (offline, DAS, NAS, cloud, or some combination thereof?); the simplicity of drag-n-drop, disk imaging or software with a multitude of backup options; and sifting through all of the service providers to find out what best meets our needs (speed, cost, compatibility, reliability, etc.).
At the end of the day, it takes us a lot of work to make our backup solution(s) look straightforward, simple, reliable, and good all at the same time.
That particular minimum only applies to the default service plans. There's a special "Borg" service plan (not limited to the Borg backup software) with a 100GB minimum ($18/yr).
While it's true that storage is paid for up front, other than the relatively small annual minimum ($18 isn't a whole lot of income for a service provider after factoring in 1%-6% for the card-issuing bank + network fees + merchant bank fees), additional storage is billed in 1GB increments at $0.015/GB/month ($0.18/GB/yr), and storage that's added/removed is prorated for the remainder of the year, so wasted space can be kept to a minimum.
For a user with 5GB to back up, there are definitely cheaper options than rsync.net if the cost of storage is paramount. But it's also why Dropbox currently has 700 million users but only 17 million paying customers (< 2.5%) and an $8.6 billion market cap while carrying over $3.2 billion in debt and other liabilities on its balance sheet. The former has been profitable while the latter is still a work in progress.
The main course plus sides might not be the best value meal ever, but it's still among the lowest overall tabs out there (it's almost dinner time).
Also true, but the standard plan might be sufficient for many users, especially for those who follow a 3-2-1 backup protocol.
Given the storage requirements @rjcorless estimated, if rsync.net's datacenter got hit by a nuke, as long as one of the devices being backed up is outside of the blast radius, there's likely plenty of time to make another backup.
I honestly don't think rsync.net is actively trying to unseat any of the big three. It'd be a futile exercise. But rsync.net doesn't have to out-compete them in order to be a sustainable business.
There's Google Photos and many other free alternatives, and yet SmugMug has a loyal paying customer base (its only free option is a 14-day trial). I used to be a SmugMug customer and would be again if the need arose, because it was a good value.
The same goes for Fastmail, which is competing with Gmail, Outlook/Hotmail and Yahoo. I have family and friends who'd balk at paying even $1/yr for email service, but enough people pay for Fastmail to have sustained it since 1999.
For sure, rsync.net won't be everything to everyone, but it's certainly something to someone for it to have lasted more than two decades so far. It's good to have a variety of options to choose from.
Sorry, I've seen no evidence here that this is the case. Unnecessarily paying for egress and API calls doesn't make something a "better deal", and it's certainly not a rule of thumb to actively go by.
Google Drive is perfectly fine as a backup destination. No download caps or costs, no API charges. Why would any sane home user choose GCS over GCD when it fits the bill and costs significantly less?
Well, I just did a test on S3 with 10, 20, 30 and 40 threads. Looks like 30 is the optimum for my bandwidth, as I could do a restore at ~15MB/s.
Can anyone tell me how to run the benchmark test? Because (i) going to either an Unraid console or the Duplicacy console resulted in a "Duplicacy not found" message, and (ii) once I found that "duplicacy_linux_x64_2.7.2" was the program to run, I got init errors and could not work out what to do, as I am using the Web GUI.
I am not CLI proficient and probably know enough to be really dangerous!! Ta.
One may come to that conclusion with the naive approach of only optimizing "storage cost on paper". But that is not the only factor. If you consider the whole package, i.e. how much time and money it costs to run Duplicacy against GCD, the cost of GCS becomes negligible in comparison.
It goes back to using the right tool for the job. GCD is not designed for bulk storage. GCS, on the other hand, is specifically designed for it. So one should expect fewer issues with the latter and more with the former.
And indeed, have you noticed I stopped posting here with technical problems quite a while ago? Have you seen all the issues I reported with GCD? The blog posts I've written? The amount of time I spent triaging them and building workarounds would have covered the cost of GCS tenfold for decades. And that is GCD, one of the better ones, one that is at least possible to make work. I gave up on OneDrive in an hour. An hour I will never get back, mind you. I value that hour, let alone the amount of time I spent triaging GCD issues, much more than the aggregate cost of years of storage on Amazon.
Have you seen any issues anyone reported with S3 or GCS that were not configuration issues? I haven't. I wonder why.
And lastly, a plot twist, invalidating the false premise of "GCD being cheap" even on paper itself: Google Drive is not cheaper than proper S3 archival-tier storage over the lifetime of a backup. Duplicacy does not support archive tiers, and thereby forces users to overpay for hot storage. Backup in hot storage is an oxymoron. The middle ground is S3 Intelligent-Tiering, but this requires reading up on and analyzing long-term costs. The end result would be massive time and money savings for home users who predominantly back up static media. This is something that Acrosync could do with duplicacy-web: wrap archival storage into an end-user product, drastically saving money for customers. But that's a whole other topic.
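To illustrate the Intelligent-Tiering middle ground, here is a rough sketch of a bucket lifecycle rule that moves Duplicacy chunks into the INTELLIGENT_TIERING storage class (the bucket name and prefix are made up; verify that mixing storage classes is safe for your workflow before applying anything like this):

```
# lifecycle.json - hypothetical rule: transition objects under chunks/ to Intelligent-Tiering
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "duplicacy-chunks-to-intelligent-tiering",
      "Status": "Enabled",
      "Filter": { "Prefix": "chunks/" },
      "Transitions": [
        { "Days": 0, "StorageClass": "INTELLIGENT_TIERING" }
      ]
    }
  ]
}
EOF

# Apply it to a (made-up) bucket with the AWS CLI
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-duplicacy-backups \
  --lifecycle-configuration file://lifecycle.json
```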
You need to run it in the folder where the storage is initialized. Ideally, run it on your desktop, to remove Unraid from the picture entirely.
Otherwise, go to the backup, or check the logs in the Web GUI; in the first few lines there will be a path to a temp folder.
You need to cd to that folder and run duplicacy benchmark (with the full path to the executable) in there.
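Something along these lines (the repository path and binary location are only examples; use whatever paths your Web GUI log actually shows, and the thread flags are optional):

```
# cd into the temp repository folder the Web GUI uses (example path; check your backup log)
cd ~/.duplicacy-web/repositories/localhost/0

# Run the benchmark with the full path to the CLI binary
~/.duplicacy-web/bin/duplicacy_linux_x64_2.7.2 benchmark -upload-threads 10 -download-threads 10
```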
I'm only running Duplicacy on the Unraid at the moment, not on my PC. Because I want as quick a restore process as possible, I'm using a combo of EaseUS Todo for the "bare metal" backup, and I've also set it up to run a "smart" backup every 30 minutes across all my documents, storing locally on another (non-OS) drive in my PC (this runs incremental backups throughout the day, then as midnight rolls over it creates a differential backup for the entire previous day and restarts incrementals for the next day). Then I use SyncBackPro to transfer to my Unraid NAS at the end of the day. I was just going to use Duplicacy to provide the cloud-based backup from the Unraid server.
SyncBackPro can save directly to the cloud as well, including S3 (which Todo can't).
As you may be able to tell, my "backup" strategy is a bit of a patchwork quilt, precipitated by me building my first NAS with Unraid on an old repurposed Sandy Bridge motherboard with an Intel 2500K and 16GB of RAM. An oldie but a goodie! It has a very stable overclock of up to 4.5GHz, great cooling and 8 SATA ports. Perhaps my strategy is more a reflection of ignorance than of necessity!