Wasabi reliability, data loss?, oh my

After reading through "Missing Chunk during copy command only", I don't even know what to say… wow. I am also waiting for Wasabi's support to get back to me, but what a complete mess.

They recently increased their pricing, and I lost my long-time early-adopter pricing, which almost doubled my spend. So I decided I wasn't getting enough value and have been migrating my revisions off of their platform. It turns out the price increase also bumped me to their "free" egress tier, so while I had originally planned to only grab a subset of my revisions, I decided what the heck, I'll take all 13TB please…

With about 3 hours and 22000 chunks to go on the full copy, I started hitting the “Internal Error” messages from the linked thread. Lots of them. Well, as if I needed another reason to leave them… yikes. Sorry, just venting here.

Re-reading the topic linked above, I see Wasabi was blaming it on a service change made between a couple of dates in Feb 2022, but I just hit the internal error on a chunk from April 2020. Shame on me for not conducting more thorough checks of my backup data. Lesson learned.

As always, you get what you pay for. Wasabi, Backblaze, and other clones compete on price. That's the only goal – be cheaper than the next guy, and cut whatever corners may or may not get in the way. Providing free egress takes this to the next level – you can't abuse egress if you can't download your data, lol.

With Wasabi, you may want to try a different endpoint. When I was testing them a few years ago, us-west-1 was almost never working, and us-east-1 was a bit better.

If you want reliability, stick to the big boys – AWS, GCS, Azure. Expensive? Sure. But reliable. Isn't hot storage overkill for backup? Absolutely. But since duplicacy does not support archival storage, that's what it is.

I like Storj; it avoids most of the uptime and durability issues of centralized storage. But I also understand it's a small, young company, and who knows what will happen. So I also have another backup at Glacier.

Ha. I wasn't even on a free egress plan prior to this past billing cycle. To be honest, that factored into my hesitation to do something like a full disaster recovery – it would have cost me more than a year's worth of storage. Clearly I wasn't taking this backup strategy very seriously.

Hmm. When I try to change it over to us-east-1 in my preferences, I get a "The storage has not been initialized" error. My bucket is in us-west-1 – can I access that from another endpoint?

I'm actually migrating out of the cloud. I would rather manage my own infrastructure… and only have myself to curse at when it goes tits up. The money I've essentially just donated to Wasabi for absolutely nothing would by itself pay for lots of nice hardware and spares. I'll have my local storage here, and an offsite replica at my brother's house.

Sorry, no, I meant switch to their east coast datacenter. But I guess inter-datacenter transfer is not free… because why would it be.

This is not bad in itself: what matters is the cost of a recovery times the probability of needing one, versus the cost of storage. Generally, the probability of needing a full restore is about zero, so it's reasonable. I'm using AWS Glacier as a "disaster recovery" target. Egress is quite expensive above the free 100GB/month allowance, but I don't expect to ever need to restore it – so I prioritized storage cost. $1/TB/month is pretty good.
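
To put rough numbers on that expected-cost argument (every rate below is a placeholder assumption, not quoted pricing):

```python
# Back-of-the-envelope: expected yearly cost of an archive-tier backup.
# Every rate here is an illustrative assumption, not anyone's actual pricing.
tb_stored = 13                   # size of the backup set, TB
storage_per_tb_month = 1.00      # archive-tier storage, $/TB/month
restore_per_tb = 90.00           # retrieval + egress for a full restore, $/TB
p_full_restore_per_year = 0.02   # guessed chance of needing a full restore in a year

storage_per_year = tb_stored * storage_per_tb_month * 12
expected_restore_per_year = p_full_restore_per_year * tb_stored * restore_per_tb

print(f"storage per year:          ${storage_per_year:.2f}")           # 156.00
print(f"expected restore per year: ${expected_restore_per_year:.2f}")  # 23.40
# With numbers in this ballpark, storage dominates, so optimizing for
# $/TB/month and accepting expensive egress is a rational trade-off.
```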

Yeah. I used to manage my own stuff. Then I moved 100% to the cloud. Now I'm back to running an old Supermicro server with FreeBSD and a pile of disks, for the same reason. It's mine and I'm responsible; I'm the one to blame if something goes wrong, and I prefer that.

Precisely: a server at my brother's place, a server at my place, and we back up to each other. That's the main backup; AWS is the disaster-recovery one.

We just send incremental filesystem snapshots over ZeroTier. Works like a charm.
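
For anyone curious, a minimal sketch of that replication step, assuming ZFS on both ends and SSH reachable on the peer's ZeroTier address (the dataset, snapshot names, and IP are made-up examples):

```python
# Sketch: replicate an incremental ZFS snapshot to the other server over the
# ZeroTier link, i.e.  zfs send -i <prev> <curr> | ssh <peer> zfs receive -F <dataset>
# Dataset, snapshot names, and the peer address are placeholders.
import subprocess

dataset = "tank/data"
prev_snap = f"{dataset}@2024-01-01"
curr_snap = f"{dataset}@2024-01-02"
peer = "backup@10.147.17.25"  # the other server's ZeroTier-assigned address

send = subprocess.Popen(["zfs", "send", "-i", prev_snap, curr_snap],
                        stdout=subprocess.PIPE)
subprocess.run(["ssh", peer, "zfs", "receive", "-F", dataset],
               stdin=send.stdout, check=True)
send.stdout.close()
send.wait()
```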

Support got back to me, stating that my problems aligned with maintenance they were doing in the region. Sure enough, I kicked the copy off again and it finished overnight. There are a few problems with that excuse, but in any case, my data is mine again. Wasabi users beware.

If they announced the maintenance in advance, that may be OK. Still, performance might drop, but data should never go offline during maintenance. They may not have enough redundancy in the stack. Not exactly surprising.

Right, the biggest problem is data going offline during their maintenance. Their storage design and maintenance routine should not allow for that. It is supposed to be “hot” storage after all. My data was more like room temperature when I finally got it all back. :slight_smile:

Given how rarely I did egress, it's also surprising that I would hit a problem during my one copy job… unless internal errors and data going offline are just a common thing on their platform.

While this statement is strictly true… it's not the full picture either.

Duplicacy works just fine with Google Cloud Archive storage because, to quote Google from this page:

Unlike the “coldest” storage services offered by other Cloud providers, your data is available within milliseconds, not hours or days.

So unlike Azure or AWS, your "archive" tier files are still "online" and not dehydrated/offlined/unavailable. It's slightly more expensive than the AWS GDA tier, but I'll take that; it's still near $1/TB/month for me now.

I racked up a bit of a bill initially as I tested it thoroughly, but restores work – which you can't say for AWS GDA or Azure archive blobs.
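
If anyone wants to try the same route, a minimal sketch of the setup, assuming the google-cloud-storage Python client and a made-up bucket name (Duplicacy is then just pointed at the bucket via its gcs:// backend):

```python
# Create a GCS bucket whose default storage class is ARCHIVE, so every chunk
# Duplicacy uploads lands in the archive tier automatically.
# Bucket name and location are made-up examples.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-duplicacy-archive")
bucket.storage_class = "ARCHIVE"
client.create_bucket(bucket, location="us-east1")

# Then initialize the Duplicacy storage against it as usual, e.g.:
#   duplicacy init my-repo gcs://my-duplicacy-archive/backups
```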

Essentially you are paying a higher storage cost (20% more) and double the minimum retention (365 days vs 180; not sure how that translates to cost, as it depends on data turnover rate), just because duplicacy lacks a feature. I don't want to make that compromise.

Perhaps a good alternative is to rclone the static, immutable data (the bulk of what most home users store: photos and videos) to Glacier, and use duplicacy for the small mutable working set, which can live anywhere, even on Amazon's standard or infrequent-access tiers; it's so small that cost does not matter.
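
A rough sketch of what that rclone push could look like, assuming an already-configured rclone S3 remote (the remote name "aws", the local path, and the bucket are all made up):

```python
# Sketch: push a never-changing media tree straight into S3 Glacier Deep Archive
# with rclone. Remote name, paths, and bucket are placeholders, and an rclone
# S3 remote is assumed to be configured already.
import subprocess

subprocess.run(
    [
        "rclone", "copy",
        "/tank/photos",                         # local, immutable media
        "aws:my-archive-bucket/photos",         # bucket behind the "aws" remote
        "--s3-storage-class", "DEEP_ARCHIVE",   # upload directly into the archive tier
        "--immutable",                          # refuse to modify files already uploaded
    ],
    check=True,
)
```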

Versioning, deduplicating, and compressing media is a 100% waste of effort.

Point taken, but say I was storing 5TB: rather than $5/mo, it's basically 20 cents per TB more… so $6/mo. Basically, 20% more of something really cheap is still quite cheap. It's going to be fairly immaterial for most duplicacy users (I say this completely unfounded and without research, but who here has 50, 100, 500 terabytes?).

Yes, there's a 1-year minimum storage duration or you get early-delete fees, but I was storing my backups long term on Amazon anyway, so (at least for me) it's purely academic and not actually any different at all. Maybe someone who churns their backups rapidly would benefit, but then hot storage might be no more expensive for them anyway if they were really going for it. Maybe I'm a complete edge case, but I imagine most Duplicacy users have multi-year retention - you never know when a file went corrupt, or when you made a mistake and didn't notice for years, IMHO.

Ultimately I'm on Duplicacy because I'm a CrashPlan refugee and I still want to work in a similar way - basically the ability to go back almost forever; it's saved my bacon, for reals, more than once. Otherwise I'd just pay for the consumer Backblaze backup plan and have 30 days of retention.

The main bonus point is this: have you ever tried to do a restore from AWS?

The workflow I found was this:

You have to make sure you've disabled the AWS config thing (I forget its name) that sets those blobs to the archive tier in the first place, or you'll be chasing your tail…

You try to restore. It errors and gives you a chunk reference in the duplicacy log file; you go find the blob and tell AWS to restore it. You wait up to 12 hours or something (or you pay $currency for speed), then come back and check the console periodically… Once it's there, you restore the file, overwriting the original.

You try to restore again. It gives you another chunk reference… rinse and repeat, maybe hundreds of times, each separated by up to 12 hours, until duplicacy finally restores. A large file may take actual months of daily effort.

Maybe there's a way to get duplicacy to give you the entire list of chunks it needs, and then you could feed that into the S3 CLI, but I didn't research that far.
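
If such a list existed, the bulk-thaw step itself would be straightforward; here's a rough sketch with boto3 (the bucket name, the chunks.txt input file, and the chunk path layout are my assumptions – duplicacy doesn't produce that list for you today):

```python
# Sketch: ask S3 to thaw a list of Glacier Deep Archive chunk objects in one pass.
# Bucket name, "chunks.txt" (one chunk ID per line), and the key layout are assumed.
import boto3

s3 = boto3.client("s3")
bucket = "my-duplicacy-bucket"

with open("chunks.txt") as f:
    chunk_ids = [line.strip() for line in f if line.strip()]

for chunk_id in chunk_ids:
    key = f"chunks/{chunk_id[:2]}/{chunk_id[2:]}"  # duplicacy-style nested chunk path (assumed)
    s3.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={
            "Days": 7,                                 # keep the thawed copies around for a week
            "GlacierJobParameters": {"Tier": "Bulk"},  # cheapest (and slowest) retrieval tier
        },
    )
# Hours later, once the restore jobs finish, re-run the duplicacy restore.
```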

Alternatively, you restore your entire multi-terabyte archive, which will probably cost you hundreds of $currency - you just wiped out any saving you made by using GDA, when Google "just works". I think a full DR from AWS would have been something like $750 for me, and I can't restore files granularly. Ouch.

Personally, I think it's worth the marginally higher storage cost for a platform that actually works with Duplicacy, no messing about required. It's ultimately a small price to pay, surely?

How do you get that price? This page, for example, says it’s $4/TB/month…

It's Glacier Deep Archive.

I'm thinking of rearchitecting the whole backup over the holidays: rclone the immutable data to archive storage and keep the mutable data in hot storage. Price won't matter because there is very little mutable data.

I see. Good to know that there are multiple Glacier products. But is my understanding correct that these are all technically S3 storage, which means they can all be accessed by Arq?

Then again, duplicacy also supports S3 but not Glacier. So S3 can’t be the sole criterion. :thinking:

Could you elaborate? What is "rclone immutable data"? Do you mean this: GitHub - emmetog/immutable-backups: A wrapper around rclone to perform immutable backups, both full and incremental (and restore them)? You mean you want to use rclone to back up to Glacier?

And what do you mean by “keep mutable data in hot storage”?

Is Arq no longer going to be part of your backup setup? I was just about to finally give it a try…

S3 is a protocol. Amazon's cold storage is offline: it needs to be "thawed" (transferred to hot storage) before the data can be accessed, and that can take a few hours. The application needs to be able to handle that: request the files, wait, and then proceed with restoring the data.

Duplicacy does not manage those types of storage today, so you can't use Amazon Glacier of any kind with duplicacy.

You could, however, use the Google Cloud Storage Archive tier: it's slightly more expensive and has twice the minimum retention, but it has no thawing requirements. I don't know what its free-tier parameters are, i.e. how much data you can restore for free, but it's one of the options usable with duplicacy that will likely still be cheaper than hot storage.

Photos and videos make up the majority of my data. That data never changes, is not compressible, and cannot be deduplicated. Versioning it is only useful to protect against bit rot on the source.

Hence, instead of using duplicacy to back it up, I could rclone copy the data to the cloud using access keys that prohibit deletes. Then changed (read: corrupted) data will fail to overwrite the good data in the cloud.
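
A sketch of what such restricted credentials could look like on AWS S3 (the user, policy, and bucket names are hypothetical; on S3 you'd also want bucket versioning enabled so an overwrite can't silently replace the only copy):

```python
# Sketch: give a dedicated backup user upload/list/read rights but no delete.
# User name, policy name, and bucket are hypothetical; pair this with bucket
# versioning so overwritten objects keep their previous versions.
import json
import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::my-archive-bucket",
                "arn:aws:s3:::my-archive-bucket/*",
            ],
        },
        {
            "Effect": "Deny",
            "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
            "Resource": ["arn:aws:s3:::my-archive-bucket/*"],
        },
    ],
}

iam.put_user_policy(
    UserName="backup-writer",
    PolicyName="no-delete-backup",
    PolicyDocument=json.dumps(policy),
)
```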

The rest of the data, the part that does benefit from versioning, is small: a few hundred GB at most, stuff like documents, spreadsheets, and projects. It can be backed up to hot storage, which does not require thawing and is supported by duplicacy; for example, Amazon's standard or infrequent-access tiers. Their higher cost is irrelevant because the total amount of data is very small.

I will still use Arq to back up that hot data, but now for a different reason: it supports backing up cloud-only files. I have 3TB of stuff in iCloud and my Mac has only 1TB of space. I don't have any other way to handle that.

or to protect against errors made by the “part” between the keyboard and the chair.

Absolutely; I'd argue that part malfunctions more often than the hardware does. The same solution applies, though - forbid deleting once uploaded. This also largely takes care of modifications: on most cloud storage backends (S3 with versioning included), the old data survives an overwrite unless you also have delete permission.

Wow, this seems like a massive oversight for any backup program. I was about to look into Glacier Deep Archive, but I guess I'll have to do that in Arq instead of Duplicacy.

I've noticed myself relying on Arq way more often than Duplicacy nowadays (better Storj compatibility, offline-file support, and now Glacier Deep Archive). Perhaps I shouldn't have just bought the lifetime Duplicacy license. Oh well.

I would not call it a "massive oversight". It's a missing, rather niche feature that appeals to a small minority of home users, but unlike adding support for yet another storage provider (see how Arq added Storj by just templating the S3 parameters), this requires some re-architecting. That could be a much larger risk than the potential reward.

At their core, archival tiers are for archiving – stuff you don't need but don't want to throw away. Archival is not the same as backup. Some apps support it for backup, but even in a disaster-recovery scenario, waiting 12 hours for data to defrost only appeals to price-focused solutions, a.k.a. home users.

Arq decided to support it, but it is in the minority.

Can you elaborate on this one? Arq uses the Storj S3 gateway, and so can duplicacy. In addition, duplicacy supports the native Storj backend. So how is it that Arq's compatibility is better?

Arq automatically adjusts the block size for perfect compatibility with Storj's object sizes, without needing to manually edit the storage config settings the way Duplicacy does.
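
For reference, the manual tweak being referred to happens at storage-initialization time; a rough sketch (the repository ID and storage URL are placeholders, and 64M is only meant to echo Storj's 64 MiB segment size, not a tuned recommendation):

```python
# Sketch: initialize a Duplicacy storage with a fixed, larger chunk size so
# chunks line up better with Storj's segments. Repository ID and storage URL
# are placeholders; 64M is illustrative, not an official recommendation.
import subprocess

subprocess.run(
    [
        "duplicacy", "init",
        "-c", "64M",     # average chunk size
        "-min", "64M",   # min == max == average gives fixed-size chunks
        "-max", "64M",
        "my-repo",                        # snapshot/repository ID (placeholder)
        "s3://region@host/bucket/path",   # Storj S3-gateway style storage URL (placeholder)
    ],
    check=True,
)
```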
