Corrupted chunks

The restic issue has been updated with a report that the issue was fixed on 26 Feb.

Any updates to the Backblaze issues or correspondence that anyone here started?

Very interesting. The issue is definitely not fixed for me. I just tried using the B2 CLI to download the 42 chunks that Duplicacy reported as corrupted, and the results were 3 successful downloads and 39 SHA1 checksum mismatches. I’ll follow up on my Backblaze support ticket.
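For anyone who wants to reproduce this kind of check, here's a minimal sketch (file paths and expected digests below are placeholders; B2 records each file's SHA-1 at upload time) that compares downloaded chunk files against their expected SHA-1s:

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-1 of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def verify_downloads(expected):
    """expected: {path: sha1 hex digest}. Returns (ok, mismatched) path lists."""
    ok, mismatched = [], []
    for path, want in expected.items():
        (ok if sha1_of_file(path) == want else mismatched).append(path)
    return ok, mismatched
```

Feeding in the paths of chunks downloaded via the B2 CLI, together with the SHA-1s Duplicacy expects, splits them into good and corrupted files the same way the 3/39 result above was obtained.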

2 Likes

The response was:

Our engineers have been looking into your account and it appears that after resolving the first issue plaguing your bucket, a second issue had occurred. However, they have informed me that another change went live around noon today, Pacific time.

My level of confidence in B2 keeps dropping with each interaction… two separate data corruption bugs? However, my B2 account seems to be working now and I’m no longer seeing download corruption. I asked for more details on what actually happened here. If they respond, I’ll update this thread.

5 Likes

Thanks for keeping us updated. At least it sounds like no data was actually lost?

Right, I don’t think any of my data was lost (I haven’t yet run another full verify due to the cost). It seems what happened is:

  • Rather than verifying checksums on download, Backblaze relied on an ongoing async job that would look for corruption.
  • A bad batch of hard drives recently caused a spike in corruption that made this job run much more slowly than usual, so it took a long time to find the broken shards that the Restic users and I were downloading.
  • The issue was affecting at least two vaults. The Restic users and I have our data on different vaults.
  • A fix which verifies checksums on download was written and deployed on the vault with the Restic users’ data first, but didn’t roll out to every vault until later, which is why I still saw broken files for a while.
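The verify-on-download fix described in the last bullet presumably amounts to something like the following (a sketch of the general technique, not Backblaze's actual code): hash the bytes as they stream out and reject the response if the digest doesn't match the stored SHA-1.

```python
import hashlib


def stream_with_verification(source, expected_sha1, chunk_size=1 << 20):
    """Yield chunks from a file-like object, checking the SHA-1 at EOF.

    Raises ValueError if the data read does not match expected_sha1,
    so corruption is caught at download time rather than waiting for a
    background scrubbing job to find it.
    """
    h = hashlib.sha1()
    for chunk in iter(lambda: source.read(chunk_size), b""):
        h.update(chunk)
        yield chunk
    if h.hexdigest() != expected_sha1:
        raise ValueError("SHA-1 mismatch: stored object is corrupted")
```

One wrinkle with streaming verification: by the time the mismatch is detected, most of the bytes have already been sent, so the server can only signal the failure at the end and the client has to discard the partial download.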

My takeaway from this is that though it’s still a little surprising that downloads were never verified until now, the whole response to this seems pretty quick and effective, and I’m personally reassured enough to continue trusting Backblaze with my data.

4 Likes

Thanks for the update and summary. And yeah, the response seems to have been reasonably quick.

Regarding your verify cost, it’s fairly easy to get free downloads by routing your data through Cloudflare. At that point you’re only charged for the API access. I suppose with a large repository that could still end up costing more than you’d want.

Thanks again.

@arno I actually set up Cloudflare-B2 a few days ago (thanks for mentioning it in a previous post). But isn’t pulling 10TB on a Cloudflare free plan a good way to get banned? Does Cloudflare say this is ok?

Hah, good question. I haven’t been able to find anything that explicitly states they don’t have usage limits on free accounts, aside from some link previews that don’t match the actual page. I’d like to think they wouldn’t, given their operating claims, but when you phrase it as 10TB it does raise my eyebrows.

Are there any instructions there on how to do that? And how would I need to configure Duplicacy for this?

This page is pretty straightforward: Using Backblaze B2 with the Cloudflare CDN – Backblaze Help

Then set Duplicacy to use “b2-custom://your-b2.domain.com/your-b2-bucket”
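For completeness, that storage URL goes into the usual Duplicacy commands. A sketch, assuming a hypothetical bucket my-bucket behind the hypothetical Cloudflare-fronted domain b2.example.com:

```shell
# Add the Cloudflare-fronted B2 storage to an existing repository.
# "b2-custom://" tells Duplicacy to use a custom download host instead
# of the default B2 endpoint. All names below are placeholders.
duplicacy add cf-b2 my-snapshot-id "b2-custom://b2.example.com/my-bucket"

# Then verify chunks through Cloudflare (egress is free; API calls are not):
duplicacy check -storage cf-b2 -chunks
```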

2 Likes

Hi :slight_smile:

Jumping in because I find this pretty interesting.

Did a little research and found this post on Reddit: https://www.reddit.com/r/backblaze/comments/i7udiu/b2_downloads_for_free_via_cloudflare_bandwidth/ghi33tb?utm_source=share&utm_medium=web2x&context=3

So this method actually still works?

This method works (I am using it now), but probably only until Cloudflare notices.

1 Like

Hah, until Cloudflare notices that they’re part of their own Bandwidth Alliance program?
If that’s the case, I hope they don’t notice :wink:

Apparently that user was blocked because they were using this method to transfer files other than HTML pages :thinking:

Yeah, I was just reading further on those pages, but it looks like they’re talking about CF Workers. And as one of the comments points out, the B2/CF setup that we’re using is demonstrated in the B2 article as serving an image.

So that’s potentially conflicting and potentially referring to something else. I’m going to stick with my understanding that the B2/CF integration allows this, but I’ll keep an eye out for any reason to abandon that. Fortunately it should be easy to switch back to B2 directly.

I see.

Let me ask a question: to use this method I’d have to register a domain name, which costs a certain amount every year. So if I only rarely download from B2, this isn’t worth the effort. Am I right?

I mean, B2 charges $0.01 per GB downloaded, so this method is only useful when the B2 egress cost would be higher than the cost of keeping my domain registered?

Yeah, that sounds about right. I already have a domain I was paying for that I could use, but if it’s going to be a new cost you’d need to run those numbers. Domains can be pretty cheap though and the GBs can add up!
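Running those numbers is simple: at B2’s $0.01/GB egress, a domain pays for itself once yearly downloads exceed its price divided by the egress rate. A quick sketch (the $10/year domain price is an assumption for illustration):

```python
# B2 egress price comes from the thread; the domain cost is a hypothetical
# example of a cheap registration.
B2_EGRESS_PER_GB = 0.01   # dollars per GB downloaded directly from B2
DOMAIN_PER_YEAR = 10.00   # dollars per year for a cheap domain


def breakeven_gb_per_year(domain_cost=DOMAIN_PER_YEAR, egress=B2_EGRESS_PER_GB):
    """GB/year above which the Cloudflare route is cheaper than paying egress."""
    return domain_cost / egress


print(breakeven_gb_per_year())  # about 1000 GB/year, i.e. roughly 1 TB
```

At a 10 TB verify like the one mentioned above, direct B2 egress would be around $100, so the domain wins by a wide margin; for occasional small restores it would not.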

2 Likes