Corrupted chunks during restore

I attempted to restore a little over 1GB of data this morning and encountered a “corrupted chunk” error. I opened the details and saw the following:

2021-09-07 09:47:50.291 INFO REPOSITORY_SET Repository set to C:/Users/Primary/Downloads/Dragon
2021-09-07 09:47:50.292 INFO STORAGE_SET Storage set to odb://Duplicacy/Backups
2021-09-07 09:47:51.550 INFO SNAPSHOT_FILTER Loaded 1 include/exclude pattern(s)
2021-09-07 09:47:52.691 INFO RESTORE_INPLACE Forcing in-place mode with a non-default preference path
2021-09-07 09:47:54.716 INFO SNAPSHOT_FILTER Parsing filter file \\?\C:\ProgramData\.duplicacy-web\repositories\localhost\restore\.duplicacy\filters
2021-09-07 09:47:54.717 INFO SNAPSHOT_FILTER Loaded 0 include/exclude pattern(s)
2021-09-07 09:47:54.731 INFO RESTORE_START Restoring C:/Users/Primary/Downloads/Dragon to revision 150
2021-09-07 09:47:56.917 INFO DOWNLOAD_PROGRESS Downloaded chunk 1 size 4004593, 1.91MB/s 00:02:04 1.5%
2021-09-07 09:47:59.550 WARN DOWNLOAD_RETRY The chunk 60097f696a804bfa03a7f3fd7ac11a1ed71c5c0a525c8b51aba81ed707d39cd3 has a hash id of 5329f32af5e26113412d48b71ff4f59f294b974d6e348e6e2e06d84cc4781895; retrying
2021-09-07 09:48:01.797 WARN DOWNLOAD_RETRY The chunk 60097f696a804bfa03a7f3fd7ac11a1ed71c5c0a525c8b51aba81ed707d39cd3 has a hash id of 5329f32af5e26113412d48b71ff4f59f294b974d6e348e6e2e06d84cc4781895; retrying
2021-09-07 09:48:04.018 WARN DOWNLOAD_RETRY The chunk 60097f696a804bfa03a7f3fd7ac11a1ed71c5c0a525c8b51aba81ed707d39cd3 has a hash id of 5329f32af5e26113412d48b71ff4f59f294b974d6e348e6e2e06d84cc4781895; retrying
2021-09-07 09:48:06.212 ERROR DOWNLOAD_CORRUPTED The chunk 60097f696a804bfa03a7f3fd7ac11a1ed71c5c0a525c8b51aba81ed707d39cd3 has a hash id of 5329f32af5e26113412d48b71ff4f59f294b974d6e348e6e2e06d84cc4781895

Note: I have a check scheduled to run after every backup. I was just about to buy a license, but this doesn’t inspire confidence that I’ll have a backup that isn’t corrupted.

By default, check only verifies that all chunks are present. It does not verify chunk integrity, because that would mean downloading every chunk; since it is the storage’s job to provide data integrity guarantees, doing so is expensive and usually pointless. You can still do it if you want by adding the -chunks flag to the check command.
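For example, from the command line (a minimal sketch; it assumes the duplicacy CLI executable is on the PATH and is run from an initialized repository directory):

:: Default check: verify that every chunk referenced by the snapshots exists on the storage
duplicacy check

:: Full verification: also download every chunk and verify its hash (slow; incurs download/API costs)
duplicacy check -chunks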

In your case the chunk got corrupted either before it was uploaded (because of a bad or rotten sector, with your filesystem (likely NTFS) being non-checksumming), or while at rest on the storage (more likely if the storage does not provide data integrity guarantees; I don’t know if OneDrive does). It is also possible that the backend API has issues returning correct data (that happened before with Backblaze: the data was correct but the API was returning corrupted blocks).

Another possibility is that your filesystem is now in a bad state, so the downloaded chunk fails validation. Delete the cache, run chkdsk c: /f to schedule a disk check on reboot, reboot, let it fix the filesystem, and retry the restore.
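For example, from an elevated Command Prompt (a minimal sketch; the cache path is inferred from the filters path in your log and may differ on your machine):

:: Remove the local chunk cache (adjust the path to your setup)
rmdir /s /q "C:\ProgramData\.duplicacy-web\repositories\localhost\restore\.duplicacy\cache"

:: Schedule a filesystem check of C: for the next boot, then reboot
chkdsk c: /f
shutdown /r /t 0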

You can try downloading the same chunk with the official OneDrive application and comparing it with what duplicacy downloaded. If they don’t match, that’s definitely a OneDrive bug. If they do match, it may or may not be a OneDrive bug.
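For example (a minimal sketch; the file names are placeholders for the two downloaded copies of the chunk):

:: Byte-for-byte comparison of the two copies
fc /b onedrive_copy.chunk duplicacy_copy.chunk

:: Or hash each copy and compare the results
certutil -hashfile onedrive_copy.chunk SHA256
certutil -hashfile duplicacy_copy.chunk SHA256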

Generally, I would avoid file-based services (and OneDrive especially) as backup targets, and instead use S3/B2-type providers.

I’ll try to find the link to a discussion with the reasoning.

Edit: here is one: (Newbie) Backup entire machine?

Bummer. I was using OneDrive because I get 5TB of space included, and was hoping not to have to use Backblaze B2 for that reason. Sounds like I will have to anyway.

A similar issue was reported here: Corrupted chunks while restoring from OneDrive

I wouldn’t be surprised if there is something wrong with OneDrive. In the past they had a bug causing incomplete chunks to be saved, which led to corruption: https://github.com/OneDrive/onedrive-api-docs/issues/1366#issuecomment-692072192

Erasure Coding may be helpful: New Feature: Erasure Coding

If you need 5TB of space, consider Google Workspace. Many here (myself included) have used it for a long time quite successfully. They even have an officially unlimited tier at $20/month, and even the limited basic accounts are de facto unlimited: Google has never enforced storage quotas (don’t upload petabytes of data, of course, and you’ll be fine); the only limitation is the 750GB/day maximum ingress. For most people that is not a limitation at all.

@gchen I just recently began using Duplicacy, so I’m guessing I already have the “Erasure Coding” feature? I followed the link but I don’t really understand how to use it.

@saspus 5TB of Google Workspace appears to actually be more expensive than Backblaze B2. Thanks for the alternative though.

5TB of B2 is $25/month + API costs.
5TB of Workspace is $18/month.

How is it more expensive?
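A minimal worked comparison (assuming B2’s $0.005/GB/month storage rate and the 5TB Workspace tier at a flat $18/user/month; B2 download and API-transaction fees excluded):

5 TB ≈ 5,120 GB × $0.005/GB/month ≈ $25.60/month (B2, storage only)
5 TB included at a flat $18/month (Workspace)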

Erasure Coding is not enabled by default. You’ll need to create a new storage; on the storage configuration page there is an option to enable Erasure Coding.
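For CLI users, a minimal sketch (the snapshot id and storage URL are illustrative; 5:2 means 5 data shards plus 2 parity shards):

:: Initialize a new storage with Erasure Coding enabled (5 data shards + 2 parity shards)
duplicacy init -erasure-coding 5:2 my-snapshot-id odb://Duplicacy/Backups-EC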

So if I enable “Erasure Coding” and continue to use OneDrive, will checks still complete and show corrupted chunks if any are present?

Thanks @saspus, I found the Google Workspace pricing. I am currently only using about 1.5TB of space and I was pricing it against that. You are correct though. When space begins to exceed 3TB, it does work out to be less with Google Workspace than with B2.

So I have a considerable amount of space available with a Google Drive account that I think I could use. Is that as reliable as using Google Workspace? I just don’t want to go through all the work of changing the storage only to find out that chunks are corrupted on Google Drive.

Could you clarify this? I see the price for the 5TB tier. Where is the unlimited one?

So it’s actually possible to have unlimited storage with a Business Starter account too? I read they give only 30GB.

Billing -> Subscriptions -> Add or upgrade a subscription -> Google Workspace Enterprise Standard -> Switch -> Get Started.

Here is a demonstration (the google-consumer remote is a regular consumer Google account and google-business is Google Workspace Business Standard) of how Google still does not enforce quotas:

~ % rclone listremotes | grep google
google-business:
google-consumer:

~ % rclone about google-consumer:
Total:   17Gi
Used:    71.845Mi
Free:    8.569Gi
Trashed: 0
Other:   8.361Gi

~ % rclone about google-business:
Used:    2.841Ti
Trashed: 0
Other:   27.713Mi

Note how not only is the Used: space already over the 2Ti “limit”, but there is also no “Free:” metric reported. In other words, no quota. In reality that account also has a shared folder with 3TB of data, which isn’t even reported here.

My speculation: the way I see it, they don’t sell storage; they sell a workspace subscription with features that make sense for businesses of varying sizes. For example, Enterprise subscriptions offer Vault and other enterprise-y features that are of little use to a small mom-and-pop donut shop. Storage is therefore incidental, and it works out on average because the huge number of customers who store just a couple of Google documents and spreadsheets subsidize the folks who back up massive media libraries to these accounts, just as it works with other unlimited services (Box.com, for example, offers unlimited storage for $15/month as well). This lack of quota enforcement was the case with G Suite and is still the case with Workspace.

Maybe it will change in the future. Maybe, if you abuse the service, they will ask you to at least switch to Enterprise at some point. But that would be fair…

You mean the consumer Google Drive? Regardless, it’s exactly the same cloud software, and in years of using it for backup I have yet to see any issues. Google definitely knows how to design, implement, and run cloud services, that’s for sure, unlike some of their competitors.

If you see performance issues, you may want to consider using a separate Google project, as described in the duplicacy remote documentation.

Ok, that’s perfect. Thank you.

Thank you. I don’t want to go off topic, but yes, this seems great. The only thing is that with B2 you pay as you go, instead of paying up front for a fixed amount that you might never fully use.

If you have a low amount of data (under 2TB), start with B2 or Wasabi and migrate later if and when you need more space. Or maybe a fixed cost is better: easier to budget, no surprises down the road, and no extra work migrating. It’s a couple of bucks a month, hardly worth your time optimizing.
