Check Questions

Hey, something popped up!
It’s been running for 10 hours now, and so far it has said…

C:\Users\Carl>"C:\Program Files (x86)\Duplicacy\duplicacy_win_x64_2.1.0.exe" check -files
Storage set to b2://XXXXXXXXXXXXXXXXXX
Listing all chunks
All files in snapshot MOTOCAT-Carl-Carl at revision 1 have been successfully verified

but it is continuing to churn along.
Strangely, it appears to have downloaded significantly more than 225GB… and it is still going. (The bucket only has 225GB in it.)
Can anyone tell me what is going on? I’m going to bed for at least 8 hours now…
I’m curious what it will say in the morning!

You could be hitting the bug mentioned in this issue:

I suspect that it checks each revision without remembering any of the chunks it checked in previous revisions. I started a “check -files -all” on my 500Gig local storage (external USB drives) two days ago and it’s still going. Each revision seems to take the same time, even though I know that many are very similar.

Ah, didn’t notice the -files parameter, sorry, ignore my post. :roll_eyes:

Well, just checked this morning, and I hit the 1TB download cap! B2 stopped everything because of my spending limit. (My limit was set at $10 for this test.)
It does not make sense as there was only 225GB of data in that bucket. Why would it download 1TB?

Here is how the console ended:

C:\Users\Carl>"C:\Program Files (x86)\Duplicacy\duplicacy_win_x64_2.1.0.exe" check -files
Storage set to b2://XXXXXXXXXXXXXXX
Listing all chunks
All files in snapshot MOTOCAT-Carl-Carl at revision 1 have been successfully verified
All files in snapshot MOTOCAT-Carl-Carl at revision 39 have been successfully verified
Chunk 60dcc64b322bb0e9c16281ec3313db14be4c70a4c0ef6ca567a63e748e443d77 can't be found

I assume the “can’t be found” may have happened when B2 cut me off at my limit?

So was this working properly, and it’s just a giant project?
Or is something broken in my older SW version and I just need to upgrade?
Or is check -files foolish with large repositories?

Tips and ideas? How do I verify my backup?

check without the -files option will make sure that all of the chunks needed are there. I personally don’t see a need to use the -files option on Backblaze B2: I trust that they are storing my files reliably.
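
For example, using the same executable path from earlier in this thread, a plain check is just:

C:\Users\Carl>"C:\Program Files (x86)\Duplicacy\duplicacy_win_x64_2.1.0.exe" check

It only reads the snapshot metadata and verifies that every chunk referenced by your revisions still exists in the bucket, so there is no per-file download cost.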

One possible thing to do if you are worried about the files not being stored reliably (or weird network errors…) is to switch the backup to another storage (another bucket) and, when that one is done, nuke the first storage. Then there’s no cost to download, and it will refresh your chunks even if they were corrupted. (Though it would cost up to double for the storage itself.)
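
A rough sketch of that approach (the new storage name and bucket are hypothetical): add registers a second storage under a name, and backup then targets it with -storage:

duplicacy add b2-new MOTOCAT-Carl-Carl b2://Some-New-Bucket
duplicacy backup -storage b2-new

Once a full backup to the new bucket completes, the old bucket can be deleted.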

A way to check the backup and restore process is to restore just some part of your backup. If you restore less than a gig, it’s free from Backblaze B2. I do a partial restore now and then and diff it locally against the original files.
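
A minimal sketch of such a spot check (the folder name and bucket placeholder are hypothetical; the revision number is one from later in this thread). Restore into a fresh, empty directory initialized against the same storage and snapshot id, so the live repository isn’t touched:

mkdir restore-test && cd restore-test
duplicacy init MOTOCAT-Carl-Carl b2://<your-bucket>
duplicacy restore -r 3896 -stats -- +Documents/*

Then diff restore-test\Documents against the live folder; -stats prints how much was downloaded, which helps you stay under the free gig.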

check -files currently will download every file in every revision. It is not smart enough to skip identical files in other revisions that have already verified. Therefore, it may download more data than the total size of the storage.

Hmm… so it seems check -files isn’t a great plan with 200 revisions of snapshots.
If I understand correctly, does that mean it is going to download every file in the backup for every revision? So that means 200 times my approx. 200GB repository/backup?

If so, can I instead just check the latest revision, and will that check all the files that existed in my repository at the last backup/revision? (Not just the latest file changes, right?)
I mean, I really am just trying to check the integrity of the backup of my machine as it exists now. I don’t need to check the integrity of previous backups / changes/ deletes etc.

check -files for the latest revision should be enough. Every revision is a full snapshot, so yes, it checks all files as a whole, not as incremental changes.

Thank you!
I’ll try this soon…

Ok, well, that didn’t work.
I was attempting to check just the latest revision, hoping that would match the approximate size of my current repository.

I ran this:

C:\Users\Carl>"C:\Program Files (x86)\Duplicacy\duplicacy_win_x64_2.1.0.exe" check -r 3896 -files
Storage set to b2://Mxxxxxxxxxxxxxxxxxxxxxxxx
Listing all chunks

It ran all night, downloading 750+GB, when the repository is only 225GB, and it never finished; I stopped it at that point.

So… any idea what I am doing wrong?

Perhaps it has something to do with the fact I am using my one PC to back up 3 repositories… the PC itself (225GB), network drive “V” (800GB) and another network drive “W” (500GB).
Could it somehow be going through all the repositories instead of just the PC repository?

When I do list it only lists the PC repository, “Carl”, not the others, “V” and “W”.
In fact, I can’t get it to list the snapshots for the other repositories… but I know they are being backed up! When I do:

C:\Users\Carl>"C:\Program Files (x86)\Duplicacy\duplicacy_win_x64_2.1.0.exe" list -id MOTOCAT-Carl-V

It replies only with the storage having been set for the PC, “Carl”, not the correct one for “V”, and returns no snapshots:

Storage set to b2://xxxxxxxxxxxxxxxx-CARL

I’m feeling lost…

Very likely, yes.

The documentation for check isn’t entirely clear and I haven’t used it much myself with the -files option (only done it manually a few times), but I believe it’ll check all repositories by default if you don’t specify one via -id, e.g.:

duplicacy -log check -files -id MOTOCAT-Carl-Carl -r 3896

This is because, by default, the list command only shows revisions for the repository you run it from. To see the others, use:

duplicacy list -all

Not sure why it didn’t show anything when you specified the -id (maybe a typo?) but it doesn’t take too much extra time to just list everything in there with -all.

duplicacy list -all

This doesn’t show the other repositories for me, only the PC “Carl” one… any other ideas?

I could try the check with the -id as you suggest… but since I can’t even list the other repositories, I’m not convinced that is the issue. Also, when I do the check, it shows the proper storage for PC “Carl”… and never gets out of that…

Odd. Have your other repositories completed at least one backup yet? If so, double-check they’re backing up to the same storage URL (bucket?) as the one on the PC. Of course, this doesn’t explain why it’s downloading a lot of data even when you’ve specified a particular revision of a particular repository.

Either the backups for the other repositories never completed, or you used the same backup id for all repositories. Check the snapshots directory on your storage to see if there are files named 1, 2, etc. under the subdirectories.
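
For reference, a storage that hosted all three ids would look roughly like this (revision file names illustrative):

snapshots/
    MOTOCAT-Carl-Carl/
        1  39  …
    MOTOCAT-Carl-V/
        …
    MOTOCAT-Carl-W/
        …
chunks/
    …

Each file under snapshots/<id>/ is one revision; if a subdirectory is missing or empty, that repository has never completed a backup to that storage.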

Replies to the comments above:

All three backups are running, under three tabs in the GUI.
Logs look good for all three backups.
I do not share buckets between the separate backups. Could that be an issue?

preferences for the PC

    "id": "MOTOCAT-Carl-Carl",
    "storage": "b2://MotoCat-Garage-PC-CARL",

preferences for V

    "id": "MOTOCAT-Carl-V",
    "storage": "b2://SynologyMotoNAS-VAULT",

preferences for W

    "id": "MOTOCAT-Carl-W",
    "storage": "b2://SynologyMotoNAS-SERVER",

The snapshots directory in each bucket looks good. There is 1, then higher numbers in the thousands (2, 3, etc. have been pruned).

The list -all command still only shows the result for the one storage:

"C:\Program Files (x86)\Duplicacy\duplicacy_win_x64_2.1.0.exe" list -all
Storage set to b2://MotoCat-Garage-PC-CARL
Snapshot MOTOCAT-Carl-Carl revision 1 created at 2018-04-16 20:29 -hash
Snapshot MOTOCAT-Carl-Carl revision 39 created at 2018-04-23 21:00

Snapshot MOTOCAT-Carl-Carl revision 3899 created at 2019-09-19 12:00

Other ideas?

This explains why list -all only shows one at a time. You have three separate backup storages (buckets). Not a wrong configuration by any means, but if you want to take advantage of de-duplication, you should use the same storage URL / bucket - a single storage.
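
For example, if you were setting this up from scratch against a single shared bucket (name hypothetical), each repository would keep its own snapshot id but point at the same storage URL, with each init run from its own repository root:

duplicacy init MOTOCAT-Carl-Carl b2://Shared-Bucket
duplicacy init MOTOCAT-Carl-V b2://Shared-Bucket
duplicacy init MOTOCAT-Carl-W b2://Shared-Bucket

With that layout, list -all and check -a see all three snapshot ids at once, and identical chunks are stored only once across the three backups.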

The stuff I have on the separate drives is pretty unique… so I went for separate buckets when I set it up.

So…

  • How do I list the other storages for the other repositories?
  • Why is the check -files run downloading far more data than the latest revision should be, size-wise?
    Again, I ran this:
C:\Users\Carl>"C:\Program Files (x86)\Duplicacy\duplicacy_win_x64_2.1.0.exe" check -r 3896 -files
Storage set to b2://Mxxxxxxxxxxxxxxxxxxxxxxxx
Listing all chunks

It ran all night, downloading 750+GB, when the repository is only 225GB, and it never finished; I stopped it at that point.

PS: At this point, while I wanted to check the repositories / storages first, I am now thinking I should just upgrade, so whatever I am doing is on the current version. Any other comments on my last post before I do that?

Since you have separate storages, you’ll need to run check -files, and any other operations like prune, from each respective repository (“Carl”, “V” and “W”). Do this by cd'ing to each repo.
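
For example, with hypothetical paths for where each repository root lives:

cd /d C:\Users\Carl
"C:\Program Files (x86)\Duplicacy\duplicacy_win_x64_2.1.0.exe" check

cd /d V:\
"C:\Program Files (x86)\Duplicacy\duplicacy_win_x64_2.1.0.exe" check

cd /d W:\
"C:\Program Files (x86)\Duplicacy\duplicacy_win_x64_2.1.0.exe" check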

A list -all won’t show anything more than a basic list, because there’s only one repo in each storage.

Honestly, I don’t know. My only guess is that you have a lot of de-duplication going on - i.e. while the storage may only be 225GB, it’s re-downloading chunks that are used for many files.

What does a normal check -a -tabular (no -files) say about how big your repository is?

Currently, I think the most efficient way - in terms of least bandwidth - to check or restore a repository is to copy it locally and do it from there. Hmm.
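
A sketch of that, with a hypothetical local path: the -copy flag makes the new storage copy-compatible so chunks transfer as-is, and check can then be pointed at it by storage name:

duplicacy add -copy default local MOTOCAT-Carl-Carl D:\duplicacy-mirror
duplicacy copy -from default -to local
duplicacy check -storage local -files

You pay the download once for the copy, and any number of subsequent checks or restores run against the local mirror for free.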

Do you mean from the legacy GUI to the Web Edition GUI? I don’t think this will make a difference to your current issue but at least in the Web Edition you can add multiple storages and perform a regular check on a schedule.

In future, when dealing with the CLI, you may want to use duplicacy -v -log <command> so you get to see some progress on what it’s doing.
