Missing chunks after running prune

OK, I followed the guide and I think it fixed most things, but then I reran prune.


It has gotten stuck, and when I click on the loading status for the log I see this:

2021-12-01 01:43:34.208 INFO SNAPSHOT_DELETE Deleting snapshot xxxxx at revision 7

It seems to be stuck deleting this snapshot

It may take a very long time, especially with Google Drive. I have seen it spend 48 hours pruning a 2 TB dataset.

If you want to see detailed progress (e.g., skipped chunk X, deleted chunk Y, skipped chunk Z…), add the -d flag to the global options.

But there’s no reason to. It’s not stuck; it’s doing its job. It will just take a while.

Hmm, it’s still stuck there; it’s been like that for 19 hours or so. It’s a small backup as well, a <100 GB folder.

If I stop the prune and start it again, does that break things?

I’m using Google Workspace, by the way. Should I be creating my own project for the gcd file? I have been using the default way to log in.

It won’t break things in the sense of deleting data that wasn’t meant to be deleted. You may, however, have to manually delete the snapshot files it never got around to removing, lest they show up as missing chunks.

BTW, the Web UI isn’t great at tailing logs, so you might want to examine them directly in .duplicacy-web\logs with a text viewer.

In your shoes I might abort (making sure all processes are killed), add the -d (or -v) global option, and also add -threads 8 to speed things up.
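For reference, if you were driving this from the CLI instead of the Web UI, the equivalent would look roughly like this; the -keep values below are placeholders for whatever retention policy you actually use:

    # -d (or -v) is a global option, so it goes before the command;
    # -threads is a prune option, so it goes after it
    duplicacy -d prune -keep 0:365 -keep 7:30 -threads 8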


Like this?

It says invalid options when I try to start it. Looking at the Prune command details, it says that -d is dry run? But saspus mentioned it’s for detailed progress?

Oh, OK, I got it working; I moved -threads to the command options. I’ll see how it goes and follow the log as it runs. Thanks for the help so far!

I keep hitting the rate limit:

2021-12-01 23:24:35.018 DEBUG GCD_RETRY [5] User Rate Limit Exceeded. Rate of requests for user exceed configured project quota. You may consider re-evaluating expected per-user traffic to the API and adjust project quota limits accordingly. You may monitor aggregate quota usage and adjust limits in the API Console: ; retrying after 2.80 seconds (backoff: 2, attempts: 1)

It hits the limit, retries a few times, continues, and then hits it again. I have Google Workspace; would creating my own project help with this limit? What is the API limit anyway?

Google Drive tolerates 4 threads. I would (and did) set it to one.

You can generate your own credentials, but then you also need to maintain a token-renewal service; for tokens issued by duplicacy.com/gcd_start, that service is currently hosted at duplicacy.com/gcd_renew.

Alternatively, to avoid periodic renewals (and the dependency on duplicacy.com being up), you can configure access via a service account and impersonation. I did just that myself, but it requires a bit of manual work: Duplicacy backup to Google Drive with Service Account | Trinkets, Odds, and Ends
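Roughly, the service-account key file you download from Google ends up looking like this after the one manual edit the guide describes (every value below is a trimmed placeholder; the only field you add by hand is subject, the Workspace user to impersonate):

    {
      "type": "service_account",
      "project_id": "my-backup-project",
      "private_key_id": "…",
      "private_key": "-----BEGIN PRIVATE KEY-----\n…\n-----END PRIVATE KEY-----\n",
      "client_email": "duplicacy@my-backup-project.iam.gserviceaccount.com",
      "subject": "you@yourdomain.com"
    }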

On a separate note, prune should be way more verbose and transparent: duplicacy knows how many files it has to delete, so why is there no progress indicator? Also, hitting the API rate limit should be logged as a warning at the default logging level.

The guide is great; I think I will be using that. Is it possible to move my current duplicacy backups into the hidden appdata datastore? I’m not sure if that is what you’re doing as well.

Yeah, I wish duplicacy were more verbose as well, and I wish the web UI had more features and options.

I don’t think you can move the datastore directly server-side; two different scopes are required for accessing the app data folder and the user’s drive.

But I guess you could run rclone with the same token on a Google Cloud instance and copy the duplicacy datastore there, to avoid a round trip through your home connection if that is a bottleneck.
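A very rough illustration of that copy step, assuming two rclone remotes have already been configured, one with the regular drive scope and one with the drive.appfolder scope; the remote names and the storage path are made up, and I haven’t verified how the app-data root maps between tools:

    # gdrive-full: rclone remote with scope = drive           (regular My Drive)
    # gdrive-app:  rclone remote with scope = drive.appfolder (hidden app data)
    rclone copy gdrive-full:duplicacy-storage gdrive-app:duplicacy-storage --transfers 4 -P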

Ah, OK. Curious, by the way: did you make the guide? It’s pretty thorough!


I am, however, seeing this. I have authorized the client ID in the admin panel to allow domain-wide delegation, but I don’t see a checkbox in the service account menu.

Has duplicacy been updated to honor the subject and scope fields yet?

Also, when I download the JSON credential file, the scope isn’t included in it, and the guide only says to add the subject line.

Also thanks for the help so far! I appreciate it!

Yep. At that time I had switched the blog to a different CDN and went overboard with large screenshots to see how it would work :slight_smile:

This is a warning, and it’s a correct warning. Service accounts can do anything. With great power comes great responsibility.

Not sure which checkbox you’re referring to.

The PR was merged, so if you build from the top of the tree you’ll have it. But it is not in the released binary as of today.

Building duplicacy with a modern Go toolchain will fail, however. You’ll need to do this to address it: Building recent version fails with "Context.App.Writer undefined (type *cli.App has no field or method Writer)"
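For what it’s worth, building from the top of the tree is roughly this (assuming the main package still lives under duplicacy/ in the repo; the fix from the linked topic still needs to be applied before the build succeeds on a current Go toolchain):

    git clone https://github.com/gilbertchen/duplicacy.git
    cd duplicacy
    # apply the cli.App fix from the linked topic, then:
    go build -o duplicacy_custom ./duplicacy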

The prune just failed; I got this in the logs:

  1. Reduce the number of threads to 1. There should be no rate-limit-exceeded noise; it just slows everything down.
  2. Was there an interrupted prune before? If so, the missing chunks belong to ghost snapshots.

Yeah, I cancelled it before this prune because I wanted to use -d. OK, I’ll adjust my threads.

Should I delete my cache folder, change my threads to 1 and rerun the prune?

Well, that took too long, and after my one-week prune it just failed with:
“2021-12-10 05:02:51.420 ERROR CHUNK_DELETE Failed to fossilize the chunk f481a22fd832c521169642bd1f21682edcfdd50e9c3ea5fea1455f7a97007865: Failed to retrieve the id of ‘chunks/f4/81a22fd832c521169642bd1f21682edcfdd50e9c3ea5fea1455f7a97007865’: googleapi: Error 400: Bad Request, failedPrecondition
Failed to fossilize the chunk f481a22fd832c521169642bd1f21682edcfdd50e9c3ea5fea1455f7a97007865: Failed to retrieve the id of ‘chunks/f4/81a22fd832c521169642bd1f21682edcfdd50e9c3ea5fea1455f7a97007865’: googleapi: Error 400: Bad Request, failedPrecondition”

Any ideas?

I believe this failedPrecondition error is caused by exceeding the rate limit. It is unusual for the prune job to take more than a week. How large is your storage?

Now that the prune job has failed, you’ll need to run a check job with the -fossils option to make sure that there are no missing chunks/fossils. After that, I would suggest running the prune with -exclusive, without running any backup jobs at the same time. With -exclusive, unreferenced chunks/fossils are removed immediately, which should be much faster than the rename operation.
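For example, from the CLI that sequence could look like this (or set up the equivalent check and prune jobs in the Web UI); the -keep values are placeholders, and nothing else should access the storage while -exclusive is running:

    # 1) verify all snapshots, searching fossils for any chunks not found
    duplicacy check -a -fossils
    # 2) then prune with exclusive access to the storage (no concurrent backups)
    duplicacy prune -a -keep 0:365 -keep 7:30 -exclusive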