Check command - is this execution time typical/expected?

Wow, I have the same problem with Wasabi.
Check operations take the longest to complete.
Almost 1h to check 500 GB.
@gchen could we implement multiple threads to check s3 storage as well? :heart_eyes:

The s3 backend can list the chunks directory recursively and it is already very efficient, so there is no need to use multiple threads.
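To give a sense of why a single thread is enough (a simplified illustration of S3-style paginated listing, not Duplicacy's actual code): each list request can return up to 1,000 keys, so even a large chunks/ directory needs only a few dozen sequential round trips.

```python
def list_all_chunks(total_chunks, page_size=1000):
    """Simulate an S3 ListObjectsV2-style paginated listing.

    Returns (keys_listed, api_requests_made).
    """
    requests = 0
    listed = 0
    while listed < total_chunks:
        page = min(page_size, total_chunks - listed)  # one API round trip
        listed += page
        requests += 1
    return listed, requests

# ~48K chunks need only ~48 sequential requests, so finishing in under
# a minute with a single thread is plausible.
print(list_all_chunks(48_000))  # -> (48000, 48)
```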

On my 235G Wasabi storage it only takes 23 seconds to list 48K chunks:

2020-06-15 21:00:08.600 TRACE LIST_FILES Listing chunks/
2020-06-15 21:00:31.444 TRACE SNAPSHOT_LIST_IDS Listing all snapshot ids
...
2020-06-15 21:00:35.822 INFO SNAPSHOT_CHECK Total chunk size is 235,363M in 47917 chunks

That's strange.
For me, with about 500 GB, it takes 30 minutes to list 98,451 chunks and an additional 30 minutes to check all snapshots and revisions.
So a total of 1 hour.
What could be wrong, @gchen?

Just ran check -d:

2020-06-17 18:46:55.515 TRACE LIST_FILES Listing chunks/
2020-06-17 18:47:39.364 TRACE SNAPSHOT_LIST_IDS Listing all snapshot ids
2020-06-17 19:18:40.592 INFO SNAPSHOT_CHECK Total chunk size is 427,554M in 105613 chunks

It takes 31 minutes for 432 GB.

What am I doing wrong?

Maybe nothing. Which region are you using?

As another wasabi data point (us-west-1 region, -log check -fossils -resurrect -a -tabular), listing all chunks takes me a little more than 4 minutes for ~370000 chunks (~1.6 TB). The rest of the check operation takes ~8 minutes.

Hmm, the time between 'Listing chunks' and 'Listing all snapshot ids' in your log is 44 seconds.

How many snapshots does your storage have? Because my recent GCD check, with -d, has a line in there about this, before the 'Total chunk size' bit:

2020-06-16 15:48:30.272 INFO SNAPSHOT_CHECK Listing all chunks
2020-06-16 16:41:00.007 INFO SNAPSHOT_CHECK 13 snapshots and 1910 revisions
2020-06-16 16:41:00.011 INFO SNAPSHOT_CHECK Total chunk size is 1251G in 320761 chunks

I'm using Wasabi us-east-1.

2020-06-17 18:46:55.515 TRACE LIST_FILES Listing chunks/
2020-06-17 18:47:39.364 TRACE SNAPSHOT_LIST_IDS Listing all snapshot ids
2020-06-17 19:18:40.589 INFO SNAPSHOT_CHECK 13 snapshots and 16301 revisions
2020-06-17 19:18:40.592 INFO SNAPSHOT_CHECK Total chunk size is 427,554M in 105613 chunks

Maybe it's the number of revisions.
But that would only affect the second stage, and listing still takes 31 minutes.

Yeah, listing snapshots, not chunks. The optimisation above, for Google Drive, S3, et al., is for listing chunks, which was very time-consuming.

As far as I can see, that doesn't appear to be your problem here: your chunk listing is done in ~44 seconds before going on to listing snapshots. That second stage seems to be the time-consuming part, which is rather weird.

Between these 2 lines, Duplicacy loaded all revisions one by one and made sure that every referenced chunk was in the list of existing chunks. Because you have such a large number of revisions (16302 of them), 31 minutes is very reasonable.
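In rough terms, the work between those two log lines looks like this (a simplified sketch of the idea, not Duplicacy's actual implementation): build a set from the listed chunks, then walk every revision and test each referenced chunk for membership.

```python
def check_snapshots(existing_chunks, revisions):
    """Return (revision, chunk) pairs for referenced chunks not in storage.

    existing_chunks: chunk ids found by the listing stage.
    revisions: mapping of revision number -> list of referenced chunk ids.
    """
    chunk_set = set(existing_chunks)          # O(1) membership tests
    missing = []
    for revision, referenced in revisions.items():
        for chunk in referenced:              # every chunk the revision needs
            if chunk not in chunk_set:
                missing.append((revision, chunk))
    return missing

existing = ["aa01", "bb02", "cc03"]
revisions = {1: ["aa01", "bb02"], 2: ["bb02", "dd04"]}
print(check_snapshots(existing, revisions))   # -> [(2, 'dd04')]
```

The cost grows with the number of revisions times the chunks each one references, which is why 16K+ revisions take so much longer than 13.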

Fair enough.

The truth is I have never pruned or deleted any snapshots. :innocent:
I think I can delete lots of snapshots.
So it's time to prune a bit.
Thank you all.

My problem is that the prune command is a bit limited in the web-ui.
I have posted a new topic with a web-ui UX request for custom CLI commands.

Render and customize cli-command to run


Actually it may seem limited, but it's really not. You can just ignore the proposed retention options and paste the full options into the "Options" field.

And when editing a prune job, the Options field is the only one you have to deal with anyway.


Interesting, so it would be something like:

-id test-id -keep 30:360 -keep 7:180 -keep 1:30 -threads 8

Again, does the -id parameter work, for example?

Are there any restrictions on cli options?

If so, we should really update the descriptions or the web-ui guide.

Will try. But I didn't want to guess with prune. :sweat_smile:

Thank you


Yes, something like that. The options are just passed to the CLI, so I don't see why -id wouldn't work.

Well, you can always use the -dry-run option. Assuming that one works, of course :wink:
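For what it's worth, here is a rough sketch of how -keep n:m retention rules like the ones above generally behave, assuming the usual "keep one revision every n days for revisions older than m days" semantics with rules sorted by m in decreasing order (my own illustration, not Duplicacy's actual code):

```python
def prune_plan(ages_days, rules):
    """Decide which revisions survive under -keep style rules.

    ages_days: revision ages in days, oldest first.
    rules: [(n, m), ...] sorted by m descending, e.g. [(30, 360), (7, 180), (1, 30)].
    Returns (kept_ages, deleted_ages).
    """
    kept, deleted = [], []
    last_kept = float("inf")                  # age of the last kept revision
    for age in ages_days:
        # First rule whose age threshold this revision exceeds, if any.
        rule = next(((n, m) for n, m in rules if age > m), None)
        if rule is None:                      # newer than every threshold: keep
            kept.append(age)
            last_kept = age
            continue
        n, _ = rule
        if n == 0 or last_kept - age < n:     # n == 0 deletes; too close deletes
            deleted.append(age)
        else:
            kept.append(age)
            last_kept = age
    return kept, deleted

# With "keep one every 7 days for revisions older than 30 days": the
# 35-day-old revision is dropped because it is less than 7 days away
# from the kept 40-day-old one.
print(prune_plan([40, 35, 31, 20, 10], [(7, 30)]))  # -> ([40, 31, 20, 10], [35])
```

Running with -dry-run first, as suggested above, shows you the real decisions before anything is deleted.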


Exactly, that is precisely the question. :rofl: :rofl: :rofl:

Yes, -dry-run will work as an option.

@gchen, when is the next release that includes this commit planned?

I'll release a new CLI version next week.


I can confirm this latest version fixes my original issue. Listing all chunks now runs in 15 mins on 15.5TB of data on GCD with 10 threads. Thank you!

2020-07-06 01:00:01.698 INFO STORAGE_SET Storage set to gcd://backup
2020-07-06 01:00:04.038 INFO SNAPSHOT_CHECK Listing all chunks
2020-07-06 01:15:08.902 INFO SNAPSHOT_CHECK 6 snapshots and 114 revisions
2020-07-06 01:15:09.034 INFO SNAPSHOT_CHECK Total chunk size is 15548G in 3243443 chunks


This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.