Wow, I have the same problem with Wasabi.
Check operations take the longest to complete.
Almost 1h to check 500 GB.
@gchen could we implement multiple threads to check s3 storage as well?
The s3 backend can list the chunks directory recursively and it is already very efficient, so there is no need to use multiple threads.
On my 235G Wasabi storage it only takes 23 seconds to list 48K chunks:
2020-06-15 21:00:08.600 TRACE LIST_FILES Listing chunks/
2020-06-15 21:00:31.444 TRACE SNAPSHOT_LIST_IDS Listing all snapshot ids
...
2020-06-15 21:00:35.822 INFO SNAPSHOT_CHECK Total chunk size is 235,363M in 47917 chunks
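A recursive S3 listing like the one above boils down to paginated ListObjectsV2 calls under the chunks/ prefix, following continuation tokens until the listing is exhausted. A minimal sketch of that pagination loop, using a stubbed client in place of a real S3 connection (the FakeS3 stub and its page size are assumptions for illustration, not Duplicacy's actual backend code):

```python
def list_all_keys(client, prefix):
    """Collect every key under prefix, following continuation tokens."""
    keys, token = [], None
    while True:
        kwargs = {"Prefix": prefix, "MaxKeys": 1000}
        if token:
            kwargs["ContinuationToken"] = token
        page = client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page["Contents"])
        if not page.get("IsTruncated"):
            return keys
        token = page["NextContinuationToken"]

class FakeS3:
    """Stand-in client: serves n keys in pages of up to MaxKeys."""
    def __init__(self, n):
        self.keys = [f"chunks/{i:06x}" for i in range(n)]
    def list_objects_v2(self, Prefix, MaxKeys, ContinuationToken=None):
        start = int(ContinuationToken or 0)
        page = self.keys[start:start + MaxKeys]
        end = start + len(page)
        return {
            "Contents": [{"Key": k} for k in page],
            "IsTruncated": end < len(self.keys),
            "NextContinuationToken": str(end),
        }

print(len(list_all_keys(FakeS3(2500), "chunks/")))  # 2500
```

Since each call returns up to 1000 keys, even ~100K chunks is only ~100 sequential requests, which is why a flat recursive listing is fast without extra threads.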
That's strange.
For me, with about 500 GB, it takes 30 minutes to list 98,451 chunks and another 30 minutes to check all snapshots and revisions.
So 1 hour in total.
What could be wrong, @gchen?
Just ran check -d:
2020-06-17 18:46:55.515 TRACE LIST_FILES Listing chunks/
2020-06-17 18:47:39.364 TRACE SNAPSHOT_LIST_IDS Listing all snapshot ids
2020-06-17 19:18:40.592 INFO SNAPSHOT_CHECK Total chunk size is 427,554M in 105613 chunks
It takes 31 minutes for 432 GB.
What am I doing wrong?
Maybe nothing. Which region are you using?
As another Wasabi data point (us-west-1 region, -log check -fossils -resurrect -a -tabular): listing all chunks takes me a little more than 4 minutes for ~370,000 chunks (~1.6 TB). The rest of the check operation takes ~8 minutes.
Hmm, the time between "Listing chunks" and "Listing all snapshot ids" in your log is 44 seconds.
How many snapshots does your storage have? Because my recent GCD check, with -d, has a line about this before the "Total chunk size" bit:
2020-06-16 15:48:30.272 INFO SNAPSHOT_CHECK Listing all chunks
2020-06-16 16:41:00.007 INFO SNAPSHOT_CHECK 13 snapshots and 1910 revisions
2020-06-16 16:41:00.011 INFO SNAPSHOT_CHECK Total chunk size is 1251G in 320761 chunks
I'm using Wasabi us-east-1.
2020-06-17 18:46:55.515 TRACE LIST_FILES Listing chunks/
2020-06-17 18:47:39.364 TRACE SNAPSHOT_LIST_IDS Listing all snapshot ids
2020-06-17 19:18:40.589 INFO SNAPSHOT_CHECK 13 snapshots and 16301 revisions
2020-06-17 19:18:40.592 INFO SNAPSHOT_CHECK Total chunk size is 427,554M in 105613 chunks
Maybe it's the number of revisions.
But that would only affect the second stage, and listing still takes 31 minutes.
Yeah, listing snapshots, not chunks. The optimisation above, for Google Drive, S3 et al., is for listing chunks, which used to be very time-consuming.
As far as I can see, that doesn't appear to be your problem here, as yours is done in ~44 seconds before going on to listing snapshots. That seems to be the time-consuming part, which is rather weird.
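For backends that cannot list recursively in one pass, the usual chunk-listing speed-up is to list the two-hex-digit sub-directories of chunks/ concurrently. A rough sketch of that fan-out pattern, assuming the common chunks/xx/ layout and using a hypothetical stand-in for the backend's per-prefix listing call (not Duplicacy's actual storage interface):

```python
from concurrent.futures import ThreadPoolExecutor

def list_chunks_parallel(list_prefix, threads=10):
    """Fan out one listing call per two-hex-digit chunk sub-directory."""
    prefixes = [f"chunks/{i:02x}/" for i in range(256)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(list_prefix, prefixes)
    return [key for sub in results for key in sub]

# Hypothetical backend: pretend each sub-directory holds three chunk files.
fake = lambda prefix: [prefix + name for name in ("aa", "bb", "cc")]
print(len(list_chunks_parallel(fake)))  # 768
```

With the listings dominated by per-request latency rather than bandwidth, running them concurrently cuts wall-clock time roughly in proportion to the thread count.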
Between these two lines, Duplicacy loaded all revisions one by one and made sure that every referenced chunk was in the list of existing chunks. Because you have such a large number of revisions (16,301 of them), 31 minutes is very reasonable.
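That second stage is essentially a set-membership check: build a set of existing chunk IDs from the listing, then walk every revision and verify that each referenced chunk is present. A minimal sketch (the data shapes and names here are illustrative assumptions, not Duplicacy's actual snapshot format):

```python
def check_revisions(existing_chunks, revisions):
    """Return the set of referenced chunks missing from storage."""
    existing = set(existing_chunks)
    missing = set()
    for revision, referenced in revisions.items():
        missing.update(c for c in referenced if c not in existing)
    return missing

chunks = {"c1", "c2", "c3"}
revs = {("test-id", 1): ["c1", "c2"], ("test-id", 2): ["c2", "c4"]}
print(check_revisions(chunks, revs))  # {'c4'}
```

Each membership test is cheap, but every revision must be downloaded and walked, so the cost scales with the revision count: 16,301 revisions takes far longer than a few hundred.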
Fair enough.
The truth is I have never pruned or deleted any snapshots.
I think I can delete lots of snapshots, so it's time to prune a bit.
Thank you all.
My problem is that the prune command is a bit limited in the web-ui.
I have posted a new topic with a web-ui UX request for custom cli commands.
Actually it may seem limited, but it's really not. You can just ignore the proposed retention options and paste the full options into the "Options" field.
And when editing a prune job, you only have the Options field to deal with anyway.
Interesting, so it would be something like:
-id test-id -keep 30:360 -keep 7:180 -keep 1:30 -threads 8
Again, does the -id parameter work, for example?
Are there any restrictions on cli options?
If so, we should really update the descriptions or the web-ui guide.
I will try, but I didn't want to guess with prune.
Thank you
Yes, something like that. The options are just passed to the CLI, so I don't see why -id wouldn't work.
Well, you can always use the -dry-run option. Assuming that one works, of course.
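For instance, a conservative first pass in the Options field could look like the line below (the repository id, retention values, and thread count are just placeholders from earlier in the thread); with -dry-run, the CLI reports what would be deleted without touching the storage:

```
-id test-id -keep 30:360 -keep 7:180 -keep 1:30 -threads 8 -dry-run
```

Once the dry-run output looks right, the same line minus -dry-run performs the actual prune.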
Exactly, that is precisely the question.
Yes, -dry-run will work as an option.
Iāll release a new CLI version next week.
I can confirm this latest version fixes my original issue. Listing all chunks now runs in 15 mins on 15.5TB of data on GCD with 10 threads. Thank you!
2020-07-06 01:00:01.698 INFO STORAGE_SET Storage set to gcd://backup
2020-07-06 01:00:04.038 INFO SNAPSHOT_CHECK Listing all chunks
2020-07-06 01:15:08.902 INFO SNAPSHOT_CHECK 6 snapshots and 114 revisions
2020-07-06 01:15:09.034 INFO SNAPSHOT_CHECK Total chunk size is 15548G in 3243443 chunks