Prune is very slow and doesn't have any progress bar - how long does it actually take, and what is limiting the speed?

Based on the discussion in the other thread (The "check" command is so slow that it can never finish. Is there anything that can be done to improve the speed of "check"? - #10 by Droolio) where I was talking about check being slow, I am now trying to follow the suggestion of running prune to make check faster.

But prune seems even slower than check, and while check at least has a progress bar, prune just shows me this, no matter how long I wait:

At first I tried running it with -all, but now I'm trying to run it with -id 2 in the hope that that's faster. Based on the log it is faster, but it still seems to take forever, with no way for me to know how long it will actually take, as there's no progress indicator in the log even with -d.

These are the relevant parts of the log:

Running prune command from C:\ProgramData/.duplicacy-web/repositories/localhost/all
Options: [-log -d prune -storage BackupGoogle -keep 0:6000 -keep 2:60 -keep 1:14 -id 2 -threads 10]
2022-10-05 22:09:49.363 INFO STORAGE_SET Storage set to gcd://Duplicacy
2022-10-05 22:09:49.386 DEBUG PASSWORD_ENV_VAR Reading the environment variable DUPLICACY_BACKUPGOOGLE_GCD_TOKEN
2022-10-05 22:09:51.604 DEBUG PASSWORD_ENV_VAR Reading the environment variable DUPLICACY_BACKUPGOOGLE_PASSWORD
2022-10-05 22:09:52.898 TRACE CONFIG_ITERATIONS Using 16384 iterations for key derivation
2022-10-05 22:09:53.149 DEBUG STORAGE_NESTING Chunk read levels: [1], write level: 1
2022-10-05 22:09:53.150 INFO CONFIG_INFO Compression level: 100
2022-10-05 22:09:53.150 INFO CONFIG_INFO Average chunk size: 4194304
2022-10-05 22:09:53.150 INFO CONFIG_INFO Maximum chunk size: 16777216
2022-10-05 22:09:53.150 INFO CONFIG_INFO Minimum chunk size: 1048576
2022-10-05 22:09:53.150 INFO CONFIG_INFO Chunk seed: c92876c834715f71fd9bf48a8f49efb1cd30a0a146239c00be9064d1d444135f
2022-10-05 22:09:53.150 TRACE CONFIG_INFO Hash key: 4c3d9cb7dfba828fa36c6cadb4d058a6d2b736b94eb0180bcc95259b066c5ac4
2022-10-05 22:09:53.150 TRACE CONFIG_INFO ID key: e929621736c330c716551f9b3f5c639fe93cdc05579927ef7485e409904ebccb
2022-10-05 22:09:53.150 TRACE CONFIG_INFO File chunks are encrypted
2022-10-05 22:09:53.150 TRACE CONFIG_INFO Metadata chunks are encrypted
2022-10-05 22:09:53.150 DEBUG DELETE_PARAMETERS id: 2, revisions: [], tags: [], retentions: [0:6000 2:60 1:14], exhaustive: false, exclusive: false, dryrun: false, deleteOnly: false, collectOnly: false
2022-10-05 22:09:53.151 INFO RETENTION_POLICY Keep no snapshots older than 6000 days
2022-10-05 22:09:53.151 INFO RETENTION_POLICY Keep 1 snapshot every 2 day(s) if older than 60 day(s)
2022-10-05 22:09:53.151 INFO RETENTION_POLICY Keep 1 snapshot every 1 day(s) if older than 14 day(s)
2022-10-05 22:09:53.151 TRACE SNAPSHOT_LIST_IDS Listing all snapshot ids
2022-10-05 22:09:53.575 TRACE SNAPSHOT_LIST_REVISIONS Listing revisions for snapshot 4
2022-10-05 22:13:08.432 DEBUG DOWNLOAD_FILE_CACHE Loaded file snapshots/4/1 from the snapshot cache

[a lot more of those, going up revisions]

2022-10-05 22:56:54.863 DEBUG DOWNLOAD_FILE_CACHE Loaded file snapshots/4/7915 from the snapshot cache
2022-10-05 22:56:54.863 TRACE SNAPSHOT_LIST_REVISIONS Listing revisions for snapshot 3
2022-10-05 22:58:50.616 DEBUG DOWNLOAD_FILE_CACHE Loaded file snapshots/3/1 from the snapshot cache

[a lot more of those, going up revisions]

2022-10-06 01:05:32.682 DEBUG DOWNLOAD_FILE_CACHE Loaded file snapshots/1/12594 from the snapshot cache
2022-10-06 01:05:34.290 DEBUG DOWNLOAD_FILE Downloaded file snapshots/1/12595
2022-10-06 01:05:34.292 DEBUG SNAPSHOT_DELETE Snapshot 2 at revision 3 to be deleted - older than 60 days, less than 2 days from previous

[a lot more of those, going up revisions]

2022-10-06 01:05:34.343 DEBUG SNAPSHOT_DELETE Snapshot 2 at revision 8639 to be deleted - older than 14 days, less than 1 days from previous
2022-10-06 01:05:34.343 INFO SNAPSHOT_DELETE Deleting snapshot 2 at revision 3
2022-10-06 01:05:34.359 DEBUG CHUNK_CACHE Chunk 5385423b401884c644ec49823732d168fc5e373b1b3b4ff8509f5a6e72661a86 has been loaded from the snapshot cache
2022-10-06 01:05:34.528 INFO SNAPSHOT_DELETE Deleting snapshot 2 at revision 4

[a lot more of those, going up revisions]

2022-10-06 01:29:49.252 DEBUG DOWNLOAD_FETCH Fetching chunk 39367b61cf9614eed24935da307fa6e29298f4f929d41830b3d97a49d4de2177
2022-10-06 01:29:49.254 DEBUG CHUNK_CACHE Chunk 39367b61cf9614eed24935da307fa6e29298f4f929d41830b3d97a49d4de2177 has been loaded from the snapshot cache
2022-10-06 01:29:49.467 INFO SNAPSHOT_DELETE Deleting snapshot 2 at revision 8639
2022-10-06 01:29:49.467 DEBUG DOWNLOAD_FETCH Fetching chunk 59999e7fd7629d3142415dfe1ae8b395f4f039898b2558b26ba5aafc3cdc5d40
2022-10-06 01:29:49.473 DEBUG CHUNK_CACHE Chunk 59999e7fd7629d3142415dfe1ae8b395f4f039898b2558b26ba5aafc3cdc5d40 has been loaded from the snapshot cache
2022-10-06 01:29:49.474 DEBUG DOWNLOAD_FETCH Fetching chunk 39367b61cf9614eed24935da307fa6e29298f4f929d41830b3d97a49d4de2177
2022-10-06 01:29:49.476 DEBUG CHUNK_CACHE Chunk 39367b61cf9614eed24935da307fa6e29298f4f929d41830b3d97a49d4de2177 has been loaded from the snapshot cache
2022-10-06 01:29:49.637 DEBUG DOWNLOAD_FETCH Fetching chunk cd26258898b4ccaad4e619c14dfb3bcacfedc3c1589bae9737bb0764d523ee4b
2022-10-06 01:29:49.651 DEBUG CHUNK_CACHE Chunk cd26258898b4ccaad4e619c14dfb3bcacfedc3c1589bae9737bb0764d523ee4b has been loaded from the snapshot cache
2022-10-06 01:29:49.652 DEBUG DOWNLOAD_FETCH Fetching chunk 21b7e146caa53407e977513da485c429b953dbcc5d0bbe478584db232fa33292
2022-10-06 01:29:49.683 DEBUG CHUNK_CACHE Chunk 21b7e146caa53407e977513da485c429b953dbcc5d0bbe478584db232fa33292 has been loaded from the snapshot cache
2022-10-06 01:29:49.685 DEBUG DOWNLOAD_FETCH Fetching chunk 7e00420080dd8fc76f0d8c0bbf29d922996dc6c97698e9efe71e3bca13cc3d6b
2022-10-06 01:29:49.691 DEBUG CHUNK_CACHE Chunk 7e00420080dd8fc76f0d8c0bbf29d922996dc6c97698e9efe71e3bca13cc3d6b has been loaded from the snapshot cache
2022-10-06 01:29:49.703 DEBUG DOWNLOAD_FETCH Fetching chunk dbfd1e444503131daf343647fe6d3514febbbccbee2728de404501628f68c1ba
2022-10-06 01:29:49.710 DEBUG CHUNK_CACHE Chunk dbfd1e444503131daf343647fe6d3514febbbccbee2728de404501628f68c1ba has been loaded from the snapshot cache

[and here the real speed problem begins; this goes on forever]

So at the beginning, there is a reasonably useful progress indicator in the log: it counts up through the revisions, does something for each revision (like fetching a chunk), then repeats the same for the next revision, and so on, until it reaches the highest revision number. But after 3.5 hours it's finished with that and no longer counts up revisions; instead it just endlessly repeats Fetching chunk and chunk ... has been loaded from the snapshot cache.

What is it actually doing there? Looking at Task Manager, I see Duplicacy using around 6% CPU, which equals roughly 2 threads on my 3950X, somewhere between 0.1 MB/s and 4 MB/s of disk usage, and 0 MB/s of network usage. So whatever it's doing here is fully local, not downloading or uploading anything over the network, just looking at local files. But why is it so extremely slow then? The cache folder is on my NVMe SSD, which can do around 7000 MB/s, so I don't understand why Duplicacy is working through this at a snail's pace.

The main problem here is likely the sheer number of revisions it has to deal with, and until you’ve got that number down, prune or check or any maintenance operation will be painfully slow (remember when I mentioned it’s closer to exponential than linear? :slight_smile: )…

In your position I might normally consider using the -exclusive flag - making sure not to run backups or do anything else with the storage until it's done. It's probably faster than a non-exclusive prune, where chunk files are renamed to fossils one by one instead of deleted outright. The slight problem with this approach is that aborting the prune could leave the storage in a state where chunks were deleted but snapshot revisions weren't (some of us have suggested it should do it the other way around, or fossilise snapshot files as well as chunks). Strictly speaking, you haven't lost any data that wasn't intended to be deleted, but a check will claim there are missing chunks, and you'd have to manually delete a bunch of snapshot revisions before progressing. Plus a further prune may delete a whole bunch more chunks, and abort when it runs into those missing chunks, making the problem worse each time. All this to say, since there's a high chance of an abort, maybe avoid -exclusive. :stuck_out_tongue:
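
(For reference only, not a recommendation given the abort risk above: an exclusive prune would be invoked from the CLI roughly like this, reusing the storage name and retention flags from your log, with nothing else allowed to touch the storage while it runs:

    duplicacy prune -storage BackupGoogle -a -keep 0:6000 -keep 2:60 -keep 1:14 -exclusive -threads 10
)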

Alternatively, it might be best to break your retention policy down and process the rules one by one - i.e. only do -keep 1:14 to remove most of the hourly snapshots. Perhaps combine this with -collect-only (and deal with actual deletions later), and I'd strongly recommend switching your -id back to -a and processing everything at once, since I believe it needs to load all the snapshot revisions anyway to determine which chunks can safely be pruned. (I could be wrong, but I just don't think -id will save time.)

Another idea might be to set -keep 1:x to a high number and cull the storage bit by bit - i.e. start at -keep 1:3000 and lower it to 1000, then 14. This might run into the same issue as -id, so it's up to you to see if combining them is faster; honestly, it'd be interesting to find out.

Once you’ve managed to complete -keep 1:14, future prunes should run in a reasonable amount of time.

TL;DR - try -collect-only -keep 1:3000 -a
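
In CLI terms (the web UI passes these as extra options to the same prune command shown in your log), that would look roughly like:

    duplicacy prune -storage BackupGoogle -a -collect-only -keep 1:3000 -threads 10

lowering the 3000 on later runs (e.g. 1000, then 14) as suggested above.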

Thanks!

-collect-only makes it slightly faster, so that's good. Specifying an -id definitely makes it much faster than using -all, so I do need to specify an ID.

What I don't quite understand is what I need to do after successfully running with -collect-only. I completed the prune on ID 3 with collect-only and a -keep 1:14. I assumed that after running with collect-only, I need to run it once more with delete-only. So I've now tried running it with -delete-only on ID 3, but that does not seem to work - it doesn't seem to do anything - so is that not what I'm supposed to do?

Running prune command from C:\ProgramData/.duplicacy-web/repositories/localhost/all
Options: [-log -d prune -storage BackupGoogle -id 3 -threads 10 -delete-only]
2022-10-09 07:17:30.001 INFO STORAGE_SET Storage set to gcd://Duplicacy
2022-10-09 07:17:30.023 DEBUG PASSWORD_ENV_VAR Reading the environment variable DUPLICACY_BACKUPGOOGLE_GCD_TOKEN
2022-10-09 07:17:31.780 DEBUG PASSWORD_ENV_VAR Reading the environment variable DUPLICACY_BACKUPGOOGLE_PASSWORD
2022-10-09 07:17:32.988 TRACE CONFIG_ITERATIONS Using 16384 iterations for key derivation
2022-10-09 07:17:33.215 DEBUG STORAGE_NESTING Chunk read levels: [1], write level: 1
2022-10-09 07:17:33.215 INFO CONFIG_INFO Compression level: 100
2022-10-09 07:17:33.216 INFO CONFIG_INFO Average chunk size: 4194304
2022-10-09 07:17:33.216 INFO CONFIG_INFO Maximum chunk size: 16777216
2022-10-09 07:17:33.216 INFO CONFIG_INFO Minimum chunk size: 1048576
2022-10-09 07:17:33.216 INFO CONFIG_INFO Chunk seed: c92876c834715f71fd9bf48a8f49efb1cd30a0a146239c00be9064d1d444135f
2022-10-09 07:17:33.216 TRACE CONFIG_INFO Hash key: 4c3d9cb7dfba828fa36c6cadb4d058a6d2b736b94eb0180bcc95259b066c5ac4
2022-10-09 07:17:33.216 TRACE CONFIG_INFO ID key: e929621736c330c716551f9b3f5c639fe93cdc05579927ef7485e409904ebccb
2022-10-09 07:17:33.216 TRACE CONFIG_INFO File chunks are encrypted
2022-10-09 07:17:33.216 TRACE CONFIG_INFO Metadata chunks are encrypted
2022-10-09 07:17:33.216 DEBUG DELETE_PARAMETERS id: 3, revisions: [], tags: [], retentions: [], exhaustive: false, exclusive: false, dryrun: false, deleteOnly: true, collectOnly: false
2022-10-09 07:17:33.217 TRACE SNAPSHOT_LIST_IDS Listing all snapshot ids
2022-10-09 07:17:33.681 TRACE SNAPSHOT_LIST_REVISIONS Listing revisions for snapshot 4
2022-10-09 07:20:26.425 DEBUG DOWNLOAD_FILE_CACHE Loaded file snapshots/4/1 from the snapshot cache
2022-10-09 07:20:26.688 DEBUG DOWNLOAD_FILE_CACHE Loaded file snapshots/4/2 from the snapshot cache
2022-10-09 07:20:26.936 DEBUG DOWNLOAD_FILE_CACHE Loaded file snapshots/4/3 from the snapshot cache
2022-10-09 07:20:27.305 DEBUG DOWNLOAD_FILE_CACHE Loaded file snapshots/4/4 from the snapshot cache
2022-10-09 07:20:27.566 DEBUG DOWNLOAD_FILE_CACHE Loaded file snapshots/4/5 from the snapshot cache
[...]
2022-10-09 08:56:35.702 DEBUG DOWNLOAD_FILE_CACHE Loaded file snapshots/1/12598 from the snapshot cache
2022-10-09 08:56:35.978 DEBUG DOWNLOAD_FILE_CACHE Loaded file snapshots/1/12599 from the snapshot cache
2022-10-09 08:56:37.253 DEBUG DOWNLOAD_FILE Downloaded file snapshots/1/12600
2022-10-09 08:56:38.525 DEBUG DOWNLOAD_FILE Downloaded file snapshots/1/12601
2022-10-09 08:56:38.535 INFO FOSSIL_COLLECT Fossil collection 1 found
2022-10-09 08:56:38.535 TRACE SNAPSHOT_UNDELETABLE New snapshot 2 revision 8812 doesn't meet the fossil deletion criteria
2022-10-09 08:56:38.535 TRACE SNAPSHOT_UNDELETABLE New snapshot 2 revision 8813 doesn't meet the fossil deletion criteria
2022-10-09 08:56:38.535 TRACE SNAPSHOT_NO_NEW No new snapshot from 2 since the fossil collection step
2022-10-09 08:56:38.535 TRACE SNAPSHOT_UNDELETABLE New snapshot 1 revision 12600 doesn't meet the fossil deletion criteria
2022-10-09 08:56:38.535 TRACE SNAPSHOT_UNDELETABLE New snapshot 1 revision 12601 doesn't meet the fossil deletion criteria
2022-10-09 08:56:38.535 TRACE SNAPSHOT_NO_NEW No new snapshot from 1 since the fossil collection step
2022-10-09 08:56:38.535 TRACE SNAPSHOT_UNDELETABLE New snapshot 4 revision 7919 doesn't meet the fossil deletion criteria
2022-10-09 08:56:38.535 TRACE SNAPSHOT_UNDELETABLE New snapshot 4 revision 7920 doesn't meet the fossil deletion criteria
2022-10-09 08:56:38.535 TRACE SNAPSHOT_NO_NEW No new snapshot from 4 since the fossil collection step
2022-10-09 08:56:38.535 TRACE SNAPSHOT_UNDELETABLE New snapshot 3 revision 7923 doesn't meet the fossil deletion criteria
2022-10-09 08:56:38.535 TRACE SNAPSHOT_UNDELETABLE New snapshot 3 revision 7924 doesn't meet the fossil deletion criteria
2022-10-09 08:56:38.535 TRACE SNAPSHOT_NO_NEW No new snapshot from 3 since the fossil collection step
2022-10-09 08:56:38.535 INFO FOSSIL_POSTPONE Fossils from collection 1 can't be deleted because deletion criteria aren't met

The deletion phase of the two-step prune algorithm will only remove chunks if you’ve successfully run backups for all IDs since the collect phase. So if you run backups and try again, it should definitely remove stuff.
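
In CLI terms, the rough sequence would be something like the following (storage name taken from your log; in the web UI this just means letting all scheduled backups run once, then re-running the prune job with -delete-only):

    duplicacy backup -storage BackupGoogle      (run once from each repository, so every backup ID gets a new snapshot)
    duplicacy prune -storage BackupGoogle -a -delete-only -threads 10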

So, after a few weeks, I've managed to complete a prune with -keep 1:14!

I had to do it slowly, one ID after the other, and the last ID had a lot of stuff that needed to be deleted, so I had to work through it gradually by first doing -keep 1:500, then -keep 1:470, then -keep 1:430, etc. - that worked. In the first few weeks I could only run it with -threads 10, because that was roughly where I ran into Google Drive's rate limits, but a few days ago I could go up to -threads 100 thanks to the new Duplicacy version supporting a custom Google Drive account with looser rate limits, and then it became much faster. If I had been able to use -threads 100 from the start, finishing the prune wouldn't have taken a few weeks, but only a few days.

I was always running it with -collect-only to make it faster, so I did have a lot of fossilized chunks at the end. Even after finishing the prune and running all backups, Duplicacy was saying that none of the fossils met the deletion criteria, so to get rid of the fossils I had to run it with -exclusive - and of course I made sure not to run anything else while that was running.

But I now have the issue you already suggested could happen: check claims there are missing chunks, in a total of 9 revisions:

	Line 3507: 2022-10-26 04:02:04.956 WARN SNAPSHOT_CHECK Some chunks referenced by snapshot 1 at revision 12486 are missing
	Line 3527: 2022-10-26 04:02:10.385 WARN SNAPSHOT_CHECK Some chunks referenced by snapshot 1 at revision 12488 are missing
	Line 3571: 2022-10-26 04:02:21.508 WARN SNAPSHOT_CHECK Some chunks referenced by snapshot 1 at revision 12489 are missing
	Line 3620: 2022-10-26 04:02:33.498 WARN SNAPSHOT_CHECK Some chunks referenced by snapshot 1 at revision 12490 are missing
	Line 3670: 2022-10-26 04:02:45.559 WARN SNAPSHOT_CHECK Some chunks referenced by snapshot 1 at revision 12491 are missing
	Line 3718: 2022-10-26 04:02:56.800 WARN SNAPSHOT_CHECK Some chunks referenced by snapshot 1 at revision 12492 are missing
	Line 3766: 2022-10-26 04:03:07.737 WARN SNAPSHOT_CHECK Some chunks referenced by snapshot 1 at revision 12493 are missing
	Line 3814: 2022-10-26 04:03:19.164 WARN SNAPSHOT_CHECK Some chunks referenced by snapshot 1 at revision 12494 are missing
	Line 3848: 2022-10-26 04:03:27.140 WARN SNAPSHOT_CHECK Some chunks referenced by snapshot 1 at revision 12495 are missing
2022-10-26 04:05:15.909 INFO SNAPSHOT_CHECK All chunks referenced by snapshot 1 at revision 12627 exist
2022-10-26 04:05:15.909 ERROR SNAPSHOT_CHECK Some chunks referenced by some snapshots do not exist in the storage
Some chunks referenced by some snapshots do not exist in the storage

So that seems to be the last thing I somehow need to fix. How can I do that?

Since you have a snapshot (12627) more recent than the snapshots with missing chunks, maybe the best and easiest option would be to decide if you can live without them - and simply manually delete those snapshot files, then clean up with -exhaustive later.

It's interesting that those numbers are consecutive, which may suggest the -keep 1:X didn't entirely complete somehow? Because if you're running hourly, there should be gaps of 24 or so.

Or maybe they're more recent backups made since the last prune - perhaps due to the inventory difference between the start and finish of the run.

Alternatively, there might be a way to regenerate those chunks - by creating a new backup ID and running it on the same repo, it'd only upload chunks that aren't already on the storage, filling in the missing bits. However, this may not work, since 12627 no longer references those chunks. While you could try restoring your repo (or a temporary copy) back to roughly the state it was in when those snapshots were created and running a fresh backup, it's probably a lot more hassle than it's worth. In my experience, a short run of bad snapshots is a temporary problem best dealt with by manual deletion. (Which is not to say this happens very often. :slight_smile: )

I am sure I don’t need those 9 specific revisions, so yeah, I’m completely fine with deleting them. But how exactly can I do that? By just running prune with -r X -r Y etc?

And is there a reason why I should run -exhaustive after I’ve managed to delete them? I don’t mind a bunch of unused chunks being there if they don’t hurt anything.

I believe attempting to prune those revisions could ultimately result in more missing-chunk errors, but since it would delete those snapshot files first, it'd work, and a subsequent check probably wouldn't find anything wrong from then on. You'd have to use something like -r 12486 -r 12488-12495 to do it in one command.
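
If you did go the prune route, the command would be roughly this (assuming all nine revisions belong to backup ID 1, as in your check output):

    duplicacy prune -storage BackupGoogle -id 1 -r 12486 -r 12488-12495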

But it's really just easier in most cases to find the revision files on the destination storage via the Google Drive web interface - under snapshots/<snapshot_ID> - and delete the numbered files, i.e. 12486, 12488, etc…

You don't have to run prune -exhaustive, but you can run that command on its own, without the added complications of -exclusive, and eventually those unreferenced chunks will get cleaned up.
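
That later clean-up would be something like:

    duplicacy prune -storage BackupGoogle -exhaustive -threads 10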

Thanks, I manually deleted the 9 snapshot files in the Google Drive web interface, and it worked! I was able to complete a check on all IDs now.