Pruning does not appear to be removing fossils

Setup:

Backup server with storage and locally mounted repositories.

  1. cd repository1
  2. duplicacy backup
  3. duplicacy prune -id repository1 -keep 1:1
  4. duplicacy prune -id repository1 -keep 0:40

This is repeated nightly. I don’t necessarily know how many repositories there are per storage; there may be more than one, though in this case there is only one.

I see lots of "Fossil collection 1 saved" output in the logs, but nothing indicating that fossils are being removed. From what I have read, there may be reasons for this in the way prune is implemented. The storage has 16k fossils, and the last prune reported that it fossilised 200 chunks, so I would expect the storage to hold far fewer fossils after the prune, i.e. about 200.
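For anyone wanting to check the fossil count on a locally mounted storage, here is a minimal sketch. It assumes the usual local layout, where chunks live under a chunks directory in the storage root and fossils are chunk files renamed with a .fsl suffix. The function name count_fossils and the path in the usage line are hypothetical.

```shell
# count_fossils STORAGE_DIR
# Prints the number of fossil files under STORAGE_DIR/chunks.
# Assumption: fossils are chunk files carrying a .fsl suffix.
count_fossils() {
    find "$1/chunks" -type f -name '*.fsl' | wc -l | tr -d ' '
}

# Usage (hypothetical path):
#   count_fossils /mnt/backup/storage1
```

Running it before and after a prune shows whether the deletion phase actually removed anything.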

Prune reference:

Why is prune in this case not removing fossils?

Edit: I attempted the following:

duplicacy backup
duplicacy -d prune -a -delete-only
duplicacy prune -id ABC123 -keep 1:1 
duplicacy prune -id ABC123 -keep 0:40

Despite there being 16k fossils and only 1 snapshot, the -delete-only run removes no fossils.

I eventually used the following command to clean up these fossils. If necessary I can put this in the backup script.

 duplicacy -d prune -exhaustive -exclusive

Do you have any other snapshot IDs in the storage?

Chunk deletion only occurs when all the repositories have run a successful backup, after a collection pass. This is to ensure all those chunks are no longer in use by any other repository.
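The rule above can be sketched as a toy shell check. This is a simplified model of the condition as described in this post, not Duplicacy's actual code. The function name fossils_deletable and the use of plain integer timestamps are assumptions made for illustration.

```shell
# fossils_deletable COLLECTED_AT LATEST_BACKUP...
# COLLECTED_AT is when the fossil collection was saved; each further
# argument is the time of the latest successful backup for one snapshot ID.
# Returns 0 only if every snapshot ID has a backup newer than the collection.
fossils_deletable() {
    fd_collected_at=$1
    shift
    [ "$#" -gt 0 ] || return 1          # no snapshot IDs known: do not delete
    for fd_latest in "$@"; do
        [ "$fd_latest" -gt "$fd_collected_at" ] || return 1
    done
    return 0
}

# Usage: collection saved at t=100; both IDs backed up afterwards.
#   fossils_deletable 100 150 200 && echo "safe to delete"
```

A single snapshot ID with no backup since the collection is enough to block deletion under this model.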

In this case, no: it is the ONLY snapshot ID in the storage, which is why it is particularly strange.

I do have other storages with multiple snapshot IDs, where this will also be an issue given the nightly script (this is a summary; the real script is config driven):

cd repo1
duplicacy backup
duplicacy prune -id repo1 -keep 1:1
duplicacy prune -id repo1 -keep 0:40
cd repo2
duplicacy backup
duplicacy prune -id repo2 -keep 1:1
duplicacy prune -id repo2 -keep 0:40

That will never remove fossils; it would have to be resequenced:

cd repo1
duplicacy backup
cd repo2
duplicacy backup
cd repo1
duplicacy prune -id repo1 -keep 1:1
duplicacy prune -id repo1 -keep 0:40
cd repo2
duplicacy prune -id repo2 -keep 1:1
duplicacy prune -id repo2 -keep 0:40
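A config-driven version of that resequencing could look like the sketch below. It is only an illustration: the function name nightly_run is hypothetical, and it assumes each repository directory is named after its snapshot ID, which may not hold in a real setup.

```shell
# nightly_run REPO_DIR...
# Phase 1 backs up every repository; phase 2 prunes each one afterwards,
# so every prune runs only after all snapshot IDs have a fresh backup.
nightly_run() {
    for nr_repo in "$@"; do
        ( cd "$nr_repo" && duplicacy backup ) || return 1
    done
    for nr_repo in "$@"; do
        nr_id=$(basename "$nr_repo")   # assumption: snapshot ID == directory name
        ( cd "$nr_repo" &&
          duplicacy prune -id "$nr_id" -keep 1:1 &&
          duplicacy prune -id "$nr_id" -keep 0:40 ) || return 1
    done
}

# Usage (hypothetical paths):
#   nightly_run /srv/repo1 /srv/repo2
```

Each command runs in a subshell, so the caller's working directory is left untouched and the first failure stops the run.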

Because these backups are config driven (it's not just a static script), each snapshot ID is processed in turn, which is why the prune happens straight after that snapshot's backup.

It would be nice if there was an -ignore-all option, to force fossil collection during prune.

For now, I have resorted to using -exclusive, which is OK as the backup script has its own locking mechanism to prevent multiple backups from running.

Hmm, I’m wondering if specifying the -id during prune is causing the deletion phase to be skipped?

Normally, for efficiency, I would say the best way to run prune would be with the -all flag. Obviously, that means you can’t have different retention periods - but if they’re all the same anyway…

Also, I wonder why you aren’t merging your retention periods together, like so:

duplicacy prune -id repo2 -keep 0:40 -keep 1:1

Personally, I would separate out your prune operation from the backup jobs, and perhaps run it less frequently, like once a day or once a week.

Try the -all flag instead of specifying -ids, to see if it has any effect on whether those fossils get removed.

The only reasons I didn’t merge the keeps into one prune operation are:

a) laziness (easier to foreach than to write a join)
b) the documentation says it's a valid way to do it
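For what it's worth, the join is short in shell. The sketch below turns a list of retention values into a single run of -keep options. The function name keep_args is hypothetical. It sorts by the age field, largest first, because as far as I recall the prune documentation asks for multiple -keep options in that order; treat that ordering as an assumption and check the docs.

```shell
# keep_args RETENTION...
# Prints "-keep n:m" for each retention, ordered by the m (age) field,
# largest first, joined by single spaces.
keep_args() {
    [ "$#" -gt 0 ] || return 0
    printf '%s\n' "$@" | sort -t: -k2,2nr | sed 's/^/-keep /' | paste -sd' ' -
}

# Usage (unquoted on purpose so the shell splits it into separate words):
#   duplicacy prune -id "$reponame" $(keep_args 1:1 0:40)
```

This yields one prune call per snapshot ID instead of one per retention.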

As for running prunes separately, it’s not really convenient.

In my case, I can work around this problem with exclusive mode. The backup script is protected by a lock, so there won’t ever be two backups running at the same time.

Specifically, after backing up each backup set, I now do:

duplicacy backup
# some verify and list ops to check integrity
duplicacy prune -exhaustive -exclusive
for retention in $retentions; do   # $retentions comes from the config
  duplicacy prune -id "$reponame" -keep "$retention"
done

The first duplicacy prune command cleans up any previously left fossils.

@austin.france, if you run duplicacy -d prune it should tell you why fossils aren’t removed in the log messages.