Is that delete workflow right?

Hi everybody!

I spent 3 days reading all the content available on the internet, and I also did some tests myself. I just want to ask whether I understood and interpreted it correctly. I’m using the Web GUI.
Here are my questions relating to the following situation:
Imagine I created a backup with the ID Bernd and backed up 3 times, so I created 3 snapshots (revisions), numbered 1, 2 and 3.

  1. To delete the second snapshot, I create a prune schedule in the Web GUI with the options -r 2 -id Bernd. This removes the second snapshot, but as I understand it, it only removes it from the dropdown menu in the restore tab. Why? When I looked at my storage, the same amount of space was still occupied. I read a bit more, looked into the folders on the storage, and found that the data itself seems to be stored in chunks. Then I found out that I need to run prune -exhaustive to remove these chunks. After that, the storage was still occupied. I looked into the folders and saw that I now had some fossils. What are these fossils? Are they just the chunks that have been moved to the fossils folder? What’s the idea of this step? Is it like a trash bin? I mean, the connection to the revision is already lost. Then I found out that I need to run prune -delete-only to get rid of the fossils.

  2. To delete the whole backup ID Bernd, the only way is to go into the folder and delete it manually, right? But after that I still need to run prune -exhaustive and then prune -delete-only to delete the whole backup with the ID Bernd and all its revisions.

  3. Can I just go to the fossils folder in the storage and delete everything inside instead of running prune -delete-only?

  4. Why do I need to schedule these commands? There is a danger that a prune is triggered when I don’t want it to be.

  5. Imagine I have the system running. It’s backed up regularly, and I schedule a prune in the Web UI to delete snapshots (revisions) older than 30 days while keeping 1 snapshot every 7 days among them. What is the point of the prune command if it only deletes the revisions from the dropdown menu in the restore tab? What I really want is to clean up and free storage. Or do I need to schedule another prune -exhaustive and another prune -delete-only afterwards?

  6. Is my workflow right?

Thanks for any advice.
Cheers Paul!

You definitely did not. Start by reading the prune command details thread on this forum, especially the part about the two-step fossil collection mechanism. Some pointers:

  • Removing revisions doesn’t necessarily free up space, as revisions do not necessarily take up any. If you run 3 backups in a row, in all likelihood all of them are the same, and removing even 2 out of 3 won’t release any space due to deduplication (they all share the same chunks).
  • You don’t need -exhaustive in this scenario; it is only needed if you have unreferenced chunks (from failed backups or manual revision removal).
  • You don’t need -delete-only; deletion will happen as the second step of a regular prune once you have new snapshots. See the two-step fossil collection mechanism.
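To make the deduplication point concrete, here is a toy Python sketch. It is not Duplicacy's code: the chunk names and the helper `unreferenced_after_removing` are made up for illustration, and each revision is modelled simply as the set of chunks it references.

```python
# Toy model: a revision is just the set of chunk names it references.
revisions = {
    1: {"c1", "c2", "c3"},
    2: {"c1", "c2", "c3"},        # nothing changed since revision 1
    3: {"c1", "c2", "c3", "c4"},  # one new chunk appeared
}

def unreferenced_after_removing(revisions, removed):
    """Return the chunks that no remaining revision references
    once the revisions in `removed` are gone."""
    remaining = [chunks for rev, chunks in revisions.items() if rev not in removed]
    still_used = set().union(*remaining) if remaining else set()
    all_chunks = set().union(*revisions.values())
    return all_chunks - still_used

print(unreferenced_after_removing(revisions, {2}))  # set()
print(unreferenced_after_removing(revisions, {3}))  # {'c4'}
```

Removing revision 2 leaves nothing to clean up because revisions 1 and 3 still reference every chunk it used, which matches what Paul saw on his storage.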

As @sevimo correctly points out, the two-step fossil mechanism is key here; it is required to protect against certain race conditions, e.g. concurrent backups from multiple machines.

Every time you run a normal prune, it does a ‘collect’ phase (delete revisions and rename unreferenced chunks to fossils) and a ‘delete’ phase (delete fossils that are concluded to be safe to delete).
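Here is a rough Python sketch of why the space is still occupied right after the collect phase. The `Storage` class and `collect` function are hypothetical and heavily simplified: real Duplicacy renames files in the storage, it does not work on in-memory sets.

```python
from dataclasses import dataclass, field

@dataclass
class Storage:
    revisions: dict                              # revision number -> set of chunk names
    chunks: set                                  # live chunks
    fossils: set = field(default_factory=set)    # chunks renamed to fossils

def collect(storage, revisions_to_remove):
    """Collect phase: drop the revisions, then turn chunks that no
    remaining revision references into fossils (nothing is deleted)."""
    for rev in revisions_to_remove:
        storage.revisions.pop(rev, None)
    referenced = set().union(*storage.revisions.values()) if storage.revisions else set()
    newly_fossil = storage.chunks - referenced
    storage.chunks -= newly_fossil
    storage.fossils |= newly_fossil
    return newly_fossil  # the "fossil collection" a later prune will examine

store = Storage(revisions={1: {"a", "b"}, 2: {"a", "c"}}, chunks={"a", "b", "c"})
collection = collect(store, [2])
print(sorted(store.chunks), sorted(store.fossils))  # ['a', 'b'] ['c']
```

Chunk "c" has only been renamed, so the total amount of data on the storage is unchanged until a later delete phase removes it.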

The delete phase can only happen safely once every backup ID on the storage has completed a new backup after a particular fossil collection was made. If any of those new backups reference fossils in that collection, those fossils are resurrected (renamed back to normal chunks), and everything else in the collection can be deleted.
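The safety check can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the real implementation: times are plain integers, the check is done per backup ID, and the function `delete_phase` and its parameters are hypothetical names.

```python
def delete_phase(fossils, collected_at, latest_backup_time, new_backup_chunks):
    """Decide what to do with one fossil collection.

    fossils:            chunk names renamed during the collect phase
    collected_at:       time the fossil collection was made
    latest_backup_time: backup ID -> finish time of its most recent backup
    new_backup_chunks:  chunks referenced by backups finished after collection

    Returns (resurrect, delete), or None if it is not yet safe to decide.
    """
    # Every backup ID must have finished a backup after the collection.
    if not all(t > collected_at for t in latest_backup_time.values()):
        return None
    resurrect = fossils & new_backup_chunks  # still needed: rename back to chunks
    delete = fossils - new_backup_chunks     # unreferenced: safe to remove
    return resurrect, delete

# One backup ID has not run since the collection, so nothing is deleted yet.
print(delete_phase({"c", "d"}, 10, {"Bernd": 12, "Other": 9}, {"c"}))  # None
```

Once "Other" also completes a backup after time 10, the same call returns the fossils to resurrect and the fossils to delete.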

Honestly, don’t dwell on the fact that chunks still exist - they’ll get deleted after your next backup and prune run. On a constant schedule, your backup storage should remain mostly flat.

Thanks! I understand it so much better now! I just hadn’t found the article about the two-step fossil mechanism.
Thanks Paul

Here is the IEEE paper that explains the whole thing: Duplicacy paper accepted by IEEE Transactions on Cloud Computing