The first line in the prune details says the following:
In Duplicacy terminology, we use both "revision" and "snapshot" to refer to the same thing: all the files created when you run the backup command (I will call this a revision from here on). I know this naming has always been a constant source of confusion (even for us, the staff).
What the -id parameter does is select a specific snapshot (here a snapshot is the whole set of backups, i.e. revisions, of a particular repository, and a repository is a folder on your local computer).
So let's say you have 2 folders that you want to back up: C:/work and D:/misc. You will init a repository in both of these folders, and when you init you are also asked to provide a unique "snapshot" name. This is the name by which Duplicacy knows which folder you want to back up; it doesn't care about anything else except this "snapshot" (also called "snapshot id", hence the -id option for every command).
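To make the terminology concrete, here is a toy model of a storage shared by the two example repositories. The dictionary layout and the select_snapshots helper are hypothetical illustrations, not Duplicacy's storage format or code. They only show that revisions are grouped under a snapshot id, that -id picks one of those groups, that -all matches every snapshot id, and that without either option the repository's own snapshot id is used.

```python
# Hypothetical model: each snapshot id (one per repository/folder) owns its revisions.
storage = {
    "work": [1, 2, 3, 4],  # snapshot id chosen at init for C:/work -> its revisions
    "misc": [1, 2],        # snapshot id chosen at init for D:/misc -> its revisions
}


def select_snapshots(storage, repo_id, snapshot_id=None, all_ids=False):
    """Return the snapshot ids a command would act on (illustrative only)."""
    if all_ids:                  # like -all: every snapshot id in the storage
        return sorted(storage)
    if snapshot_id is not None:  # like -id <snapshot id>: exactly that one
        return [snapshot_id]
    return [repo_id]             # otherwise: the repository's own snapshot id


print(select_snapshots(storage, "work"))                    # ['work']
print(select_snapshots(storage, "work", snapshot_id="misc")) # ['misc']
print(select_snapshots(storage, "work", all_ids=True))      # ['misc', 'work']
```

Note that each snapshot id has its own revision numbering, which is why both "work" and "misc" have a revision 1.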
What prune does first is check which of the existing revisions (for each individual snapshot) are ready to be deleted (pruned). Any revisions it finds are only marked for deletion. The next time prune runs, it finds the revisions marked for deletion and really deletes them.
Duplicacy then looks again for other revisions that can be deleted and only marks them, which is step 1 all over again. This is why pruning is done with a two-step fossil collection algorithm.
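The mark-then-delete cycle described above can be sketched as a toy model. ToyPruner and its should_prune callback are hypothetical names; this is a simplification of the behaviour as explained in this post and does not model how Duplicacy actually records the marks (fossils) on the storage or any extra conditions it checks before the final deletion.

```python
class ToyPruner:
    """Toy model of two-step pruning: mark in one run, delete in the next."""

    def __init__(self, revisions):
        self.revisions = set(revisions)  # revisions currently in the storage
        self.marked = set()              # revisions marked by the previous run

    def run(self, should_prune):
        # Step 2 of the previous cycle: really delete what was marked last time.
        deleted = self.marked & self.revisions
        self.revisions -= deleted
        # Step 1 again: mark newly eligible revisions, but do not delete them yet.
        self.marked = {r for r in self.revisions if should_prune(r)}
        return deleted


pruner = ToyPruner([1, 2, 3, 4, 5])
first = pruner.run(lambda r: r <= 2)   # deletes nothing, marks {1, 2}
second = pruner.run(lambda r: r <= 3)  # deletes {1, 2}, marks {3}
print(first, second, pruner.marked)    # set() {1, 2} {3}
```

Splitting the work across two runs means a revision is never removed in the same run that decided it was eligible.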
If you only have a single repository, Duplicacy allows you to not give a snapshot id and provides one for you, called default (that comes as a shock, right?). This is what Duplicacy means when it refers to the "default snapshot" or "default snapshot id".
Since you read the whole prune how-to, you should also have read the section "Only one repository should run prune".
A good default, depending on how much storage you want to spend on old revisions, is
duplicacy prune -all -threads 30 -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7
and you only run prune from a single repository.
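Duplicacy documents -keep n:m as "keep one revision every n days for revisions older than m days", with n = 0 meaning "keep no revisions older than m days"; the rules are listed from the largest m to the smallest. The sketch below is a simplified approximation of how such a rule set selects revisions, working on revision ages in days. Keep and select_for_pruning are hypothetical names, and the real implementation works on timestamps and may differ in edge cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keep:
    interval: int  # n: keep one revision every n days (0 = keep none)
    age: int       # m: rule applies to revisions older than m days


def select_for_pruning(ages_days, policies):
    """Return the ages (in days) of revisions this simplified model would prune."""
    policies = sorted(policies, key=lambda p: p.age, reverse=True)
    pruned = []
    i = 0
    last_kept = None  # age of the last revision kept under the current rule
    for age in sorted(ages_days, reverse=True):  # walk from oldest to newest
        # Move on to the first rule whose threshold this revision still exceeds.
        while i < len(policies) and age < policies[i].age:
            i += 1
            last_kept = None
        if i == len(policies):
            continue  # younger than every threshold: always kept
        rule = policies[i]
        if rule.interval == 0:
            pruned.append(age)  # n = 0: nothing older than m survives
        elif last_kept is not None and last_kept - age < rule.interval:
            pruned.append(age)  # too close to the previously kept revision
        else:
            last_kept = age     # keep this one and measure the next gap from it
    return pruned


# The rule set from the command above: -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7
rules = [Keep(0, 360), Keep(30, 180), Keep(7, 30), Keep(1, 7)]
print(select_for_pruning([400, 200, 190, 170, 165, 160, 10, 9.5, 8, 3], rules))
# → [400, 190, 165, 9.5]
```

In words: everything older than a year goes, between 180 and 360 days one revision per month stays, between 30 and 180 days one per week, between 7 and 30 days one per day, and the last week is untouched.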
I think pruning once a week, or once every 2 weeks, is enough, but again: this depends on how often you make new backups (new revisions). Pruning time depends heavily on 1. the number of revisions existing in all the snapshots and 2. the number of files in those backups.
Again: this depends on the size (number of files and GB) of your backups, so it may vary greatly. In my opinion, though, you should never need to download more than 5% of the total size of the backup (and this data is cached afterwards); in my case I don't think Duplicacy has ever downloaded even 1 GB of data for my biggest repository (1.3 TB).
Duplicacy doesn't download the data chunks; it only downloads the metadata chunks (those which store information about the backup), so it knows which, what, how and who to prune.