Struggling to Prune Properly

Hey all - looking forward to seeing if Duplicacy is the right fit for me. I’ve been using restic, but I’m debating whether Duplicacy may be superior for my use case. To start with, I’ve been testing it out on my NAS backup, but because of a slow home connection, I spun up a VPS, restored the TBs of data to that, then used Duplicacy to back up from the VPS -> B2, with the plan that, thanks to deduplication, I’d be able to back up from home afterward without uploading many (or any) chunks.
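
For anyone following along, the workflow looked roughly like this (the bucket and repository names here are placeholders, not my real ones):

# On the VPS, inside the restored data:
cd /mnt/restore
duplicacy init duplicacy-vps b2://my-bucket
duplicacy backup -stats

# At home, pointing the NAS repository at the same B2 storage:
cd /
duplicacy init duplicacy-nas b2://my-bucket
duplicacy backup -stats    # hoping this dedupes against the chunks the VPS seeded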

Once that completed, I started backing up from my home NAS, and noticed that it was uploading almost all chunks as brand new - unfortunate, but based on my reading, if one file is slightly moved, it can make all chunks re-upload (I’m guessing because I backed up from /mnt/restore/ instead of just /). Because of this, I tried pruning out the old backups - I figured that just setting it to keep 0 backups when older than 1 day would be sufficient; however, when the prune tries to delete the collected fossils, I’m told “Fossils from collection 1 can’t be deleted because deletion criteria aren’t met”.
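
For reference, if I’ve read the docs right, -keep <n>:<m> means “keep one snapshot every n days for snapshots older than m days”, so n = 0 deletes everything past the age threshold. The rule I used was effectively:

duplicacy prune -keep 0:1    # delete all snapshots older than 1 day for this repository ID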

Further reading suggested I needed to do another backup from the original machine before I could prune this out - so, given I’d deleted the VPS, I backed up a blank folder with a config file identifying it by the same name (i.e. duplicacy-vps). Then I was able to run the prune command, but not many chunks were removed. I looked a bit more into it and found out you can remove snapshots/revisions by ID, so I told Duplicacy to manually remove these - but again, I cannot remove the fossils. How do I deal with the case where a machine is no longer available, but I want to remove it entirely from the backup? Surely I am missing something :slight_smile:

Edit: Actually, it seems I can’t even prune a single host properly:

[root@mail ~]# ./duplicacy_linux_x64_2.2.1 prune -keep 0:1
Repository set to /
Storage set to b2://mail/
Keep no snapshots older than 1 days
Fossil collection 1 found
Fossils from collection 1 can't be deleted because deletion criteria aren't met
Fossil collection 2 found
Fossils from collection 2 can't be deleted because deletion criteria aren't met
Fossil collection 3 found
Fossils from collection 3 can't be deleted because deletion criteria aren't met
Fossil collection 4 found
Fossils from collection 4 can't be deleted because deletion criteria aren't met
Deleting snapshot mail at revision 1
Deleting snapshot mail at revision 6
Fossil collection 5 saved
The snapshot mail at revision 1 has been removed
The snapshot mail at revision 6 has been removed

[root@mail ~]# ./duplicacy_linux_x64_2.2.1 backup
Repository set to /
Storage set to b2://mail/
Last backup at revision 7 found
Indexing /
Parsing filter file /root/.duplicacy/filters
Loaded 6 include/exclude pattern(s)
Backup for / at revision 8 completed

[root@mail ~]# ./duplicacy_linux_x64_2.2.1 prune -delete-only
Repository set to /
Storage set to b2://mail/
Fossil collection 1 found
Fossils from collection 1 can't be deleted because deletion criteria aren't met
Fossil collection 2 found
Fossils from collection 2 can't be deleted because deletion criteria aren't met
Fossil collection 3 found
Fossils from collection 3 can't be deleted because deletion criteria aren't met
Fossil collection 4 found
Fossils from collection 4 can't be deleted because deletion criteria aren't met
Fossil collection 5 found
Fossils from collection 5 can't be deleted because deletion criteria aren't met

To delete the leftover chunks, you should run

duplicacy -d -log prune -threads 20 -exclusive -exhaustive

Reference: Prune command details.

Please be sure that during this particular prune no other backups or prunes run on the storage (because of -exclusive).
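
Afterwards, it may also be worth verifying the storage, e.g. with:

duplicacy check -a    # checks that every chunk referenced by every snapshot ID still exists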

Thank you @TheBestPessimist - much appreciated. I think I was doing -exhaustive without -exclusive in my testing. Just to clarify: if I manually remove the snapshots by specifying IDs (such as to remove all snapshots from a “no longer used” machine) and then run with -exclusive -exhaustive, I should expect this to remove all chunks that were only referenced by that machine, correct? I.e. in this way it’s completely possible to get rid of an old machine?

I think it’s better to look at it from the opposite side: if a chunk is used by any revision (or multiple revisions) of any computer, it will not be deleted. The end.

So you can run -exclusive -exhaustive 100 times in a row, but only the first will delete anything (while the rest will just waste time checking that each chunk is referenced at least once).
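
If you just want to see what a pass would delete without touching anything, prune also has a dry-run mode:

duplicacy prune -exhaustive -exclusive -dry-run    # prints what would be removed, deletes nothing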

The faster way, if you want to delete all the revisions of a computer, is to delete that computer’s snapshot folder on the storage. For example, when my storage is Google Drive I go to:

[screenshot: the snapshots folder of the storage in Google Drive]

and just remove one of my repositories (in this case it is called “tbp-bulk”), then run a prune afterwards for chunk cleanup.
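
For reference, the on-storage layout is roughly the same on any backend (“tbp-bulk” is just my repository ID here):

snapshots/
  tbp-bulk/      # one folder per repository ID; each file inside is one revision
    1  2  3 ...
chunks/
  ...            # chunk files, grouped into subfolders by hash prefix

So after deleting snapshots/tbp-bulk/ with the storage provider’s own tools, a cleanup pass removes the now-unreferenced chunks:

duplicacy prune -exclusive -exhaustive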

Right - I get that; I guess what I’m more trying to get at is: rather than relying on pruning with a rule like -all -keep 0:1 after running fake backups on blank folders to “push out” the old ones from dead hosts, can I manually remove the last snapshot for any given host by specifying the host and revision ID? I’ll be trying it shortly when the prune finishes anyway, but figured I may as well ask (and it might help someone else looking for the same info). Thanks!

See again my edited response above. (you’re faster than my writing :stuck_out_tongue: )

Ah no worries - figured there’d be a “proper” way to do it via CLI, but that method will suit me just fine too. Thanks :smiley:

Well… there could be one, but when I needed to clean up something, I found this to be the fastest ¯\_(ツ)_/¯.

Yeah, makes sense - and it works just fine for me. That was my main hurdle - being unable to fully remove old machines. Final testing is underway now, but this likely makes it a very workable solution for me :slight_smile: Thanks for your assistance!

Edit: Follow-up for anyone else; you can also remove the remaining snapshots manually with:
duplicacy prune -id <OTHER-MACHINE> -r <REVISION> -exclusive

This removed it completely - afterwards, I could no longer see any other hosts in my “snapshots” directory :slight_smile:
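
If you don’t know the remaining revision numbers for the dead machine, you can list them first, and (if I’m reading the usage right) prune also accepts revision ranges:

duplicacy list -id <OTHER-MACHINE>                        # shows all revisions for that ID
duplicacy prune -id <OTHER-MACHINE> -r 1-7 -exclusive     # remove revisions 1 through 7 in one go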

This is theoretically possible but very unlikely to happen in practice. When a new file is inserted, the variable-sized chunking algorithm should find the previous chunk boundaries after generating a few new chunks.

What is more likely is that Duplicacy reported all files as being uploaded but only some chunks were actually uploaded (see Log says it uploaded files, but then says "0 bytes uploaded").

I guess I’m very unlucky then :smiley: I’m 34% of the way through 1 TB, and have uploaded 300 GB so far (as confirmed by firewall statistics + ISP stats + the change in bucket size on B2). Ah well!