Getting thousands of 'Chunk is already a fossil'

Not sure what is going on or why. Prune is taking all night and into the next day, if it even finishes at all. After Duplicacy had been up and running normally for months, my prune log now looks like the following, and it goes on for thousands of lines/pages with different chunks "already being a fossil". What does that even mean, and do I need to fix a problem? Thank you

2022-11-17 17:03:46.641 INFO SNAPSHOT_DELETE Deleting snapshot 1 at revision 65
2022-11-17 17:03:46.746 INFO SNAPSHOT_DELETE Deleting snapshot 1 at revision 66
2022-11-17 17:05:17.322 WARN CHUNK_FOSSILIZE Chunk 5a0267a4f6e25ed36b5ac0a97afef6322bdf80475e4828546a2c8fd6858f2ce3 is already a fossil
2022-11-17 17:05:18.234 WARN CHUNK_FOSSILIZE Chunk ebe50c2eb10c80f3028c41ca663aaf919c2615d0a1624ba2d3358833c57dbc46 is already a fossil
2022-11-17 17:05:18.955 WARN CHUNK_FOSSILIZE Chunk 7e267d3363ee7954ba5903e97502240b839b4f74919c4840719648c5a0b9125c is already a fossil

There is no problem to fix; those are informational log messages. Prune can take a very long time (weeks, even), depending on the number of snapshots and chunks, the extra parameters passed, and the endpoint latency.
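For context on what "fossil" means: Duplicacy prunes in two steps, first renaming unreferenced chunks into "fossils", and only permanently deleting them on a later pass once it is safe to do so. Below is a hypothetical shell sketch of that rename-then-delete idea (this is not Duplicacy's actual code; the chunk name and `.fsl` suffix are illustrative). The "already a fossil" warning just means the rename was already done, e.g. by an earlier or interrupted prune:

```shell
set -e
store=$(mktemp -d)
echo data > "$store/5a0267a4"   # stand-in for an unreferenced chunk

fossilize() {
  # Step 1: rename a chunk to a fossil; if that already happened,
  # emit a harmless warning instead of failing.
  if [ -e "$store/$1.fsl" ]; then
    echo "Chunk $1 is already a fossil"
  else
    mv "$store/$1" "$store/$1.fsl"
  fi
}

fossilize 5a0267a4            # first prune pass: renames the chunk
msg=$(fossilize 5a0267a4)     # repeated pass: just warns
echo "$msg"

rm "$store/5a0267a4.fsl"      # step 2, on a later run: delete for good
rmdir "$store"
```

So the warnings you see are prune encountering chunks it (or a previous, possibly interrupted, run) has already fossilized.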

Seeing how it processes about one file per second, I assume you are using one of those *Drive services as the backup target? Then that's expected performance.

Edit: also, if you interrupt the prune, it may re-process the same chunks again, but it may also start failing on stale snapshots, depending on where it was interrupted. The latter is a known issue.

I am using Google Drive with a Google service account tied to a G Suite account that has 30 TB of capacity. Yes

Perhaps it's because I temporarily have a visual database copied to a part of the share that's backed up, which I don't normally have there, and it's many hundreds of thousands of files of terrain height data?

Or is the number of files inconsequential, since Duplicacy is chunk-based?

I guess I'm struggling to understand the concept of operation here. If prune can take that long, how is someone supposed to have a backup solution that can't manage revisions and snapshots because cleaning them up takes weeks at a time? That would mean you can't back up the whole time it's running. I'm guessing I'm missing something here, because I'm not seeing how that actually works in practice if it takes that long to prune.

It does not. You can back up while prune is running; Duplicacy is fully concurrent. You can even run prune from an entirely different host, for example a cloud instance.

It takes long to prune because *Drive-type storages are not suitable as backup destinations: they work for small datasets, but with large amounts of data they don't scale. I'll link you to prior discussions on the merits of using Google and other drives as backup destinations.

Yeah, I have local backups covered, but this is my solution for off-site. Good to know it's concurrent. So I guess I'll remove the prune from my nightly job and make it its own job. Will 2 separate schedules run concurrently? What's the best way to set this up? All in 1 schedule and just click 'parallel'?

Yes, you can make it two separate schedules; they should run concurrently.

If you do it in one schedule, then I don't think backup will re-run until prune finishes, defeating the purpose.

When I was using Google Drive as a target, I ran prune weekly from another instance at Oracle Cloud (they offer a pretty beefy free compute instance), mostly to avoid interruptions and the associated overhead, including the phantom check failures that occur if the prune is interrupted after it has deleted chunks but before it deletes the snapshot file that referred to those chunks. Then I stopped pruning altogether, because I wasn't deleting much data, so the overhead was pretty minimal.
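To make the "nightly backup, weekly prune from anywhere" split concrete, here is a hypothetical crontab sketch using the Duplicacy CLI. The paths and retention policy are examples, not from this thread; adjust them to your setup:

```shell
# Nightly backup at 01:00, run from the machine holding the data
0 1 * * *  cd /path/to/repo && duplicacy backup

# Weekly prune on Sunday at 03:00; this can live on a different host
# entirely, since backup and prune are safe to run concurrently.
# Example retention: delete revisions older than 360 days, keep one
# per week after 30 days, one per day after 7 days.
0 3 * * 0  cd /path/to/repo && duplicacy prune -keep 0:360 -keep 7:30 -keep 1:7
```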

Use a separate schedule, and you don't need to tick 'Parallel'.

Add -threads 16 or similar and your prune should complete sooner.
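For example, appended to a prune command (the retention options here are illustrative, not from this thread):

```shell
# More deletion threads help high-latency backends like Google Drive,
# where each file operation is slow but many can run in parallel.
duplicacy prune -keep 0:360 -keep 7:30 -keep 1:7 -threads 16
```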