Unreferenced chunks growing over time

I'm having a bit of trouble finding the root cause of this issue and figuring out how to better configure my setup to avoid it. I first noticed it because my offsite storage was growing much faster than my local storage, when I expected them to be about the same size. I ran prune -exhaustive -exclusive -dry-run -all against the B2 storage and it reported lots of unreferenced chunks. Running it again without -dry-run cleaned up the B2 storage, and the B2 and local (NAS) storages were back in line size-wise.
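Roughly what I ran, for reference (the storage name here is just the one from my duplicacy-web configuration; adjust to yours):

  # preview only
  duplicacy prune -storage b2.tedski -exhaustive -exclusive -dry-run -all
  # actual cleanup; -exclusive assumes nothing else is writing to the storage
  duplicacy prune -storage b2.tedski -exhaustive -exclusive -all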

I'll start with an overview of my setup. I have two storages: one local on my NAS and one offsite in B2. Multiple machines back up to the NAS storage, each with its own snapshot ID. I run duplicacy-web in Docker on the NAS; the other machines in the house run the Duplicacy CLI via cron. The NAS also backs itself up to the NAS storage via duplicacy-web schedules. My goal is to have the local storage copied offsite; no client ever backs up to the B2 storage directly, it is copy-only.

My schedules in duplicacy-web:

  • A check job runs on both storages, in parallel, daily starting at midnight. The intent is to keep the graphs in duplicacy-web up to date.
  • The NAS's own backup jobs run, in parallel, hourly starting at midnight. These produce two snapshot IDs, based on a logical grouping of files on the NAS.
  • Pruning happens daily at 02:30. The prune jobs for both storages have identical arguments (-keep 0:1800 -keep 7:30 -keep 1:7 -a -threads 10) and run in parallel. The intent is to minimize the number of chunks we copy to B2 and to keep the B2 storage closely mirrored to the local storage.
  • An offsite copy runs every 6 hours, starting at 04:00. The intent is simply to copy the local storage to B2 for offsite backup purposes. (The equivalent CLI commands for the prune and copy jobs are sketched just after this list.)
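For clarity, the scheduled prune and copy jobs boil down to something like the following CLI commands (storage names are placeholders matching my duplicacy-web setup, not anything standard):

  duplicacy prune -storage local -keep 0:1800 -keep 7:30 -keep 1:7 -a -threads 10
  duplicacy prune -storage b2.tedski -keep 0:1800 -keep 7:30 -keep 1:7 -a -threads 10
  duplicacy copy -from local -to b2.tedski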

I'm having a hard time figuring out where I went wrong in this setup to cause the growth. The prune jobs finish in time for the next copy to start. Backups do often run during the prune, but the prune is non-exclusive, so I believe that shouldn't matter (?). Any help improving this, or diagnosing where I've introduced the problem, would be appreciated.

Do you have a snapshot ID on B2 that does not get updated? According to Policy 3 in the Duplicacy paper, fossils can only be deleted once every snapshot ID has at least one snapshot created after the fossil collection.

So if there are one or more inactive snapshot IDs that never advance (are never copied to), fossils never get deleted. The -exclusive flag disables the two-step fossil collection (and all of its safety checks) and deletes immediately anyway, which is why your manual -exclusive prune cleaned things up.
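A quick way to check is to list every snapshot ID on the B2 storage and look at the latest revision date for each one, e.g. (storage name is just an example):

  duplicacy list -a -storage b2.tedski

Any ID whose latest revision predates the fossil collection would block fossil deletion.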

This doesn't apply once a snapshot ID has been inactive for more than 7 days, btw.

@tedski Only two things I can think of to check…

Are fossils actually being deleted on B2? Could it be a permissions/access issue?

Are the storages properly synchronised? (Not sure what impact this might have on unreferenced chunks growing in size, but I’d check that anyway.)

Yes, but this has been happening for 2.5 months, so I believe the 7-day rule means this is not the root cause.

Yes, I performed some log analysis and it shows that snapshots are deleted on prune runs. It also revealed that ghost snapshots are the ever-growing problem (which I believe leads to the unreferenced chunks?). Interestingly, a new ghost snapshot only appears at the 7-day threshold, which is also a retention boundary in my prune policy. I don't know if that is a coincidence or not.

Here’s a summary of my log analysis:

Storage Behavior:

  • Local storage: 221 prune runs, 0 ghost snapshots, 0 fossils ignored - completely clean
  • B2 storage: 237 prune runs, 311 ghost snapshots across 88 runs (37.1%), 88 fossil collections ignored

Ghost Snapshot Patterns:

  • 11 unique ghost snapshots total (all from one snapshot ID)
  • Ghost snapshots are accumulating over time (started with 2 in Nov 2025, grew to 9 by Jan 2026)
  • Ghost snapshot revisions spaced roughly 168 revisions apart (~7 days of hourly backups)
  • All ghosts are at the 7-day retention boundary (where policy changes from hourly to daily)
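(For what it's worth, the counts above came from grepping the saved prune logs; the log path below is just what my container uses, yours may differ:)

  grep -c FOSSIL_GHOSTSNAPSHOT /logs/prune-*.log
  grep -c FOSSIL_IGNORE /logs/prune-*.log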

What do you mean by a ghost snapshot? An old snapshot that was pruned but somehow got resurrected? Or one that wasn't deleted?

Originally you said unreferenced chunks were growing, but this is a different problem from snapshots staying around. It sounds like you have some kind of Object Lock going on, which shouldn't ordinarily be enabled, as it's not compatible with Duplicacy.

Ghost snapshots in the logs:

Running prune command from /cache/localhost/all
Options: [-log prune -storage b2.tedski -keep 0:1800 -keep 7:30 -keep 1:7 -a]
2025-12-31 02:30:01.458 INFO STORAGE_SET Storage set to b2://tedski
2025-12-31 02:30:01.689 INFO BACKBLAZE_URL Download URL is: https://f002.backblazeb2.com
2025-12-31 02:30:02.242 INFO RETENTION_POLICY Keep no snapshots older than 1800 days
2025-12-31 02:30:02.242 INFO RETENTION_POLICY Keep 1 snapshot every 7 day(s) if older than 30 day(s)
2025-12-31 02:30:02.242 INFO RETENTION_POLICY Keep 1 snapshot every 1 day(s) if older than 7 day(s)
2025-12-31 02:30:50.814 INFO FOSSIL_GHOSTSNAPSHOT Snapshot docker-volumes revision 130 should have been deleted already
2025-12-31 02:30:50.814 INFO FOSSIL_GHOSTSNAPSHOT Snapshot docker-volumes revision 295 should have been deleted already
2025-12-31 02:30:50.814 INFO FOSSIL_GHOSTSNAPSHOT Snapshot docker-volumes revision 463 should have been deleted already
2025-12-31 02:30:50.814 INFO FOSSIL_GHOSTSNAPSHOT Snapshot docker-volumes revision 631 should have been deleted already
2025-12-31 02:30:50.814 INFO FOSSIL_GHOSTSNAPSHOT Snapshot docker-volumes revision 799 should have been deleted already
2025-12-31 02:30:50.815 INFO FOSSIL_IGNORE The fossil collection file fossils/5 has been ignored due to ghost snapshots
2025-12-31 02:30:50.816 INFO SNAPSHOT_DELETE Deleting snapshot electron at revision 94765

The reason I said unreferenced chunks is that when I ran the prune dry-run there were lots of unreferenced chunks in the output. Also, Object Lock is not enabled on this bucket.

First time I’ve seen this. See here for another example.

But, coincidentally, this might still be related to synchronisation? (Same log entries.)

Are you sure the same revisions aren’t being re-copied? I’d run with the global -v or even -d option, to make sure you’re seeing everything.
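Something like this, with the global flag placed before the command (storage names are placeholders):

  duplicacy -v copy -from local -to b2.tedski
  # or -d for full debug output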

Help me understand the synchronization issue you're referring to, so I know what to look for. As I understand it, the symptom of poor synchronization would be chunks being copied to the remote storage that shouldn't necessarily be there. If that were the case, then, since I have the same prune policy on both storages and the prune jobs start at exactly the same time and run in parallel, we'd just be unnecessarily copying chunks that then get pruned out again. That would not lead to ever-growing storage on the remote end. In this case, the remote storage grew to roughly double the size of the local storage in about 2 months, when they should be quite similar in size.

It's less about the chunks and more about the snapshot revisions (and the subsequent re-copying of chunks that may have been left in a fossilised state, since they've been collected but not deleted).

TBH I don't know the specifics for B2, but fossils are normally renamed (to .fsl) or moved to a separate directory (/fossils), depending on the storage backend. But you could potentially end up with many copies of the same chunk/fossil if they're being re-copied and never actually deleted after collection.

Just to be clear on this point: as far as the copy operation goes, they should be copied. It's when they're next pruned that they immediately get fossilised again and the snapshot revision is deleted.

Anyway, I could be wrong on this, but it's important to understand that even if your two prune schedules are the same and run on the same day, if they were ever run differently and the storages became out of sync, they won't necessarily re-sync automatically just by sticking to the schedules. The holes have to be fixed (by copying in both directions) to make them the same again.

But first you should check if the same revisions are being copied again and again.
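For example, list the revisions on both storages and compare which revisions exist on each side (names are placeholders):

  duplicacy list -a -storage local > local-revs.txt
  duplicacy list -a -storage b2.tedski > b2-revs.txt
  diff local-revs.txt b2-revs.txt

If the same old revisions keep reappearing on B2 after each copy, that's your answer.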

Maybe @gchen can chime in and provide some insight?

I ran some log analysis and it seems you're right about the synchronization symptom. The same revisions flagged as ghost snapshots are the ones being re-copied. So it seems the sequence was: local copies to B2; the two prune jobs run with two different views of what to prune, and one of the problematic revisions gets fossilized on B2; the next copy then re-copies those chunks, so the same chunks end up in both the chunks directory and the fossils directory; when prune runs again, it detects this and halts fossil deletion due to the presence of ghost snapshots. The fossils just pile up over time, and that causes the runaway storage usage.

Hopefully @gchen can confirm this. In the meantime, I'll do a bidirectional copy to re-sync the storages, and then run prune on both with the same options at the same time.
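The re-sync itself should just be a copy in each direction followed by identical prunes, something like this (storage names as configured in my duplicacy-web):

  duplicacy copy -from local -to b2.tedski
  duplicacy copy -from b2.tedski -to local
  duplicacy prune -storage local -keep 0:1800 -keep 7:30 -keep 1:7 -a -threads 10
  duplicacy prune -storage b2.tedski -keep 0:1800 -keep 7:30 -keep 1:7 -a -threads 10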

Just for your future knowledge: this is how it works on B2 as well.