How to reduce storage size?

Hi all. I’ve been using Duplicacy to back up to Backblaze for… maybe a year now? A while anyway. Generally it works great and is very reliable. However, my Backblaze usage keeps growing and growing and growing… It’s up to 3.6 TB now because I had never set up a prune job or limited the number of revisions(?). (I’m not entirely sure about the correct terminology here.)

My backup job log looks like this:

Options: [-log backup -storage Backblaze-B2 -threads 4 -limit-rate 3000 -stats -stats]
2023-09-03 03:00:01.642 INFO REPOSITORY_SET Repository set to /backuproot
2023-09-03 03:00:01.642 INFO STORAGE_SET Storage set to b2://grigsby-citadel-duplicacy
2023-09-03 03:00:01.765 INFO BACKBLAZE_URL download URL is: https://f000.backblazeb2.com
2023-09-03 03:00:03.159 INFO BACKUP_START Last backup at revision 287 found
2023-09-03 03:00:04.262 INFO BACKUP_INDEXING Indexing /backuproot

So there are roughly 287 revisions, which is way more than I need. I’d like to reduce my Backblaze usage to about a month’s worth of backups.

A few weeks ago I added a prune job to run once a week with the following arguments:

-log prune -storage Backblaze-B2 -keep 0:30

The log says:

2023-08-30 06:00:01.306 INFO STORAGE_SET Storage set to b2://grigsby-citadel-duplicacy
2023-08-30 06:00:01.449 INFO BACKBLAZE_URL download URL is: https://f000.backblazeb2.com
2023-08-30 06:00:02.948 INFO RETENTION_POLICY Keep no snapshots older than 30 days
2023-08-30 06:00:12.045 INFO SNAPSHOT_NONE No snapshot to delete

So my Backblaze usage hasn’t gone down, and Duplicacy seems to think there’s nothing to delete. What am I doing wrong? How do I run a prune command to delete everything older than 30 days?

Thank you very much!

You are doing it right; it’s not clear why prune thinks there are no eligible snapshots.

I would try adding the -d global option to the prune command to get a more detailed log.
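For reference, and assuming you’re invoking the CLI directly rather than through the Web UI’s schedule options, global options such as -d go before the subcommand, so the full command would look roughly like this:

duplicacy -d -log prune -storage Backblaze-B2 -keep 0:30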

Can you connect to b2 with some other tool like Cyberduck and confirm that there are actually snapshots in the bucket older than 30 days? The prefix would be snapshots/snapshot-id/…

Another thing to try: pass a specific snapshot id to the prune command instead of relying on the default, or pass the -a flag to prune all snapshot ids.
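For example (the snapshot id below is just a placeholder for your own), either of these should do it:

duplicacy prune -keep 0:30 -id your-snapshot-id
duplicacy prune -keep 0:30 -a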

Thank you for the suggestions! I appreciate the pointers on where and how to keep investigating. I’ll keep working on it…

Keep in mind Duplicacy won’t remove chunks until after the second step in the fossil collection process. So you’d have to run a whole round of backups and then another prune before those fossilised chunks actually get deleted.

Are your prune logs above from your first run? That’s the only explanation I can think of - you should have seen snapshot revisions get deleted at least, and then chunks renamed to .fsl.

I did read that it won’t prune after the first run for the reasons you stated, but the prune job has run three or four times now. Thank you for the suggestion! I’ll keep looking into it.

You can run as many prunes as you like, but there must be a round of full backups between them. This ensures the fossils marked in the collection step are definitely not used by subsequent backups…
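To make the ordering concrete, a rough sketch of the sequence with CLI equivalents (assuming a single snapshot id and your -keep settings) would be:

duplicacy prune -keep 0:30 -a   # step 1: old revisions are removed and their unreferenced chunks are renamed to .fsl fossils
duplicacy backup                # a new backup from every snapshot id confirms the fossils aren’t needed
duplicacy prune -keep 0:30 -a   # step 2: fossils that are still unreferenced are permanently deleted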

So can I force a full backup? I’ve been running this for ages and it seems prune has never pruned anything. The backups seem to be only incremental.

All backups are incremental, but if you use the -hash flag with the backup command, it’ll cause a cleaner break with historic chunks such that more chunks may be pruned in the future. Be warned, however, that a rehash will take longer.
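For example, a one-off full re-read would look something like this (storage name and thread count taken from your setup above; adjust as needed):

duplicacy backup -hash -storage Backblaze-B2 -threads 4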

As far as ‘prune has never pruned anything’ goes - be sure to back up all repositories on a schedule, and run semi-regular prunes (with the -keep flags) - stuff will get removed eventually.

As far as I understand it, every time a “backup” runs, a new revision is created for a snapshot. I only have 1 snapshot. “prune” works at the snapshot level, so if I only have 1 snapshot at any given time, then it’ll never delete anything, correct? So when does a new snapshot get created? I’ve only ever had the one snapshot, “snapshot 1”.

Also, what is the relation between a snapshot and a “Backup ID”? Is a given “Backup ID” supposed to have multiple snapshots?

No, prune deletes revisions from the specified snapshots.

The Backup ID is the snapshot name. A snapshot can have multiple revisions of the state of the repository. The repository is the source location you back up.

Duplicacy uses the term “snapshot” in a somewhat unconventional way.
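If it helps to see it concretely, you can list the revisions stored under your snapshot id (just a sketch; substitute your own id):

duplicacy list -id your-backup-id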

Got it. That’s really confusing. I’m going to wait and see: I started a backup from scratch a few days ago, and I’m running a daily prune with “-keep 0:7”. So far I’ve been getting “INFO SNAPSHOT_NONE No snapshot to delete”.

If prune deletes revisions, why does the log say “no snapshot to delete”? Shouldn’t it say “no revisions to delete” or “no snapshots to delete revisions from”?

Should I expect to see some “revisions” deleted from that single snapshot that I have in the next few days?

Indeed, it’s inconsistent.

Basically your source data is called a “repository”, and it is identified by a “snapshot id” on the “storage”. Every backup of that repository creates a “revision” under its “snapshot id”. I strongly dislike this naming scheme myself. It’s very confusing.

Prune will always keep at least one revision. Running prune -keep 0:7 -a will delete all revisions older than 7 days, but will keep at least one.

If you run backups daily, then after the 7th backup has completed, you can expect prune to delete the first revision on the 8th day.

I would not expect any meaningful space savings though, unless your data changes drastically between revisions. For most uses this is not the case.
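As an aside, -keep flags can be stacked for a tiered policy (ordered from the largest age cutoff to the smallest); for roughly a month of history with some thinning, something like this would work:

duplicacy prune -keep 0:30 -keep 1:7 -a   # delete everything older than 30 days; for anything older than 7 days, keep one revision per day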