I’ve got a setup that uses multiple storage pools, with a different pruning policy on each. I’ll describe the storage and the desired outcome first, then what I see in my logs.
Laptops and a server back up to a storage pool on a home server. The laptops perform the backup and a very basic check; the home server performs pruning, copying, checking, and a more in-depth weekly check. The initial backups target “pool0” on the server. Once a day the server performs a copy and check from pool0 to pool1, then a copy and check from pool1 to a pool stored on Backblaze for off-site storage.
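For reference, if I understand the CLI equivalents of what my web GUI schedule is doing, the daily chain on the server is roughly the following sketch (leaving the weekly deeper check aside; pool0, pool1, and b2 are just stand-ins for the storage names as I configured them, and the check options are whatever the GUI uses by default):

# daily copy/check chain run from the repository on the server (sketch, not my literal schedule)
duplicacy copy -from pool0 -to pool1
duplicacy check -storage pool1
duplicacy copy -from pool1 -to b2
duplicacy check -storage b2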
The desired retention policy on pool0 is -keep 0:366 -keep 7:90 -keep 1:15 -a. So: keep every backup until 15 days, then 1 backup per day past 15 days, then 1 backup per week past 90 days, and after 366 days revisions should be dropped.
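If my reading of the -keep syntax is right (each -keep n:m means “keep one revision every n days for revisions older than m days”, with n = 0 meaning delete them), the pool0 prune job is equivalent to this CLI command (-storage pool0 added just to be explicit about which storage the job targets):

# prune pool0: everything < 15 days, daily > 15 days, weekly > 90 days, nothing > 366 days
duplicacy prune -storage pool0 -keep 0:366 -keep 7:90 -keep 1:15 -a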
On pool1, the copy target for pool0, I want retention to be -keep 0:2922 -keep 90:730 -keep 30:180 -keep 7:35 -keep 1:1. So: all backups until one day old, then 1 backup per day past 1 day, 1 backup per week past 35 days, 1 backup per month-ish past 180 days, 1 backup per 90 days (quarterly) past 730 days, and finally after 8 years (2922 days) revisions should be dropped.
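As a CLI sketch, the pool1 prune would be the command below (I’ve added -a to cover all the snapshot IDs copied in, same as on pool0; that’s my assumption about what the GUI job does):

# prune pool1: daily > 1 day, weekly > 35 days, ~monthly > 180 days, quarterly > 730 days, nothing > 2922 days (~8 years)
duplicacy prune -storage pool1 -keep 0:2922 -keep 90:730 -keep 30:180 -keep 7:35 -keep 1:1 -a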
Finally, on Backblaze I want the prune policy to match pool1. These storage targets are meant to be longer-term destinations: copied here at home and duplicated off site for disaster preparedness. The pool1 storage is meant to keep pool0 freer for user storage space, since pool0 is on faster drives with mixed use as a NAS and a few other things.
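In CLI terms that would just be the same -keep set applied against the Backblaze storage (again using b2 as a stand-in for whatever the storage is actually named):

# same retention as pool1, applied to the off-site copy
duplicacy prune -storage b2 -keep 0:2922 -keep 90:730 -keep 30:180 -keep 7:35 -keep 1:1 -a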
From the “Back up to multiple storages” guide I saw:
“If onsite pruning is more aggressive than offsite pruning, this would work (but is not a great idea).”
I’m wondering if I can get clarification on what this means, because I’m certainly getting messages in my prune logs that are confusing, but perhaps expected. For instance, after a copy from pool0 to pool1, the prune task runs and I get a bunch of messages like this:
2023-09-21 00:37:28.445 INFO FOSSIL_GHOSTSNAPSHOT Snapshot rhodes_home revision 5448 should have been deleted already
2023-09-21 00:37:28.445 INFO FOSSIL_GHOSTSNAPSHOT Snapshot rhodes_home revision 5983 should have been deleted already
2023-09-21 00:37:28.445 INFO FOSSIL_GHOSTSNAPSHOT Snapshot rhodes_home revision 5987 should have been deleted already
2023-09-21 00:37:28.445 INFO FOSSIL_GHOSTSNAPSHOT Snapshot rhodes_home revision 5991 should have been deleted already
2023-09-21 00:37:28.445 INFO FOSSIL_GHOSTSNAPSHOT Snapshot rhodes_home revision 5995 should have been deleted already
2023-09-21 00:37:28.445 INFO FOSSIL_GHOSTSNAPSHOT Snapshot rhodes_home revision 5999 should have been deleted already
2023-09-21 00:37:28.445 INFO FOSSIL_GHOSTSNAPSHOT Snapshot rhodes_home revision 6003 should have been deleted already
If I launch the restore selector from pool1, I don’t see these revision numbers, presumably because the prune has run and deleted them. My guess is that these revisions are being copied over from the pool0 source every time the copy runs, and the prune step in my schedule just immediately deletes the copied snapshots again? My concern is what happens on the inverse end of the spectrum: the copy isn’t deleting anything, so as long as my prune rules are in effect, snapshots, chunks, and the revisions they contain should all remain on my pool1 storage, right? I might get some odd messages in the logs, but nothing should be too crazy?
What happens when those snapshots age out on pool0, though? They’ll already exist in pool1, but will pool0 also continue to contain them if I run a check job that resurrects data? I assume pool0 has no knowledge of what’s on pool1, so it shouldn’t make any difference whatsoever? But what about chunks that change: how might this impact the pool1 data?
Finally, the copy from pool1 to Backblaze is just a copy, and the prune should be pretty straightforward. I’m assuming it shouldn’t complain about fossil ghost snapshots, because it’ll just be following on from the last copy out of pool1, where that error has presumably been cleared up already? Again, the purpose of this destination is to keep an off-site copy of the data in case the worst happens.
I’ve attached my schedule page to better show what the server’s activity looks like. As I said, the laptops simply back up to the same pool0 target storage, so there isn’t much to see there.
If someone could look over this and verify that it makes sense, I’d appreciate it. I’m trying to keep generations of data under different retention policies. The policies and the copying make sense in my head, I think, and I believe I understand why the app is complaining about snapshots that appear but that the prune believes shouldn’t exist. But I want to make sure I’m not committing an extreme anti-pattern here and causing potential corruption. There are a couple of TB of data being backed up, so I want to make sure I’m handling it properly and checking it regularly.