Backup approach and fundamentals question - pruning

I’ve been searching, but I haven’t found an answer for this - at least not one I’m confident I understand completely, and I don’t want to take risks on something this important. First, my basic setup: I have 2 Unraid servers in separate zones on my home network, and I’m running Duplicacy on both. Key files on each are backed up to the other via Duplicacy.

One area has been a concern for me - my Media folder. It represents about 8TB of movies, music, and pictures. The initial backup took days, and follow-on backups have been much faster, as expected. If I had a failure, however, it would take forever to restore, so I’ve been thinking about using sync for that instead. I worry about ransomware or some unnoticed corruption getting synced, though, so I’ve favored Duplicacy.

Assuming my above thinking is rational, I’m leaning toward Duplicacy as I mentioned. But here’s the other big question for me: if I delete some of these movies because I don’t want them anymore, I’ve encoded a better version, etc., will prune or some other Duplicacy function let me reclaim the storage space used, or will the storage keep growing regardless of how much I’ve deleted since the snapshots were taken?

Thanks for helping me understand.

It wouldn’t reduce storage space until you’ve pruned every snapshot revision that references those media files. However, you could potentially mitigate that by having shorter retention times for these types of repositories, and longer retention for regular data folders.
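As a rough sketch of what that might look like with the CLI - the snapshot IDs (“media”, “documents”) and the retention numbers here are just placeholders, not a recommendation:

```sh
# Shorter retention for the bulky media repository:
#   -keep 0:90  -> delete all revisions older than 90 days
#   -keep 7:30  -> keep one revision every 7 days once older than 30 days
duplicacy prune -id media -keep 0:90 -keep 7:30

# Longer retention for regular data folders:
duplicacy prune -id documents -keep 0:365 -keep 30:180 -keep 7:30 -keep 1:7
```

Once no remaining revision references the deleted movies’ chunks, prune can remove those chunks and the space is reclaimed.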

For this reason, I too use sync (Rclone) for my media - TV and movies - although I still use Duplicacy for music, coz most of it is MP3 and occasionally reorganised (metadata etc.). I do have some FLAC, but not a huge amount; if I had a lot of FLAC, I’d probably consider Rclone for that as well.
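If you do go the sync route for media, a minimal Rclone sketch might look something like this (the remote name “backupserver” and the paths are just examples):

```sh
# Preview what would change before committing:
rclone sync /mnt/user/Media backupserver:Media --dry-run

# Real run; --backup-dir moves deleted/overwritten files aside rather than
# destroying them, which softens the ransomware/corruption-gets-synced worry:
rclone sync /mnt/user/Media backupserver:Media --backup-dir backupserver:Media-old -P
```

Note the --backup-dir destination has to be on the same remote but outside the sync target, and it only keeps one level of history - so it’s a mitigation, not a substitute for versioned backups.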

Photos, for me, kinda make sense to use Duplicacy for, since I may work on reorganising the folders over time (they get auto-sync’d from my phone to the PC with SyncThing, then organised later).

I guess one way to think about it is: large media like movies tend not to change much, so a backup program may not be the best fit. It’s more like archival material - once it’s there, it rarely changes - whereas backups are good for data that’s more fluid.


Thanks for the clarification, and for the suggestion - that seems like a great idea if I continue down this road vs Syncthing for this subset of my backup. Decisions, decisions…

However, you could potentially mitigate that by having shorter retention times for these types of repositories, and longer retention for regular data folders.

Just a point of clarity, if you don’t mind: in this scenario, would you need to set up multiple separate backup tasks, or can this be accomplished with just one backup task?

If you want different retention times for different types of data, you’d have to create separate backup jobs in order to distinguish them by unique snapshot/backup IDs. That way, you can apply a different prune job to each ID.
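In CLI terms it would look roughly like this - the storage URL and the “media”/“documents” IDs are placeholders:

```sh
# From the media folder, initialise a repository with its own snapshot ID:
cd /mnt/user/Media
duplicacy init media sftp://backup@otherserver/duplicacy-storage

# From the documents folder: same storage, different snapshot ID:
cd /mnt/user/Documents
duplicacy init documents sftp://backup@otherserver/duplicacy-storage
```

Each ID can then get its own prune policy, as in the earlier example.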

However, you can still run the backup jobs sequentially for each repository root (i.e. a schedule in the Web Edition can contain multiple jobs), which is pretty normal practice.
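With the CLI, that could be as simple as a script run from cron - the Web Edition schedule does the equivalent for you; paths here are again just illustrative:

```sh
#!/bin/sh
# Run each backup job in turn against the same storage:
cd /mnt/user/Media && duplicacy backup -stats
cd /mnt/user/Documents && duplicacy backup -stats
```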

Duplicacy was kinda designed anyway with the intent that you’d back up from a single ‘root’ directory, rather than selecting multiple locations to include in one big backup job. So to include multiple locations, you’d have to run multiple jobs. That’s not really a problem with Duplicacy, since it was also designed to let multiple clients back up to the same storage while de-duplicating data across locations. The overhead is extremely minimal, and in fact there are advantages to breaking up data into multiple repositories.
