Prune Advice Web GUI

Hi All,

Very new user and a bit confused here.

I’ve read through a load of prune posts as well as the guides and I just can’t get my head around it.

My current setup:

I have a backup location whose data changes very regularly; it usually sits below the 2TB mark. This location is backed up to Backblaze B2 using the web GUI version of Duplicacy.

I can be adding 200GB a day or deleting 200GB a day from that location. It’s basically housing my off-site backups for ongoing projects, and when the projects are completed the data no longer needs to be backed up off-site.

The backup runs twice a day: once overnight, and then once, rate limited, through the day. I set it up this way because the initial upload was 1.6TB and I have a bad internet connection; I needed it running as regularly as possible just to get the data up to Backblaze!

I then have a check command that runs at midday, and that’s what I’ve been doing for a couple of weeks now. As you can imagine, with that backup cycle there are a lot of revisions to check.

I’m now coming to a stage where I can get rid of some of the data/projects.
The behaviour I’m aiming for is that I delete the data from the backup target and two days later that data is deleted from Backblaze.
I thought I could achieve this by running the backups as they are, plus a prune task that runs every day and deletes all revisions older than two days.

I just can’t work out how I’d achieve this. Maybe I’m just being stupid, but I can’t seem to get my head around it.

Just wondering if anyone had any ideas?

I’m well aware that I might have set this up incorrectly and I’m also open to any advice on best practices that I can use to streamline the process.

Thanks for your help in advance!


prune -keep 0:2 shall do it.
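To spell it out, a minimal sketch of the full command (assuming you want it applied to every backup ID in the storage, which is what the -a flag does):

    # delete all revisions older than 2 days, across all backup IDs
    duplicacy prune -a -keep 0:2

-keep n:m means “keep one revision every n days for revisions older than m days”, so n = 0 removes everything older than m days.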

One aspect of your use case I want to confirm (if I understood it correctly): you have project files that are so important that you run two daily backups while the projects are in use, but once they’re finished you can discard your first backup location and the offsite backup (Backblaze) of finished projects after two days. Is that right?

I know it sounds weird but that is the use case.

Basically, they’re backups of the ongoing projects, and once the customer has signed off and paid for the project I have no need to hold onto that data.

It’s written into the contracts that once sign-off is done I will no longer store any data from the project, and it’s on the customer if they want to pay for me to store it longer, if that makes sense.

That way I won’t accumulate a lot of redundant data and, if I have hardware failure on my file server or PC whilst working on that project, I have an off-site backup to restore any lost data for the current active projects.

I’m aiming for lost time rather than lost money in an emergency situation, because the nature of the data I’m storing/backing up means it’s captured once and cannot be captured the same way again after it’s initially created.

TL;DR: I need reliable off-site backups for the duration of the project, and then I need to be able to completely remove them after the project is signed off.

Ok, I got it. If your projects are less than 30 days long, it might make sense to consider a “drive” sync solution like Google Drive. It depends on the average size of your files, though; this kind of synchronization is not practical with large files.

Thanks so much for your response @towerbr. I had actually been considering Google Workspace as a solution, and I believe their Business Standard tier gives access to 5TB of Google Drive storage, which would be enough for off-site backups.

I would say each project ranges from 200GB to 300GB, and my upload speed is fairly poor at around 12Mbps.

The reason I went with Duplicacy in the first place is because of my storage setup:

I have TBs of storage on an Unraid server that I use to store long-term data that I want to keep hold of but that isn’t irreplaceable if I ever did lose it. I have a share on that server which is solely for off-site backups. I then have Duplicacy running in a Docker container that backs up using that share as its target and uploads everything within that share to Backblaze.

What I’m aiming to achieve is simplicity: when I’ve got a project and need to back up the source material, I can just copy it all into that share, and Duplicacy’s scheduled backup jobs will pick up the changes on one of the backup cycles and upload the latest project. Then, when I delete it, it’ll wait a day or two before deleting it off Backblaze, so if I’ve deleted something in error I’ve got a buffer period to address that and restore it if needed.

The nature of the work is producing video content, and the volume of work is high: projects often get signed off within a week of starting them. That’s why I’d gone down this route and decided that a regular prune on top of that backup cycle would give me the flexibility I need.

The problem I had with Google Drive was that I could do the upload straight from my PC, but might then be leaving it on for days to finish, and that seemed like a waste of energy when I had the Unraid server already on 24/7 with the ability to just run the backups in the background without me noticing. Also, my experience with Google Drive is limited, but if it’s anything like OneDrive the web UI is terrible for uploads, and I’d be battling with local storage to hold the data before the upload completes, so I’d effectively have to keep 2TB free on my local PC at all times.

Hope I have been clear there; it’s not the most straightforward thing to explain, but hopefully that’s a little clearer.


Thanks for getting back to me 🙂

How would I add that into the GUI version as a scheduled job?

There are a load of options before I can add in additional arguments, such as “keep a revision for every 7 days” etc. Do I just set them to 0?

Add a prune job with the desired schedule and ignore the retention settings configurable in the dialog.

Once you have the prune job added, its arguments will be displayed in a column in the list. Click on them to edit. Delete all the -keep arguments and replace them with your own -keep 0:2.

Keep the -a argument intact.

Or, if you want to prune only a specific backup ID, you can replace -a with -id <backup id>.
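To illustrate, the final options for the prune job would then read something like this:

    -keep 0:2 -a

or, for one backup ID only (my-project-share is just a placeholder; use your own backup ID):

    -keep 0:2 -id my-project-share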

Off topic: this is one example of why I personally prefer running the CLI version with the OS scheduler instead of using the UI: unless you have a very simple use case, the UI can be counterproductive, as you have to work around the simplifications it brings. But that’s just me.
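For illustration only, a crontab sketch of roughly the same cycle as yours (the repository path, times, and rate limit here are all made-up assumptions; the share must be initialised as a Duplicacy repository and duplicacy must be on the PATH):

    # min hour dom mon dow  command
    0   2  * * *  cd /mnt/user/offsite-share && duplicacy backup
    0   13 * * *  cd /mnt/user/offsite-share && duplicacy backup -limit-rate 512
    0   12 * * *  cd /mnt/user/offsite-share && duplicacy check -a
    30  3  * * *  cd /mnt/user/offsite-share && duplicacy prune -a -keep 0:2

-limit-rate is in KB/s, so 512 here is roughly 4Mbps, which would leave some headroom on a 12Mbps uplink during the day.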

@saspus you are an actual lifesaver. That’s so simple, and I feel really stupid for missing it!

I will give this a go and hopefully this remedies my issue.

Completely in agreement that the CLI version is probably more robust and easily configurable. I’m comfortable working within and learning the command line, but I have set this up so that someone else can easily manage it in my absence, so the GUI was my only viable option at the time.

Thanks so much for your help 🙂