Is it possible to schedule tasks in duplicacy_web with a period > daily? Will manually adjusting the “frequency” value in duplicacy.json work?
Man. My brain must be tired. Weekly isn’t an issue; just use “daily” frequency on a single day. Sigh.
However, is there a way to do monthly tasks?
I don’t think monthly is possible. What is the use-case? What task are you planning to run that cannot be run e.g., daily?
I was thinking about doing purging monthly. But, I wound up just doing it weekly. I tend to overthink things.
Here for the same ask. I would like to run a check -a -files only once per month, since the whole check takes longer than a day in my setup. Maybe even once a quarter would be sufficient. It would be great to get it done in the GUI rather than via cron or the CLI, so that it appears in the activities log on the dashboard. I tried to call the GUI via its API but struggled to get something working. What petition do we need to sign to get this feature?
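(In case it helps anyone in the meantime: the usual workaround is to schedule the CLI from cron, with the obvious downside that, unlike a GUI schedule, it won't show up in the dashboard activity log. A rough sketch only; the binary path, repository path and log path are placeholders.)

```
# Run a full file-level verification at 03:00 on the 1st of every month
# (use "0 3 1 */3 *" instead for once a quarter).
0 3 1 * *  cd /path/to/repository && /usr/local/bin/duplicacy check -a -files >> /var/log/duplicacy-check.log 2>&1
```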
But why?
If you don't trust storage integrity, don't use that storage. If you don't trust the transport, same thing: don't use that storage (or run check -chunks after each backup to verify the newly uploaded chunks). If you don't trust prune, run check without parameters after each backup.
Which concern are you trying to address by downloading the entire storage every month?
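For reference, the two lighter-weight checks mentioned above look like this in the CLI (run from inside an initialized repository; the -a just extends them to all snapshot IDs):

```
# Verify that every chunk referenced by the snapshots still exists on the storage
duplicacy check -a

# Additionally download the chunks and verify their integrity
# (useful right after a backup to validate newly uploaded chunks)
duplicacy check -a -chunks
```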
Well… following your logic, you could then ask: why does this feature even exist in the first place?
But seriously: maybe I misunderstood the type of check it does. Are you saying the regular check -a without -files is enough to verify the integrity of the data? Even for critical data?
And then again, why not? I don't think it's a big effort to implement more scheduling options; letting users type in a cron expression would probably be the easiest.
Good question. This is perhaps needed for the catastrophic case when you suspect data loss and want to salvage whatever is salvageable, but periodically downloading the entire storage is a waste of resources: if the storage does not provide integrity guarantees, suddenly learning that data is lost hardly helps, because the data is already gone.
This will download the metadata chunks and check that all referenced data chunks are accessible. This results in a minimal amount of downloaded data and helps protect against Duplicacy tracking bugs. Chunks that exist are assumed to be valid.
You really have to draw a line somewhere: if you don't trust the storage, the network, RAM, CPU, …, your own eyes, … where are you going to stop?
Duplicacy does not distinguish tiers of data. If data is corrupted it will fail to decrypt; you won't get corrupted data back from Duplicacy, it's either intact or nothing. This is a prudent design decision. You'd expect the same approach from storage providers: either my correct data back or none. From that perspective I would avoid B2; they have proven in the past that they don't have mechanisms in place to guarantee that. The solution here is not to write more code and workarounds to fix B2; the solution is to move to a provider that knows what they are doing and guarantees data integrity by design, not on a "best effort" basis.
And if storage integrity is guaranteed, there is no need to verify it at any frequency. Running check without parameters is a way to enforce Duplicacy's guarantee of snapshot consistency.
A few reasons:
- It may prompt bad user behavior: "look, I can run this semi-annually, so it must be a good idea!"
- Egress from storage providers optimized for backup is extremely expensive; you would not want to nudge users into wasting money.
- There are really no operations in this space that you would want to run monthly; it's either at backup cadence, or never (i.e., when the dust hits the fan).
More options are almost always bad news: it means the developer gave up and shifted the burden of making decisions onto the users.
If anything, I would remove most options from the web UI: every backup would run prune (if a retention policy is defined) followed immediately by check (with no parameters). No other options. It would be easy, simple, unambiguous, and correct. (I can hear a few select forum goers here saying "glad you are not the Duplicacy developer!", but I stand by this conviction: less is more.)
Users already can… (the web UI is just a scheduler that, in my opinion, only stands in the way); and yet they should not, because that would be solving the wrong problem in a counterproductive way.
Does it make sense?
I see your points and partly agree, but I'm still not sure I would give users the decision on how they deal with their data and bandwidth. I think the cost of cloud egress is a fair point. Then again, you should consider explaining the check -files feature in the UI and putting some sort of disclaimer in there right now, making it transparent that literally everything will be downloaded. (Yes, I know that info can already be found somewhere in the docs.)
I happen to agree here, and think it'd be nice for Duplicacy to adopt the cron format - and not just for scheduling in the GUI, but for an alternative prune syntax too.
Been thinking about this very thing lately - you could in theory set ‘expiry’ periods with multiple crons, and get a more predictable outcome.
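For readers who haven't worked with it, the five-field cron expressions being talked about here look like this (minute, hour, day of month, month, day of week); the times below are arbitrary examples, and the idea would be to attach a retention period to each:

```
0 3 * * *    # every day at 03:00
0 3 * * 0    # every Sunday at 03:00
0 3 1 * *    # at 03:00 on the 1st of every month
30 */6 * * * # every 6 hours at minute 30 (sub-daily, something the current retention scheme can't express)
```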
Actually, it's worse than that. If you just use check -files without specifying revision(s), it'll process all revisions one by one and download every file and every chunk multiple times over! No kiddin'.
This is why we need a -latest flag (which would be very useful for the copy command as well), because otherwise you'd have to run it manually in the GUI after figuring out the latest revision for each -id, not to mention then run it for each -id (not sure if -all overrides anything in terms of what revisions to check).
That said, check -chunks is probably the better option anyway, in lieu of doing actual test restores (which, despite what Mr sas-"I don't believe in testing backups"-pus inadvisedly recommends, I highly encourage you to do once in a while, or at least once).
This method also remembers which chunks have been verified, so unlike -files it only ever downloads each chunk once.
However, I personally DON'T run a scheduled check -chunks and instead do it manually either once a year or so, OR rotate backups such that chunks regenerate/re-compress from a destination storage and are pre-seeded locally from scratch; basically it all gets tested once a year one way or another. This I do along with a -hash backup on each ID. (This is kinda the reason I wouldn't do this automatically: it requires extra resources, so a little manual planning is better IMO.)
TL;DR: monthly content checks are probably overkill, but do run regular checks and always test backups, full stop.
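For anyone who wants to copy that yearly routine, the individual pieces are roughly these (run manually rather than on a schedule):

```
# Occasional deep verification: download chunks and validate their contents
duplicacy check -a -chunks

# Periodic full re-read of the source: detect changes by re-hashing every file
# instead of relying on size/timestamp
duplicacy backup -hash
```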
How is this user friendly? I have worked in tech for 25 years and still despise the cron format with a passion. It's a classic example of users conforming to machines. That is never the solution.
For the GUI there is nothing better than plain English, which is the way it is done today. This is on purpose, reflecting the same way people schedule recurring appointments in their calendars. Most people are already familiar with the concept. Less friction, wider adoption.
For prune schedules in the UI we only need one checkbox: prune on or off. Prune on would prune according to a common schedule (keep every backup for two weeks, then a bi-weekly backup for two months, then a bimonthly backup forever). Or offer two schedules: more history vs. less space. I would even add an option to cap the storage (or bandwidth) usage.
Those are the polishing touches that need to be implemented in the GUI. Adding cron schedules is a step in the opposite direction.
I do agree with the rest of the post.
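For what it's worth, that "common schedule" already maps roughly onto the existing prune retention flags; something along these lines, taking "bi-weekly" as every 14 days and "bimonthly" as every 60 days:

```
# Keep every backup for 14 days,
# then one backup every 14 days once they are older than 14 days,
# then one backup every 60 days once they are older than 60 days.
duplicacy prune -all -keep 60:60 -keep 14:14
```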
I agree. The UI could benefit from a massive amount of polish. Right now it's pretty much a scheduler of dubious value; most OSes already provide a usable scheduler. Folks choose Duplicacy for its backup engine, but the UI does not help adoption.
When attempting to use the web UI I find myself fighting against it: I know what I want to do, but now I have to figure out how to tell the UI to do that.
To be absolutely clear on this: I'm not affiliated with Duplicacy/Acrosync and I don't contribute to the code; I'm just a user like yourself. (In case "you" did not mean "Duplicacy developer".)
I’m NOT proposing Duplicacy should use raw cron syntax.
The GUI can handle presenting the actual schedule in a human-readable way. However, under the hood, you can represent a very descriptive expiry schedule with crons. It's significantly more flexible, and maps to more easily understandable, predictable outcomes.
With prune, if you don't run it regularly enough, you can have, say, intervals of 8 or 9 days with a supposedly 7-day -keep definition. Weeklies can never start on a predictable day of the week (some people may have very good use-cases for that), nor can you have monthlies on the 1st/28th of each month - it's just higgledy-piggledy. You also can't purge on a sub-daily basis.
Most importantly, prune has a problem because of this unpredictability: two storages can de-sync. Even if you wanted to have two storages with different retentions, it isn't practical. Cron would fix that. It's standard, portable, and easily convertible.
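To make that concrete, here is the kind of retention rule being described and why its outcome depends on when prune happens to run (the day counts are just an example):

```
# Keep one snapshot every 7 days for snapshots older than 30 days.
# The 7-day spacing is only applied when prune actually runs, so if prune runs
# every 8-9 days the surviving snapshots end up 8-9 days apart instead of
# landing on a fixed weekday.
duplicacy prune -all -keep 7:30
```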
I'd also suggest you should be able to set these expiry periods as part of the storage, so that a simple expire command can prune all IDs based on their own retention settings - in a single command.
For prune schedules in the UI we only need one checkbox: prune on or off. Prune on would prune according to a common schedule (keep every backup for two weeks…)
This is where I say "glad you are not the Duplicacy developer".
I have multiple storages with their own prune schedules. We need the ability to specify different schedules per storage (and ideally per ID), not be forced into a common one.
I would even add an option to cap the storage (or bandwidth) usage.
This is a good idea.
I even found a good use-case for (optionally) skipping snapshot creation if no files were modified.
This has been requested before, but I recently had to create a very frequent backup schedule for a repo that doesn't change often, but where it is imperative to get an off-site backup as quickly as possible in case it does. You'd have to prune this pretty aggressively to keep the revision count down, but then you lose the ability to keep a sensible history. Plus, with such an option, restores are a breeze since you don't have to sift through check logs.