How to split or separate backups?

I’ve been testing multiple backup workflows and solutions over the last week and have to say I’m really impressed with Duplicacy so far. However, there is a specific workflow that I need to be able to accomplish in order to deploy this as a permanent solution.

I have a very specific use case where I need to perpetually back up multiple large (50x 2-10 TB and growing) shares from NAS1 to NAS2, using a PC running Duplicacy in between. The reason is that NAS1 has a proprietary file system and a locked-down OS, so there is no way of running Duplicacy on NAS1 or making NAS2 see it directly.

With that in mind, what I also need to be able to do is one of the following:
A.) keep all these shares in separate, self-contained backups, or
B.) have a way of extracting a particular share from the backup, as well as shrinking the master backup after extraction.

The reason is that the retention policy is such that anything older than 1 year has to be sent off to AWS Glacier DA. I can perform that part of the process manually, however I need a way of extracting and removing a particular share from the backup when the time comes.

What is the best way of going about this?
With Duplicati it’s fairly easy, as I can set a different destination every time I set up a new job. With Duplicacy, though, destinations are set up separately with a unique ID, so having to create 50 of them, then wipe 50 and add new ones every year, seems like a very tedious solution…

Also, will I then be able to restore from the cloud (recalling to S3 Standard storage first) on another computer?

Do the shares contain data that may benefit from deduplication? If not – back them up to separate targets. Problem solved.

Otherwise – back up each share under a separate “snapshot id” into the same target. Then, when you want to prune a share completely from the backup store – delete the corresponding folder from “snapshots” and prune the backup.

Elaborate here. Why do you need 50 destinations?

Yes. It would be very expensive though. If you plan to ever need to restore – don’t use Glacier. B2/Wasabi will be significantly cheaper.

How often are you planning to do that? Annually? Daily? Hourly? Since you mention doing it manually (you don’t have to, it can be easily scripted), I’m assuming it’s more of an annual spring-cleaning type of deal.

Elaborate here. Why do you need 50 destinations?

Because later, I need to move the latest backup of the oldest share offsite.

NAS1 is where we work from. Each project we work on has a Share/Workspace/Mapped Drive assigned to it, and the contents of each of those workspaces need to be backed up to NAS2 regularly.

When the project is wrapped, two things happen:

  1. the workspace for that project on NAS1 is deleted to liberate space.
  2. The up-to-date backup of the workspace is mirrored to AWS GDA

Finally, in a year’s time, the backup on NAS2 is deleted, leaving only one copy sitting on AWS.
We have our reasons for doing it this way.

I would love to be able to set up a single backup destination for all workspaces to benefit from maximum deduplication, but then be able to extract a particular project from the backup without having to do a restore, followed by a new backup of that project only, followed by a prune of that project from the master backup – because that would literally take a week for each project.

At the moment, I’m setting up a new destination folder on NAS2 for each project on NAS1, which is a functional but suboptimal solution, especially as the speed of our project turnover goes up.

Ok, now it makes sense, thank you.

Yes, it’s all doable. Perhaps it would be easier to script all of this with the duplicacy CLI, but it’s also possible in the web UI. Duplicacy terminology is confusing (snapshot vs repository vs destination), so I’ll speak in the terms the Web UI uses.

The way I see it, you would need to create two storage destinations on the Storage tab: one representing storage on NAS2 and the other on AWS (which can be transitioned to Glacier via bucket lifecycle rules).

On the Backup tab you would create backups per project, into the NAS2 storage.

Then setup scheduled backups in the Schedule tab as needed.

Once you are ready to archive a specific project – run a copy task to AWS for the corresponding snapshot ID (via a copy task on the Schedules tab; you can use it for manual tasks too), wipe that project from the NAS2 repository*, and then run a prune --exhaustive task on the NAS2 storage (from the Schedule tab).

*This is something you will need to do manually: go to the NAS2 storage and delete the folder corresponding to the project from the snapshots folder. A subsequent prune --exhaustive will remove the now-unreferenced chunks from the datastore to free up storage.
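That wipe-and-prune step can also be scripted. A minimal sketch in Python, assuming the NAS2 storage is reachable as a local or UNC path and the duplicacy CLI is on PATH; the storage path and snapshot ID here are made up, and the prune would need to run from an initialized repository folder:

```python
import shutil
import subprocess
from pathlib import Path

def archive_project(storage_root: str, snapshot_id: str, dry_run: bool = True) -> Path:
    """Remove a project's snapshot folder from a Duplicacy storage, then
    prune chunks no longer referenced by any snapshot."""
    snap_dir = Path(storage_root) / "snapshots" / snapshot_id
    if not dry_run:
        # 1. Delete the whole snapshot-id folder (all revisions of the project).
        shutil.rmtree(snap_dir)
        # 2. Drop unreferenced chunks; must run inside an initialized repo dir.
        subprocess.run(["duplicacy", "prune", "-exhaustive"], check=True)
    return snap_dir

# Dry run: only shows which folder would be removed.
print(archive_project(r"\\NAS2\backups", "project-alpha"))
```

Run it once with `dry_run=True` and eyeball the path before letting it delete anything.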

@gchen: It would be great if the web UI supported purging a specific snapshot ID from the storage, just as it supports pruning specific revisions. Until then – the snapshot needs to be deleted manually.

Thank you for explaining this. So to clarify, the steps would be as follows:

  1. Set up two destinations: NAS2 and AWS
  2. Set up all workspaces to back up to the same repository on NAS2
  3. Use the Copy command to extract the specific snapshot and move it to AWS?
  4. Manually go into the repository and delete the corresponding snapshot file
  5. Run prune --exhaustive to remove the orphaned blocks, because there is no longer a snapshot file pointing to them.

Is this correct?

By the same logic, could I use the copy command to extract a snapshot of a specific project to a location on NAS2, if I wish to use a different tool or machine to send it off to AWS?

When you set up the second destination, don’t forget to specify that it is copy-compatible with the first.
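In CLI terms (the web UI exposes this as a checkbox), making the second storage copy-compatible is done at `duplicacy add` time with the `-copy` flag. A sketch that builds the invocation; the storage name, snapshot ID, and bucket URL are made up:

```python
import subprocess

def add_copy_compatible(storage_name: str, snapshot_id: str, url: str,
                        compatible_with: str = "default", run: bool = False) -> list:
    """Build the 'duplicacy add' command that creates a new storage with the
    same chunking/encryption parameters as an existing one, so 'duplicacy copy'
    can move snapshots between them without re-chunking."""
    cmd = ["duplicacy", "add", "-copy", compatible_with,
           storage_name, snapshot_id, url]
    if run:
        # Must be executed from inside the initialized repository folder.
        subprocess.run(cmd, check=True)
    return cmd

# Example: an S3 storage named 'aws' copy-compatible with the 'default' (NAS2) one.
print(" ".join(add_copy_compatible("aws", "project-alpha",
                                   "s3://us-east-1@amazon.com/my-archive-bucket")))
```

If you forget the flag, the two storages will chunk data differently and `copy` between them won’t work.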

Yes, with separate "snapshot id"s

It’s technically a folder (with files corresponding to revisions inside)

Correct.

Copy copies snapshots and revisions between copy-compatible duplicacy datastores. You can of course create a new empty store on NAS2, copy any backup (a “snapshot” in Duplicacy-speak) into it, and then upload that duplicacy storage to e.g. AWS using any other tool. But then you won’t benefit from cross-project deduplication, because each project will end up archived in its own datastore – if I understood correctly what you meant.

Got one last question. If I only use two storage destinations, one of which points to AWS S3 and lifecycle-transitions into GDA – how do I retrieve data from GDA without having to defrost the entire 100 TB+ backup?

Duplicacy was not designed to work with cold storage; however, you can work around this with a bit of scripting:

To restore files, duplicacy first downloads the snapshot file. That file would need to be defrosted first, but since snapshot files are small and live in a separate folder, they can quite easily be left in hot storage.

The snapshot file refers to chunks that contain the full snapshot. Those chunks will need to be defrosted next (they too can be kept in hot storage, with some more elaborate scripting, to eliminate this extra defrost).

Once that is done – the full list of chunks needed to reconstruct the files of a specific revision can be extracted from the snapshot data, and those will need to be defrosted. This last defrost operation is unavoidable. You can get the list of chunks from a snapshot dump (see duplicacy cat -id <snapshot-id> -r <revision>) and parse it into paths to chunk files.
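A sketch of that parsing step, assuming `duplicacy cat` emits the snapshot as JSON with a top-level `chunks` list, and that the storage uses Duplicacy’s default one-level chunk nesting (first byte of the hash as a subdirectory) – both worth verifying against your own storage before relying on this:

```python
import json
import subprocess

def chunk_path(chunk_hash: str, nesting: int = 1) -> str:
    """Map a chunk hash to its path inside the storage.
    Assumption: default layout nests chunks one level deep."""
    prefix, rest = chunk_hash[:2 * nesting], chunk_hash[2 * nesting:]
    return "chunks/{}/{}".format(prefix, rest)

def chunks_for_revision(snapshot_id: str, revision: int) -> list:
    """Dump a revision with 'duplicacy cat' (run from inside the repository)
    and collect the chunk hashes it references."""
    dump = subprocess.run(
        ["duplicacy", "cat", "-id", snapshot_id, "-r", str(revision)],
        check=True, capture_output=True, text=True)
    data = json.loads(dump.stdout)
    return sorted({h for h in data.get("chunks", [])})

print(chunk_path("ab12cd34"))  # chunks/ab/12cd34
```

The resulting paths are what you would feed to an S3 restore request (e.g. boto3’s `restore_object`) before running `duplicacy restore`.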

Once these are defrosted, duplicacy restore will work. You could wrap all of that into a pre-restore script for duplicacy.

This design is not unique to Duplicacy and is the reason why cold storage is not supported by most backup tools (there are some exceptions).

Or I could set up a new storage location on AWS for each backup? Since I’m doing the send-off to S3 manually and selectively. It’s a bit more tedious, but I could then delete the destination storage record from the GUI app, and if I ever need to relink to it I can just unfreeze that specific folder and re-point the GUI at it, right?

Also, if I’m using the GUI version on Windows, how do I run the CLI version and get prune to work for the S3 bucket? Where do I need to navigate in the console? The documentation isn’t clear on this.

If I navigate to
C:\Users\USERNAME\.duplicacy-web\repositories\localhost\9\.duplicacy
or C:\Users\USERNAME\.duplicacy-web\repositories\localhost\9\.duplicacy\cache\AWS-bucketname

and run duplicacy_win_x64_2.7.2.exe with prune -exhaustive, I get “repository has not been initialised”

You can add -d options in the web GUI to prune specific snapshots.

The repository location for running non-backup jobs is C:\Users\USERNAME\.duplicacy-web\repositories\localhost\all.

Thank you very much, it all worked! It would be great to have an option to run prune and copy commands from the GUI without having to use a schedule, because sometimes these are one-off operations.

Haven’t read this whole thread, but… you can set up a schedule and untick all the days – nothing will run unless you manually press the run button.