Corporate Backup for 700 PCs?

I’ve been a personal user for a bit over a year now, and I’m very impressed.

At work we’re starting to talk about potentially backing up a couple of key folders from every one of our users’ laptops/desktops (our servers, etc. are already covered in other ways) in the event that a ransomware attack managed to get through all of our other layers of security with a 0-day. There are obviously other scenarios such a backup would be helpful for, but ransomware recovery is the one that might get the budget approved…

I was wondering if anyone has any experience using Duplicacy on 600-700 devices in a corporate setting? After reading the forums for the last few hours, it seems at least theoretically possible, but I couldn’t find any posts of anyone doing it, or even talking about it.

In my head, I think I’d use the CLI version and handle everything in the background so my users didn’t even know it was running. I’d use B2 storage (or maybe Azure, if I could get the permissions to work) with an application key that’s configured for just writing files, no read, so that no one who found the application key on the machine (my employees included) would be able to see the rest of the contents of my corporate backup (I think, from what I’ve read, this can be done). Any command that requires read/delete access would be done from a server that IS controls and that I trust. If I could make this work in Azure, I’d do it from a VM in Azure to keep egress costs down.
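
Roughly what I’m picturing on each client is a scheduled task that just shells out to the CLI. Everything below (paths, placeholder keys, even the environment variable names I believe Duplicacy reads for B2 credentials and the storage password) is an assumption to be verified in a pilot, not a working config:

```python
"""Per-machine backup wrapper -- a rough sketch only.

Assumes the Duplicacy CLI is installed and the repository was already
initialized (e.g. `duplicacy init -e <machine-id> b2://corp-backup-bucket`).
The env var names below are what I believe Duplicacy reads for B2 and the
storage password; double-check them against the docs before relying on this.
"""
import os
import subprocess
import sys

REPO_ROOT = r"C:\Users"                                   # hypothetical backup root
DUPLICACY = r"C:\ProgramData\corp-backup\duplicacy.exe"   # hypothetical install path

env = os.environ.copy()
env.update({
    "DUPLICACY_B2_ID": "<write-only-application-key-id>",
    "DUPLICACY_B2_KEY": "<write-only-application-key>",
    "DUPLICACY_PASSWORD": "<storage-encryption-password>",
})

# Run the backup quietly; a scheduled task or agent job would call this script.
result = subprocess.run(
    [DUPLICACY, "backup", "-stats", "-threads", "4"],
    cwd=REPO_ROOT,
    env=env,
    capture_output=True,
    text=True,
)

# Ship the log somewhere central so we can alert on failures per machine.
print(result.stdout)
if result.returncode != 0:
    print(result.stderr, file=sys.stderr)
    sys.exit(result.returncode)
```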

Ideally this would all be set up automatically based on some workflows in our ITSM tool (ServiceNow), and maybe we’d even build a workflow in ServiceNow that would allow someone to find and restore their own files in the event they lost their machine or lost a file they were working on. This might require wrapping a REST service around the CLI, but that shouldn’t be too hard.
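
The REST wrapper I have in mind would be something thin like this (endpoint shapes, the per-machine staging directories, and the choice of Flask are all just assumptions for the sketch; the restore side would run on the trusted server with full-access credentials):

```python
"""Very rough sketch of a REST wrapper around the Duplicacy CLI that a
ServiceNow workflow could call. Nothing here is provided by Duplicacy
itself; it just shells out to `duplicacy list` and `duplicacy restore`."""
import subprocess
from flask import Flask, jsonify, request

app = Flask(__name__)
DUPLICACY = "/usr/local/bin/duplicacy"   # hypothetical install path
REPOS = "/restore-staging"               # one initialized repo dir per machine id

def run_cli(machine_id, *args):
    # Each machine id maps to a pre-initialized repository directory that
    # points at the shared storage with the full-access key.
    proc = subprocess.run(
        [DUPLICACY, *args],
        cwd=f"{REPOS}/{machine_id}",
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr

@app.route("/machines/<machine_id>/revisions")
def list_revisions(machine_id):
    code, out, err = run_cli(machine_id, "list")
    return jsonify({"ok": code == 0, "output": out, "error": err})

@app.route("/machines/<machine_id>/restore", methods=["POST"])
def restore(machine_id):
    body = request.get_json(force=True)
    # e.g. {"revision": 42, "pattern": "+Documents/budget.xlsx"}
    code, out, err = run_cli(
        machine_id, "restore", "-r", str(body["revision"]), "--", body["pattern"]
    )
    return jsonify({"ok": code == 0, "output": out, "error": err})

if __name__ == "__main__":
    app.run(port=8080)
```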

I’d guess that on average I’d be looking at about 0.2 TB of data to back up per machine in the beginning, which might, with the pruning I have in mind, take up about 0.5 TB of storage for each user, or 350 TB total. With that in mind, I’d love it if someone could tell me that they have storage set up somewhere that’s 2X or 3X that, so I’d have significant overhead for growth.
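
Just to show my napkin math (all numbers are the guesses above):

```python
# Back-of-the-envelope sizing, using the assumptions above.
machines = 700
initial_tb_per_machine = 0.2     # first backup
retained_tb_per_machine = 0.5    # steady state once version history builds up

initial_total = machines * initial_tb_per_machine      # 140 TB on day one
retained_total = machines * retained_tb_per_machine    # 350 TB steady state

for headroom in (2, 3):
    print(f"{headroom}x headroom -> {retained_total * headroom:.0f} TB")
# 2x headroom -> 700 TB
# 3x headroom -> 1050 TB
```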

If a 700 TB bucket in B2 would be insane for Duplicacy, I could chunk the machines down into groups of 10 or 25 or 50, etc., but I’d lose some of the deduplication that I’d imagine would be fairly advantageous across the whole company.

All of these numbers are subject to change as soon as I really start planning this, unless this absolutely won’t work of course.

2 Likes

I am pretty certain that you’re in uncharted territory here. If I were in your shoes, I’d ask for a budget to run a pilot project, and you’d better have a plan B in case it doesn’t work out, for a variety of reasons. Here are some of the things to consider:

  • A single 700TB storage would be almost 2 orders of magnitude larger than the largest storages I’ve heard about; I’ve heard of (and in fact run) storages of 20TB or so, but closing in on 1PB is an entirely new level
  • Memory consumption - this depends on many factors, but with large storages it is not insignificant; with your requirements it might balloon out of the realm of the reasonable (or it might not, see uncharted territory)
  • Speed of running check and prune - these are already quite slow on large storages (especially prune), so with your requirements they might balloon out of the realm of the reasonable (or they might not, see uncharted territory); the sketch after this list shows the kind of maintenance job you’d be timing
  • Storage cost considerations - I assume you already ran ballpark numbers, but keep in mind that it is not just storage, but ingress and egress as well, and in your case ingress might be non-trivial on a daily basis (depending on how much your data changes and the cadence of backups)
  • Last but definitely not least - support. Enterprise software differs from everything else not necessarily in quality, but in the ability to get someone to resolve your issues quickly. Your use case definitely falls into the enterprise category, so you may want to figure out how you can solve any problems in your deployment quickly. Duplicacy is one developer, so you’d either want some kind of support contract (there is still key-man risk there), and/or some internal or external capability for resolving problems as they occur.
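
To make the check/prune point concrete, here is roughly the kind of nightly maintenance job you’d be timing in a pilot (paths and retention numbers are placeholders, not a recommendation):

```python
"""Sketch of the nightly maintenance job on the trusted server holding the
full-access key. Timing these runs is exactly the data a pilot should collect."""
import subprocess
import time

DUPLICACY = "/usr/local/bin/duplicacy"
REPO = "/srv/duplicacy-admin"   # hypothetical repo dir pointed at the storage

def timed(label, args):
    start = time.monotonic()
    proc = subprocess.run([DUPLICACY, *args], cwd=REPO)
    print(f"{label}: exit={proc.returncode} elapsed={time.monotonic() - start:.0f}s")

# Verify that every snapshot's chunks exist, across all machine ids.
timed("check", ["check", "-all"])

# Thin out old revisions: e.g. keep 1/day for a week, 1/week for a month,
# 1/month for a year, nothing past a year (placeholder policy).
timed("prune", ["prune", "-all",
                "-keep", "0:365", "-keep", "30:90",
                "-keep", "7:30", "-keep", "1:7"])
```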

Having said that, as an observer I’d be very interested to find out how (and if) :d: can scale up :wink:

1 Like

Yea, I kinda figured since there were no posts about large install bases I might be in uncharted territory, but I figured I’d ask, as I like the concept of a “simple” headless backup that we could just kind of set & forget (with the appropriate automation to check logs for status updates per machine). It’s also possible that I would just do this on the half of the machines where my backup policy requires it for policy adherence.

I’d ask for a budget to run a pilot project

Yep. As part of a pilot I’d see what 30-50 TB looks like and see how long a check/prune takes.

closing in on 1PB is an entirely new level

This might be manageable with automations that could keep things down to ~10-15 TB per B2 bucket, though as mentioned I wouldn’t get the same level of chunk deduplication. I’m not sure how big a deal that would be though.
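
Something like a deterministic hash of the hostname could pin each machine to one of the smaller buckets (bucket names and group size are made up; deduplication would only happen within a bucket, so grouping similar teams together might preserve more of it than a random hash):

```python
# Hypothetical way to shard ~700 machines across N smaller buckets
# deterministically, so a machine always lands in the same storage.
import hashlib

BUCKETS = [f"corp-backup-{i:02d}" for i in range(28)]  # ~25 machines per bucket

def bucket_for(hostname: str) -> str:
    digest = hashlib.sha256(hostname.lower().encode()).hexdigest()
    return BUCKETS[int(digest, 16) % len(BUCKETS)]

print(bucket_for("LAPTOP-0042"))   # e.g. corp-backup-17
```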

Memory consumption

This would be an issue on the backup sources as well as the server doing the check/pruning, right?

Speed of running check and prune

Yea, if this chews up a beefy cloud VM 24/7 to run all the pruning that would certainly eat into my cost efficiency.

Storage cost considerations

Ideally we’d do storage in our existing Azure subscription, where we’ve got some volume discounts already. Running the check/prune from a VM in that infrastructure would minimize those costs (though I’d have our Azure guy run real numbers before we get started, of course :slight_smile:). I’m not as worried about the egress costs, as if a user is doing things the right way, most of their documents should be in SharePoint/OneDrive, which has its own versioning. This would be the absolute last resort option, and likely it would be during a ransomware event where no one would be asking “how much”; the question would just be “when can we have our files back”.

Support

Yea. We find that we almost never contact support for most of our products. I have a fairly large IT team, and I think that as long as we properly test things, and only do the things we test, we’ll be ok without much support (famous last words, but the pilot should prove that out).

Also, with it not being a SaaS application, and with updates being infrequent, I think we just wouldn’t upgrade unless there was a security problem, we found a bug that had been fixed, or we needed a new feature that was in a newer version. Theoretically, should development stop completely and the developer never be heard from again, I could continue with the status quo until we had a chance to get something else up and running. We could still do both backups and restores for as long as the CLI was operational. I believe that in a “worst case” scenario where the storage provider changed their API and we could no longer do a backup, we could (I think?) download the chunks to an on-prem file share (or find some way of mapping a file share to the storage) and do a restore that way.
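
For that doomsday case, I imagine something along these lines: mirror the bucket to a share with rclone, then point the CLI at the local copy. I’m assuming Duplicacy accepts a plain local path as a storage backend and that the flags below are right; I’d verify all of it before counting on it:

```python
"""Doomsday-scenario sketch: mirror the bucket to an on-prem share, then
point Duplicacy at the local copy. Assumes rclone is configured with a
'b2' remote and that Duplicacy accepts a plain local path as a storage
backend (I believe it does, but verify)."""
import os
import subprocess

LOCAL_COPY = "/mnt/backup-share/corp-backup"
RESTORE_DIR = "/mnt/restores/LAPTOP-0042"
os.makedirs(RESTORE_DIR, exist_ok=True)

# 1. Pull the raw chunks/snapshots down with rclone (could take a long time).
subprocess.run(["rclone", "sync", "b2:corp-backup-bucket", LOCAL_COPY], check=True)

# 2. Initialize a scratch repository against the local copy and restore.
#    (init will prompt for the storage password, or read DUPLICACY_PASSWORD;
#    revision 1 is just a placeholder.)
subprocess.run(["duplicacy", "init", "-e", "LAPTOP-0042", LOCAL_COPY],
               cwd=RESTORE_DIR, check=True)
subprocess.run(["duplicacy", "restore", "-r", "1"], cwd=RESTORE_DIR, check=True)
```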

It might be slow and expensive, but I think it would work in a pinch. I’m big on figuring out the worst case scenario when selecting new vendors, and on-prem tools almost always have a better opportunity for surviving a vendor’s bankruptcy/abandonware than any SaaS product. The only problem I could see with Duplicacy would be the licensing check failing. I think that could be figured out with a contractual agreement though.

I’d be very interested to find out how (and if) :d: can scale up :wink:

If I do a pilot after working through the thought experiment and getting better estimates of the costs (and comparisons to other SaaS offerings), I’m sure I’ll be able to share the results, and the process will probably have me in the forums quite a bit too. :slight_smile:

2 Likes

I’m interested to know how this goes too. Please let me know if you run into any issues.

Probably very wise. :slight_smile:

Do you mean the opposite? AFAIK, Duplicacy would kinda need minimal read access - to at least the revision file - when doing an incremental backup. Although maybe you can lock down individual snapshot dirs for read? Pretty sure you need read to /chunks, though.

When I think of what you’re trying to accomplish, my thoughts would be WORM - write once, read many - i.e. allow writing new files, no append, no delete. I’m guessing that can be done with B2…
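
If you go the B2 route, something like this (using the b2sdk Python library) is roughly how I’d expect you to mint a per-client key with read/list/write but no delete - the capability names and the create_key signature are from memory, so check them against B2’s docs:

```python
"""Sketch of creating a restricted B2 application key for the clients:
read + list + write, but no delete, scoped to the backup bucket. The exact
capability names and create_key signature should be verified against B2's
documentation before relying on this."""
from b2sdk.v2 import B2Api, InMemoryAccountInfo

api = B2Api(InMemoryAccountInfo())
api.authorize_account("production", "<master-key-id>", "<master-key>")

bucket = api.get_bucket_by_name("corp-backup-bucket")
new_key = api.create_key(
    key_name="corp-backup-client",
    capabilities=["listBuckets", "listFiles", "readFiles", "writeFiles"],
    bucket_id=bucket.id_,
)
print(new_key)   # the returned object carries the new key id and secret
```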

Duplicacy also has RSA private-public encryption, so you could isolate the restores to another system (and keep the single private key extremely safe; using individual keys for each user would nullify deduplication), then users couldn’t access other users’ files. Only via the isolated restore box.
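
Rough flow for that RSA split (the -key flags on init/restore are from memory and worth verifying against the current CLI docs; paths and ids are made up):

```python
"""Sketch of the RSA split: clients only ever see the public key, the
isolated restore box holds the private key. Commands are shelled out here
just to show the flow across the different machines involved."""
import subprocess

def sh(cmd, cwd=None):
    subprocess.run(cmd, cwd=cwd, check=True)

# One-time, on a secure machine: generate the key pair (openssl will prompt
# for a passphrase on the private key).
sh(["openssl", "genrsa", "-aes256", "-out", "private.pem", "2048"])
sh(["openssl", "rsa", "-in", "private.pem", "-pubout", "-out", "public.pem"])

# On each client: initialize with only the public key present.
sh(["duplicacy", "init", "-e", "-key", "public.pem",
    "LAPTOP-0042", "b2://corp-backup-bucket"], cwd="C:/Users")

# On the isolated restore box: the private key (and its passphrase) are needed.
sh(["duplicacy", "restore", "-r", "1", "-key", "private.pem"],
   cwd="/restore-staging/LAPTOP-0042")
```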

Another thing you may need to take into account… and which I just discovered today, while adding a new VM to one of my Vertical Backup storages… Initial backups are quite a pita when you have a lot of existing chunks. Vertical Backup (the Duplicacy equivalent for ESXi) doesn’t have much memory to work with, so it’s not able to enumerate all the chunks when doing an initial backup (not needed subsequently). Right now, I’m having to set up an alternative empty storage for this 1 VM, complete a backup, then, on another system with enough RAM, copy the storage into the main storage, and eventually I’ll re-point the destination. IMO, you’ll run into a similar issue with large numbers of chunks. You may need a way to transparently switch out the storage for new clients, and scripting for cache clearing and whatnot.
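
The workaround I’m describing looks roughly like this when scripted - the -copy/-bit-identical flags on `duplicacy add` are from memory, so double-check them:

```python
"""Sketch of the 'seed on an empty storage, then merge' workaround, driven
from a repository that already has the main storage as its default."""
import subprocess

REPO = "/restore-staging/NEW-LAPTOP"   # hypothetical repo for the new machine

def duplicacy(*args):
    subprocess.run(["duplicacy", *args], cwd=REPO, check=True)

# 1. Add an empty, copy-compatible seed storage and back up to it first.
duplicacy("add", "-copy", "default", "-bit-identical",
          "seed", "NEW-LAPTOP", "b2://corp-backup-seed")
duplicacy("backup", "-storage", "seed")

# 2. Later, on a box with plenty of RAM, merge the seed into the main storage.
duplicacy("copy", "-from", "seed", "-to", "default")
```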

Also, I don’t wanna be that guy who points out this isn’t 3-2-1… but you could leverage this setup to replicate a third (or more) off-cloud copy, by pulling data with an isolated system, which would mitigate against ransomware. TBH I’d probably just rsync/Rclone instead of using Duplicacy’s copy at this point.
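
E.g. an isolated box that pulls its own mirror on a schedule (remote name and paths are made up):

```python
"""Sketch of a pulled, third copy for 3-2-1: an isolated on-prem box that
mirrors the bucket with rclone on its own schedule, so the clients and the
cloud credentials never touch it."""
import subprocess

subprocess.run(
    ["rclone", "sync", "b2:corp-backup-bucket", "/mnt/airgap/corp-backup",
     "--fast-list", "--transfers", "16"],
    check=True,
)
```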

Sounds like an interesting project, I’m fascinated to know how it turns out… :slight_smile:

2 Likes