Duplicacy demolishes duplicati and rclone for large backups

Hi all,

We have a lot of data to back up (8TB+ and growing), and I’ve recently been looking for a better solution than CrashPlan, which we’ve been using and which hasn’t been able to keep up at all. It’s quite appalling how bad CrashPlan actually is - and if you think it’s slow backing up, just wait till you have to restore (if you can even get the file listing to load).

I wanted our new backup solution to be cheap, unlimited, and fast.

After doing a bunch of research, I selected Box @ $45/mo for unlimited transfers and storage. Later, I found that Google Drive for G Suite Business offers the same for $12/mo (they say you need a minimum of 5 users for this, but it’s been working for me and others with just 1 user). I’m going to use both, as that’s only $57/mo - very cheap for unlimited.

Anyway, I started testing with rclone but quickly realized that the 2.5 million files we have (and growing) would take days or even weeks to sync (at least without rclone+restic), because per-file cloud operations are expensive and slow.

I then moved on to duplicati, which was much better but still had a large number of issues and was more confusing than it needs to be. In particular, while backups were relatively fast (2 million files, ~500GB, in a day), restores to other hardware in case of a catastrophic loss were extremely slow at best and broken at worst. There are lots of experimental and unstable versions, and the last beta was really buggy and slow too. It’s kind of a mess.

Thankfully, I then found duplicacy on someone’s recommendation. After figuring out how it works, I set it up on the command line (duplicacy web is, unfortunately, still unstable and crashed for me during a restore) and wrote a script to run and log the backups. It’s been running pretty much flawlessly so far.
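
The script is nothing fancy; roughly something like this (the repository path, log location and options here are just placeholders for illustration, not my exact setup):

    #!/bin/bash
    # Run a duplicacy backup from the repository root and keep a timestamped log.
    REPO_DIR="/mnt/data"                 # directory previously set up with `duplicacy init`
    LOG_DIR="/var/log/duplicacy"
    LOG_FILE="$LOG_DIR/backup-$(date +%Y%m%d-%H%M%S).log"

    mkdir -p "$LOG_DIR"
    cd "$REPO_DIR" || exit 1

    # -stats prints a summary at the end; -threads speeds up uploads
    if duplicacy backup -stats -threads 4 >> "$LOG_FILE" 2>&1; then
        echo "backup finished OK" >> "$LOG_FILE"
    else
        echo "backup FAILED (exit code $?)" >> "$LOG_FILE"
    fi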

But let’s talk about speeds. I posted this comparison on Twitter, but I’ll repost it here.

Some numbers for a ~400GB initial Linux backup with ~2 million files.

rclone (without restic): it’s been 3-4 days and it’s only like 25% done. Unacceptable performance due to file-by-file copying.

duplicati: 24 hours.

duplicacy: 4 hours.

duplicacy is the clear winner.

Update on subsequent incremental backup speed on the same data set, picking up whatever new files appeared since yesterday:

rclone: N/A as it never finished the original upload

duplicati: 1 hour 6 minutes

duplicacy: 4 minutes

duplicacy destroys the competition yet again.

So yeah, I’m very excited about duplicacy because it’s robust and very fast both backing up and restoring. I feel much better about a potential disaster recovery.

Thank you, devs!

8 Likes

And it has another advantage: an open format. If you decide to change your backup provider/storage tomorrow, simply move your existing files. With CrashPlan and others you get stuck in a proprietary format.

I’m a very satisfied rclone user too, but remember: it is basically a synchronization tool, not a backup tool, even with the --backup-dir option.
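
To illustrate (remote name and paths made up), --backup-dir just moves the files that were changed or deleted since the last run into a side directory; it’s a safety net, not versioned backup history:

    # Mirror /data to the remote; anything replaced or deleted is moved into a
    # dated archive directory instead of being lost outright.
    rclone sync /data remote:current \
      --backup-dir "remote:archive/$(date +%Y-%m-%d)"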

2 Likes

Yeah, but I do mention it because it’s a viable alternative. rclone+restic is supposed to be a killer combination, but duplicacy is essentially that in one tool.

1 Like

FWIW, here are the instructions:

Yeah, I saw that post, feels very similar to duplicacy.

I’ll be looking into setting it up as a redundant backup going to Box, since duplicacy doesn’t yet support Box natively.

Just as a side note: there is a way to make CrashPlan very fast. It boils down to disabling client-side deduplication. If your data is mostly incompressible and unique, you don’t benefit from deduplication anyway, and it’s a pure waste of resources, increasing logarithmically with the backup set size. It’s like a natural throttle. I described more here: Optimizing CrashPlan Performance

Duplicacy is still an order of magnitude faster, but at $10/month another full-service backup seems worth it. Personally I too use three: CrashPlan, Duplicacy to B2, and Synology Hyper Backup to another Synology.

3 Likes

CrashPlan has a big flaw: the first initial backup is limited to 100 kbps. I tried to back up about 8TB of data and realized it would take a very long time.

1 Like

That’s not the only problem with CrashPlan. About two years ago the scans started taking a lot longer; I worked with their support for about a year before I gave up. On my system, Duplicacy can do the initial dedupe, encryption and backup to a local disk faster than CrashPlan can finish one scan. It’s also much faster sending the files to Backblaze than CrashPlan is to the CrashPlan cloud.

I’ve lost my complete local backups with CrashPlan twice when trying to restore to a new computer (CrashPlan completely deleted the backup tree structure when it was supposedly restoring). Fortunately I had the files in the CrashPlan cloud as well, so I wasn’t apoplectic. I don’t think it was my fault: I followed their instructions step by step for hooking up the new computer, and I did notice a change in their instructions after that.

I put off moving away from CrashPlan for a year or so after all that, and I’m sorry I didn’t find the Duplicacy command line and Backblaze sooner.

At least I know exactly what’s going on with Duplicacy at any stage of backing up or restoring…

3 Likes

I also want to chime in on the praise.

TL;DR: Like CrashPlan, but better in all regards. Faster. Less resource-intensive. More portable. Gives the user better control over their data. And it has a CLI!

I’ve been a customer of CrashPlan for years, initially on the Home edition and then forced over to the Small Business plan when the former was sunset. Ever since I received the heads-up from CrashPlan about Home sunsetting, I’ve been reading up on and evaluating a lot of options.

Like others in this thread, I tried Duplicati as well, after reading through the design docs for the service. It looked fine on paper. Only after putting the software through its paces did a number of major flaws surface.

  1. Horrible restore design, leading to super slow restores in a disaster scenario (restoring to a new machine). The problem is that the software relies on having local index files (gigantic SQLite files) available on the machine. In case of a disk failure, one needs to download the entire friggin backup and store it somewhere locally, then use an out-of-band (separate) program to re-create those index databases. This process would take days to weeks with a 1TB backup set. Only then would the user be able to start restoring data. Downtime stretching into weeks for a rather small data set isn’t really viable.
  2. Very slow backup jobs. Each delta backup run of a 1TB data set took anywhere from 30 minutes to several hours, with the majority of that time spent on very slow file system scanning.
  3. Very slow file browsing. Opening the backup’s root folders on the program’s restore tab took 10 minutes. Expanding any node (directory) took another 10-15 minutes [1]. This meant that restoring a file 5 directories down would take an hour just to navigate to it, provided I knew exactly where the file resided (which is never the case). Completely useless for data sets larger than a demo set.
  4. Bugs, bugs and more bugs. I ran into a lot of critical bugs, such as VSS failing completely and backup jobs failing without anything helpful in the logs, leaving me to dig through the source code on GitHub to try to figure out what the heck was going on.
  5. Slow bug fixes. GitHub issues I ran into lingered for 6 months before being picked up.
  6. .NET. This is a subjective issue, since I’m not a Microsoft developer. As the project relies on volunteer work, it took me a lot of effort to fix some of the bugs locally. Personally I’d have preferred a language more common in the Unix camp (C/C++ or Go), since that would have made it easier to contribute.

All in all, Duplicati isn’t even close to being fit for purpose.

After exploring different options, including various rsync-based ones, I felt so depressed about the current state of affairs in the backup space that I just decided to pony up for a CrashPlan business license. CrashPlan seems like the Atlassian of backup services, meaning no one is very happy with it, but it does provide the feature set people are looking for, at an affordable price.

By chance I recently stumbled across a reddit discussion where Duplicacy was mentioned and, with very low expectations, decided to quickly try it out. The experience with Duplicati meant I knew exactly what to look for with regard to reliability, security and efficiency, which made the evaluation quick. I was super impressed with the CLI. And the Web UI, though very confusing, did work out of the box with my candidate storage providers (S3 and Wasabi). After reading through the excellent guide on the developer’s website, I mostly understood how to navigate the Web UI, and it has now become my main backup agent.

What I really appreciate with Duplicacy is the following:

  • It’s ridiculously fast doing both full and delta backups. I backed up my current 500GB data set, consisting of a million files or so, in just a couple of minutes [2]. Delta backups are even faster.
  • Very fast directory scanning. The bottleneck is entirely my disks, so I don’t see how the developer can improve much on this.
  • Restoring is a snappy process. The Web UI directory browser doesn’t have any of the design faults that Duplicati has, and allows me to interactively navigate through my directory tree as I would expect in a file browser.
  • Restoring through the CLI worked perfectly, and is the option most likely to be used during disaster recovery [3].
  • The developer is very active and responsive. I think going with a commercial model allows him to treat this project seriously, and to treat users of the software as customers, rather than with the “hey, if it doesn’t work, pull requests are welcome” mentality that most spare-time OSS hobby projects suffer from. A backup solution has to safeguard my data, and as such this is a space where I want someone who is both passionate and financially motivated to maintain the product.
  • Client-side encryption [4]. This is a minimum bar for any candidate product in this space, but I was surprised that so few products/projects offer this fundamental element.

In comparison to CrashPlan, which was my reference backup solution, Duplicacy gives me a much faster backup and restore option with more control. It achieves this at a total price that is less than CrashPlan’s [5], given that I have a very modest backup size at present.

Let me conclude with a tip on a great storage backend for Duplicacy: Wasabi (which I mentioned above).
It’s modelled on the de facto standard S3 API, with its own AWS IAM clone. This means it’s trivial to set up a policy that allows Duplicacy to read and write to a bucket but prevents it from deleting anything (a sketch of such a policy is below). This helps against ransomware threats. Every now and then I temporarily add the DeleteObject permission and run a manual prune job [6] to reclaim space, in order to save on storage cost. Normally, having delete disabled gives me peace of mind.
The transfer performance is great if you reside in Europe. I use eu-central-1 and get amazing, almost AWS-level, transfer speeds for a fraction of the price.
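
The policy is along these lines (the bucket name is a placeholder, and you should double-check the exact syntax against Wasabi’s own policy documentation):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
          "Resource": "arn:aws:s3:::my-duplicacy-bucket"
        },
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:PutObject"],
          "Resource": "arn:aws:s3:::my-duplicacy-bucket/*"
        }
      ]
    }

No s3:DeleteObject anywhere, so even a compromised client can’t wipe the bucket.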

I pair this bulk backup to Wasabi with another copy of the most critical data to S3, since AWS is less likely to fold from financial hardship than a startup.
If Wasabi were to go belly up, I’d just copy the files over to S3 and eat the increased cost. Duplicacy plus a managed bucket service has turned out to be the perfect pair for offsite backups on all dimensions, except perhaps extreme penny-pinching [7].

I never thought I’d find a perfect solution in this space, but having vetted this cross-platform backup product finally allows me to close the book on my backup struggles. When people (companies or startups) ask, I’m finally able to give a confident answer to the question “How should I deal with backups?”: use Duplicacy.

[1]: Every time the user navigates to a directory node, Duplicati reads the entire index database, performs SQL queries on it, and writes a completely new temporary database tens of gigabytes in size. The developer hasn’t explained that design choice, but it makes the restore feature completely useless, even when the indices reside on fast SSDs or a RAM disk!
[2]: I’m on a symmetric 1 Gbps connection and get about 600 Mbps transfer speed to Wasabi (a bit higher to S3). CrashPlan, in comparison, took a day for the same amount of data.
[3]: Point the duplicacy CLI at the remote storage, enter the decryption secret, and voilà: data restored in a few minutes (roughly the flow sketched after these footnotes).
[4]: This is actually a point on which I think the software could improve. Steal the great idea from Duplicati of letting the user choose between a built-in encryption implementation and piping through an external GPG program. Since so much can go wrong with security, and particularly with cryptographic implementations, I think letting users choose an encryption program they trust would help a lot, particularly in the enterprise space or for people who work with sensitive corporate/government data.
[5]: Including the Duplicacy paid license and the cost of storage on Wasabi/S3.
[6]: Takes just a few seconds, and then I disable the delete permission again.
[7]: Depends on how large your backup sets are. For a few hundred gigs, bucket solutions are cheaper than cloud drives (Dropbox/iDrive/Gdrive/Adrive…), but if you start reaching 2TB then the latter may be more cost-effective. Then again, you have other issues to worry about, such as ransomware, backup performance etc., which are non-issues with good bucket solutions.
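
To make [3] concrete, the disaster-recovery flow with the CLI is roughly the following (the snapshot id, storage URL and revision number are placeholders; check the init/restore docs for the exact URL format of your backend):

    # On a fresh machine: attach an empty directory to the existing storage
    mkdir restored && cd restored
    duplicacy init -e my-snapshot-id <storage-url>   # prompts for the encryption password

    # See which revisions exist, then pull one back
    duplicacy list
    duplicacy restore -r 123 -stats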

12 Likes

Thank you for the comparison and for telling us how much you like :d:, @jojje !

3 Likes

Thank you for sharing your experiences and such a great writeup! This should be very helpful to future Duplicacy users.

2 Likes

Your post makes me happy, not because you praise Duplicacy (which I have no stake in) but because I could have told a less sophisticated version of your story: years of CrashPlan as the compromise solution, then trying everything else, almost ending up with Duplicati (I got them to start their forum) but giving up over issues I wasn’t even able to articulate half as well as you summarised them, and finally switching to Duplicacy as the final-and-never-switch-again solution. I’m not as technically knowledgeable as you are, which is why it feels so good to see that I nevertheless made the right judgements throughout the process.

4 Likes

I myself got stung by Duplicati. I liked the web UI and configuration steps, but as others have mentioned, the “local cache” (database) is a joke if you have >100GB of backups…

It’s fine for tiny backups of <100GB, and even then it’s a risk because the database sometimes nukes itself beyond repair. This has happened to me on various Windows machines and in an unRAID Docker container.

I approached Duplicacy with caution just because my experiences up to that point had been so soured:

Duplicati
Crashplan
Backblaze Home

(yes, some of those are Services rather than just clients)

Duplicacy, so far, does what it says on the tin, and I’m happy to be supporting this project by purchasing licenses: 6 to be precise, and maybe a 7th or 8th if 32-bit gets a release someday.

Hope this helps people who came from the same history with other solutions that I did.

4 Likes

Just found this thread by coincidence.
I am “happy” to hear that I am not alone with my Duplicati problems. The program is often recommended, and at first glance it looks fine feature-wise. I tested it again 2 years after my first encounter with it, but things were no better. The GUI is a mess even for experienced users, not to mention the problems once you actually start backing up or restoring your files. I thought I was to blame, but it looks like you had similar issues to mine.

2 Likes

@jojje

Am I missing something? I loved CrashPlan. It allowed me to distribute all my backups across several machines with a central cloud in the mix. It made it incredibly easy to manage a backup strategy across several clients and servers. Anecdotally, I told their support twice that their product was too cheap and that I’d rather pay more than see it get axed.

Fast forward, it got axed. None of their business solutions included the original home client or its concepts and features. Unless I missed something, you cannot compare (current) Crashplan to Duplicacy.

1 Like