New User: Problems replicating a seemingly simple Duplicati Backup

Hello, everyone! I have been evaluating Duplicacy for the last week or so as I contemplate switching over from Duplicati. Everything that I’ve asked Duplicacy to do, it seems to do faster than Duplicati – except for this one task, and I feel as if I’m missing something extremely basic. Nonetheless, my frustration is mounting.

(In the interest of full disclosure, I am not a “techie” or command line wizard. Part of the appeal of Duplicati was that it was extremely intuitive, but in the end, corrupt databases and endless Warnings were making me nervous.)

What I’m trying to do is move an entire shared folder, about 20GB worth of data, that is a mounted volume on my iMac to a BackBlaze bucket. This is one of two shared folders I use for work, and the smaller of the two (about 5GB) indexed and uploaded in about 2 hours initially. That seemed reasonable and was similar to the performance I recall from the initial Duplicati upload. However, this larger folder is taking 2-3 hours to index, and then Duplicacy estimates needing 3-4 days to complete the upload! I have a fast Internet connection (1Gbps down / 25Mbps up), and I’m not attempting this task over Wi-Fi; my computer is connected via ethernet.

After perusing the forums here, I’ve tried tinkering with the number of threads (1, 10, 32), I’ve tried adding the “-hash” option (not sure what it does, but it didn’t seem to make a difference on this particular job), and I’m not sure what to try next.

Duplicacy seems like an excellent tool, but I cannot fathom why a 20GB initial backup would take 3-4 days to complete. Are certain options that would make this a breeze unavailable to me because I haven’t bought a license yet?

Thanks in advance for advice you can provide.

Kevin

Anything special about the type of files? Many small files for example? The 3-4 days may also just be a bad guess as the upload starts. Have you allowed it to run for any amount of time yet?

I use 8 threads; you’d need to experiment, as you have, to see what works for you.
“-hash” tells duplicacy to hash every file instead of just hashing changed files. For an initial backup that shouldn’t matter. For subsequent backups it’ll be faster to not use “-hash”. I run a “-hash” backup once a month as a means of allowing partial chunks to be more effectively pruned.
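
If you ever use the CLI instead of the web UI, the equivalent of what I run looks roughly like this (just a sketch; the thread count is only an example to tune for your connection):

  cd /path/to/your/repository
  duplicacy backup -stats -threads 8          # regular incremental backup
  duplicacy backup -stats -threads 8 -hash    # the monthly run that re-hashes every file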

Hi, arno, and thank you for your reply.

This mounted drive does have a lot of small files, PDFs mostly. I have tried to let the job run, and 3-4 days does seem to be the estimated completion time. I got to the 50-hour mark, but then I lost the connection to the server and the job timed out.

Hmmm, I wouldn’t have expected a lot of small files to slow things down that much. The good thing is that you should be able to resume the backup without having to re-upload anything that has already been processed. It will probably have to rehash the files though as I think duplicacy uses the previous backup to determine what files haven’t been processed yet.

Oh, you could back up incrementally by adding different sections of your source to the filter file a little at a time. That might allow you to complete a backup without losing the connection.
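
For what it’s worth, the filter file would look something like this. This is only a sketch: the “Phase 1” and “Phase 2” folder names are placeholders, and it’s worth double-checking the exact wildcard rules in the include/exclude patterns guide before relying on it.

  # .duplicacy/filters  (patterns are tried in order; folder names are placeholders)
  +Phase 1/
  +Phase 1/*
  +Phase 2/
  +Phase 2/*
  -*

Once those sections finish uploading, you’d add the next batch of “+” lines and back up again.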

On a related note, you may have lost the connection to B2 because they’ve been having DNS issues that were just recently resolved.

Incrementally backing up portions of the drive – which contains seven phases of a study – was something that occurred to me today. However, even using that tack, I ran into the same issue around sluggishness. Phases 1, 3, and 4 uploaded fairly quickly – but the other phases, the ones with the most data/subfolders, took hours to index and I just gave up because I could see where this was heading.

I wasn’t aware that if the job failed to complete that I could pick up where I left off. I (foolishly) deleted the uploaded files and then tried it again … and again … and again! I just don’t understand why this should take days to finish.

As a point of comparison, the large virtual machine that I have on my Mac to run Windows, which is currently at about 100GB, dynamically sized to 250GB, took 5 hours for its initial back-up. Duplicati could never get that machine up to BackBlaze; it always stalled around the ⅓-done mark or, worse, right at the tail end during file verification.

You can add -d as a global option, which will print out a log message when scanning each directory. If it takes 2-3 hours to index, then there might be something wrong with the network-mounted folder.
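
In the web UI that goes in the global options box for the backup job. If you run the CLI directly, the global options go before the command; a rough sketch, with the thread count only an example:

  duplicacy -d backup -stats -threads 4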

Thank you. I’ll try that and see what happens.

I thought I would just give everyone an update on how things are progressing.

Shortly after I started this thread, I tried once again to upload this 20GB mounted volume to BackBlaze. But this time, I did not delete any of the files that had already been uploaded during the previous attempt that was subsequently aborted (due to loss of network connection, or some process simply timing out). The estimated completion time was about 2.5 days, but I decided to stick with it. I am now at the 23-hours-to-go mark, which means that by this time tomorrow, this first full back-up should be complete. I did put in the global -d option and I upped the thread count to 24. Average upload speed seems to be hovering consistently at 1.4-1.5 Mbps.

One thing that occurred to me is that when I joined this lab three years ago, I remember our Principal Investigator having an issue backing up her files. (And “backing up” might have meant simply dragging files on her computer to an external storage device; I’m not sure whether she was using a program to do this, to be honest.) Anyway, she was baffled as to why her 1GB backup would result in only 800MB of actual data on the external drive. She wasn’t using any compression/encryption, and so she was expecting that if 1GB was being transferred, the resulting backup would be the exact same size.

One of our dedicated Linux programmers discovered that Windows 10 doesn’t like filenames that are longer than 255 characters (?), and what the PI didn’t realize is that the file’s path is part of the name. And this is a person who loves her subfolders. For example, to store her manuscripts, you might have to navigate through:

c:\Users\Her Name\Manuscript Ideas\Accepted Publications\High Impact Journals\First Authorships\1990-2000\Doc #001 - Her name et al. - [insert insanely long name of the manuscript title].pdf

Many of her files exceeded this limit, and the Linux programmer (who’s no longer working with us) wrote a script that she could run to truncate any file name that pushed its path past 255 characters, so that her backup process would yield the results she was expecting.

My question is: could files embedded in countless subfolders with very long names be part of the reason that Duplicacy is working so hard indexing and uploading my files? I work in a lab where everyone seems to like to nest their work into an endless labyrinth of folders and then give the document itself an insanely long name. Could this be a contributing factor?

I did try to restore one of my PI’s very long-named manuscripts, and I was able to do that successfully – so Duplicacy didn’t ignore the file and fail to upload it.

I just thought I would give a final update on this back-up task.

First, I was typing the size of each of the two mounted volumes that I referenced from memory, and I woefully underestimated the size of the problematic source volume. The smaller of my volumes isn’t 5GB, it’s 20GB (this is the one that runs fairly effortlessly); and the size of the problematic one that started this thread is close to 300GB, not 20GB! I’m embarrassed that I didn’t catch that, but I still wouldn’t have expected it to take 2-4 days to upload.

After upping the thread count to 24 and putting in the -d option that @gchen recommended, the backup completed early this morning after about 2.5 days. I then put it in my Scheduler to run every Saturday at noon, and this new task just completed after a few hours and according to my log, everything seems in order! And 2.5 hours to check 300GB of data seems perfectly reasonable to me.

A few parting questions, just to clear up a few points of confusion on my end:

  • I read in another thread that the -skip-no-unique option could be useful to make backups run a bit faster. But I tried this as a local and global option and Duplicacy didn’t recognize it as valid syntax. What did I do wrong? Is this an actual option, or is it something that is being requested as a product enhancement?

  • Speaking of options, it seems as if in the Scheduler, one can put options on a Backup that are different from the options on the actual Backup definition itself. Shouldn’t they be identical? Are there times when I wouldn’t want them to be?

  • How do I change my screen name? I wasn’t aware that my partial email address would be used as my alias here on the forum.

  • Finally, I ran a “check” after my latest backup above, and it claims that I have a Missing Block, probably from one of my prior attempts to get this job to run. How do I delete it – or do I even need to?

Thanks to everyone who chimed in to try to help me! I think I’m where I want to be and am ready to migrate from Duplicati to Duplicacy! I just need to have a better understanding of certain concepts like “checking,” “pruning,” etc.

Kevin

I just fixed bullet #4, the issue with a missing chunk. Apparently, I had left a folder in the BackBlaze bucket called “BNS-FDrive-B3Data-B2” back when I was trying to send my FDrive data up in phases (which we coincidentally call B1Data - B7Data). I deleted all of those folders except, apparently, the B3 one. Removing that folder and running Check again yielded a clean bill of health!

-skip-no-unique was proposed but has not been implemented.

On the backup page you can manually run a backup without creating a schedule, and the options there come in handy in this case.

You can change your user name: click the top-right icon, click your user name in the pop-up menu, then select Preferences.

@gchen, thank you very much for your feedback; your reply made perfect sense. But with respect to the username issue: I’m in Preferences, and I can see how to change my full name and add a picture, but I don’t see anything that allows me as a Basic User to override kevin_williams as my username. I’ll feel really silly if the answer is something really obvious, but I’m just not seeing it. Can you screenshot what you mean, or email me privately? I’ll tell you what I’d like my username to be and you can change it behind the scenes. Thanks so much.

Sorry, I just found out that you can’t change your username more than 3 days after registration. I can change it for you if you PM me your new username.

Just for comparison I have a similarly sized repository (250GB) mounted remotely (via NFS in my case) and my hourly backup takes about a minute (or two)…

To be fair, this is a backup to an SFTP target running on my local network, but a subsequent daily duplicacy copy from there to a remote repository (SFTP sitting on rsync.net) takes about 10 minutes, so 2 hours seems excessive for what looks to be a small amount of changed data.

If I had to guess, I’d say your mounted volume is adding significant latency that is making duplicacy’s job much slower than it should be.
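
For reference, the two-step setup I described looks roughly like this on the CLI (the storage names “default” and “offsite” are placeholders, and the thread counts are just examples):

  duplicacy backup -stats -threads 4                    # hourly backup to the local SFTP storage
  duplicacy copy -from default -to offsite -threads 4   # daily copy of new chunks up to rsync.net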

@andrew.heberle, thank you for your reply. I’m afraid I’m not as tech-savvy as many others here, but is there anything I can reasonably do to overcome this “latency”?