SFTP check command is very slow

Droolio · 2 August 2018 14:03

I was about to make a post about the exact same problem, which I encountered a few days ago. While testing Vertical Backup and trying to establish a good retention period for prune, I ran out of disk space. Oops.

Running out of disk space for the VB/Duplicacy storage is a bad idea!

Firstly, it created a 0 byte snapshot file, which I had to delete manually before even a check could run (it definitely didn’t like the 0 byte file).

Running prune, especially with the -exclusive option, on revisions that have missing chunks creates more missing chunks, because Duplicacy stops with an error at the first problem it encounters and is unaware that it has already deleted a whole lot of chunks. The snapshot file remains.

Remedy: manually delete the affected snapshot files, then clean up with prune -exhaustive.

One other issue I encountered is that check also stops at the first error, such as a missing chunk, which makes it tiresome if you want a complete list of broken snapshots. (In my case, a performance bottleneck I’m still trying to get to the bottom of means that running check takes practically half a day, just to list all chunks!)

Suggestion: Duplicacy should first rename the snapshot to, say, <revision number>.del - just as it renames chunks into fossils (.fsl) - and only then delete the chunks. On subsequent runs, it can detect such unfinished snapshots, treat them specially, and not stop with an error when it finds chunks that are already deleted.
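A minimal sketch of the rename-first idea, not Duplicacy's actual prune code. The function name and the `.del` suffix handling are illustrative, and it assumes (loudly) that `chunk_paths` lists only chunks referenced by this one revision; real pruning has to account for chunks shared between revisions. Requires Python 3.8+ for `missing_ok`.

```python
from pathlib import Path
from typing import Iterable


def prune_revision(snapshot: Path, chunk_paths: Iterable[Path]) -> None:
    """Delete one revision so that an interrupted run can be resumed.

    Hypothetical helper: assumes chunk_paths are used by this revision only.
    """
    marker = snapshot.with_name(snapshot.name + ".del")

    # Step 1: mark the snapshot as "being deleted" before touching any chunk.
    # On a resumed run the snapshot is already renamed, so this is skipped.
    if snapshot.exists():
        snapshot.rename(marker)

    # Step 2: delete the chunks, tolerating ones removed by an earlier run.
    for chunk in chunk_paths:
        chunk.unlink(missing_ok=True)

    # Step 3: only now remove the marker, completing the deletion.
    marker.unlink(missing_ok=True)
```

Because the marker outlives any partial chunk deletion, a later run that sees `<revision number>.del` knows the missing chunks are expected rather than a sign of corruption.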

Similarly, check is a read-only operation (unless using the -resurrect option). It would be nice if it didn’t abort once it found the first missing chunk - it should carry on and give a complete report.
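As a rough illustration of the "carry on and give a complete report" behaviour, here is a sketch that is not taken from Duplicacy. The function name and the data shapes (a mapping from revision to chunk IDs, and a set of chunk IDs present on the storage) are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set


def find_missing_chunks(
    snapshots: Dict[str, Iterable[str]], existing: Set[str]
) -> Dict[str, List[str]]:
    """Return every missing chunk per revision instead of stopping at the first."""
    report: Dict[str, List[str]] = {}
    for revision, chunk_ids in snapshots.items():
        missing = [c for c in chunk_ids if c not in existing]
        if missing:
            # Record the problem and keep going with the next revision.
            report[revision] = missing
    return report


if __name__ == "__main__":
    snapshots = {"1": ["aa", "bb"], "2": ["bb", "cc"], "3": ["dd"]}
    existing = {"aa", "bb"}
    for revision, missing in find_missing_chunks(snapshots, existing).items():
        print(f"revision {revision}: missing {', '.join(missing)}")
```

Since the chunk listing is the expensive part, collecting all failures in one pass would mean paying for that listing only once.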

gchen · 2 August 2018 16:51

Which storage backend are you using? For disk and sftp storages, we always upload to a temporary file first and then rename it so I don’t know how it would create 0 byte snapshot files.

Check the nesting level of the chunks directory on your storage. If there are 2 levels then it will take a long time to list 65,536 directories, especially for sftp.
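The 65,536 figure follows from each directory level being named by two hex characters (as in chunks/61/0d), giving 256 names per level. A small sketch of the arithmetic; the function name is made up for illustration:

```python
def max_chunk_dirs(nesting_levels: int, hex_chars_per_level: int = 2) -> int:
    """Upper bound on the deepest-level directories under chunks/."""
    names_per_level = 16 ** hex_chars_per_level  # 2 hex chars -> 256 names
    return names_per_level ** nesting_levels


for levels in (1, 2):
    print(f"{levels} level(s): up to {max_chunk_dirs(levels)} directories to list")
```

With one level there are at most 256 directories to list; with two levels there are up to 256 * 256 = 65,536, each needing its own listing round trip over sftp.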

This has long been on my todo list.

saspus · 2 August 2018 16:58

Not OP, but I also see quite slow “check” execution over sftp to a nearby server, though nowhere near a day: it takes about 5 minutes, while the backup itself takes under a minute. The number of files and folders is similar:

myserver:dpc me$ find chunks -type d | wc -l
   58301
myserver:dpc me$ find chunks -type f | wc -l
   82613

Chunks folder looks like this:

myserver:dpc me$ find chunks -type d
chunks
chunks/61
chunks/61/61
chunks/61/0d
chunks/61/95
chunks/61/59
chunks/61/92
^C

Are these the “two levels” you are referring to? If so, shall I nuke the repository and start over (assuming the new version does it differently)?

Edit: confirmed it myself. A newly created storage uses a single level of chunk directories:

myserver:dpc02 me$ find chunks -type d
chunks
chunks/61
chunks/0d
chunks/95
chunks/59
^C

To avoid starting over, would it suffice to just rename the chunks, prepending the second-level folder name to the file name?

i.e. ‘chunks/2e/02/7f503fc717b57958ca2b4ec3ad023b51bd55afbfd1513ea97e099b14fa13’ to become
‘chunks/2e/027f503fc717b57958ca2b4ec3ad023b51bd55afbfd1513ea97e099b14fa13’ or is it a terrible idea and there are other dependencies, e.g. explicit path names in the snapshot files?
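The renaming being asked about can be written down as a pure path mapping. This sketch only computes the new name and touches no files; whether Duplicacy would accept a storage flattened this way is exactly the open question above, so treat it as an illustration of the proposal, not a migration tool. The function name is hypothetical.

```python
from pathlib import PurePosixPath


def flatten_chunk_path(path: str) -> str:
    """Map chunks/<aa>/<bb>/<rest> to chunks/<aa>/<bb><rest> (names only)."""
    parts = PurePosixPath(path).parts
    if len(parts) != 4:
        raise ValueError(f"expected chunks/<aa>/<bb>/<rest>, got {path!r}")
    root, first, second, name = parts
    # Fold the second-level directory name back into the file name.
    return str(PurePosixPath(root, first, second + name))


if __name__ == "__main__":
    old = "chunks/2e/02/7f503fc717b57958ca2b4ec3ad023b51bd55afbfd1513ea97e099b14fa13"
    print(flatten_chunk_path(old))
```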

Droolio · 2 August 2018 18:59

For this particular Vertical Backup setup that we configured for a client (I don’t have such issues with Duplicacy on my home Windows-based setup)…

It’s a new Debian Stretch VM - quite stripped down, no desktop environment - with OpenSSH as the SFTP server. Storage is an external 2TB WD Blue in a USB 3.0 docking bay, and the ESXi host has a USB 3.0 card that’s been passed through to the VM. It backs up 3 VMs on the same host.

However, the filesystem is NTFS and mounted with ntfs-3g, which I strongly suspect is my bottleneck. Very soon I plan to change to ext4 but will need some spare space.

Backup speeds are quite good though: under 2 hours, considering it totals nearly 950GB! There is around 30-50GB of incremental data each day, or maybe 10-15GB per run, with a run every 6 hours.

It just struggles if I try to run Duplicacy directly on the Debian VM, with check or prune -exhaustive - anything that does a ListAllFiles().

The chunks directory has just 1 level of nesting. With the -v switch, it’ll go through each directory, starting from ff, fe, down to 00 - taking several minutes for each directory.

Listing chunks/
Listing chunks/ff/
[many hours later]
Listing chunks/00/

saspus · 2 August 2018 19:09

What is sshd CPU utilization on your server and what is your server - NAS or proper machine?

I’m hosting the backup on an 8-core Mac Pro and sshd consumes 40% CPU when check is running. Perhaps it’s CPU-limited on the server in your case?

Droolio · 2 August 2018 19:55

Host is a Dell T630 IIRC - nothing fancy, but shouldn’t under-perform this badly. I might be leading us up the garden path by mentioning ssh/sftp - that’s just the way Vertical Backup on the host accesses the storage - backups are running fine, though.

I run Duplicacy from a ‘dummy’ folder, directly on the storage backend, to prune, check and copy to a remote storage. All manually for now - these issues have stopped me from setting up a proper cron job.

I reckon it’s the ntfs-3g mount, although that only uses 10% CPU tops. Need to do a lot more testing before I can find the culprit.
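One way to test the filesystem theory without Duplicacy or sftp in the picture is to time a plain local listing of each chunk subdirectory. This is a rough sketch under that assumption; the function name is made up, and it measures only directory enumeration, not whatever else ListAllFiles() does.

```python
import os
import time
from typing import List, Tuple


def time_listings(chunks_root: str) -> List[Tuple[str, int, float]]:
    """Return (subdirectory, entry count, seconds to list) for each subdirectory."""
    results: List[Tuple[str, int, float]] = []
    for entry in sorted(os.scandir(chunks_root), key=lambda e: e.name):
        if not entry.is_dir():
            continue
        start = time.perf_counter()
        with os.scandir(entry.path) as it:
            count = sum(1 for _ in it)  # enumerate entries without stat-ing them
        results.append((entry.name, count, time.perf_counter() - start))
    return results
```

Running it against the chunks directory on the ntfs-3g mount, and again against a copy on ext4, would show whether the minutes-per-directory delay comes from the filesystem itself. Note that a second run over the same directories may be faster simply because of caching.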