[solved] SFTP backup does not rename/delete all tmp files

Please describe what you are doing to trigger the bug:
Storage: SFTP
duplicacy backup
duplicacy prune -keep 1:2 -exhaustive

Please describe what you expect to happen (but doesn’t):
prune the snapshots

Please describe what actually happens (the wrong behaviour):
It does prune the snapshots, but with every backup-then-prune run I additionally get a huge list of:

Deleted file 00/c8858a6b61a1aa01166f93a47f80d50b0fc19b9a0066a89d4ab9282bb7f72e.hewxsoig.tmp from the storage
Deleted file 00/c8858a6b61a1aa01166f93a47f80d50b0fc19b9a0066a89d4ab9282bb7f72e.geqbzzln.tmp from the storage
Deleted file 00/c8858a6b61a1aa01166f93a47f80d50b0fc19b9a0066a89d4ab9282bb7f72e.ivmlsxgz.tmp from the storage
Deleted file 00/382c4c67f9b0ee1f85f302a913dbcca50127ce2f97b5f42151c75287db55c5.dpancdqd.tmp from the storage
Deleted file 00/c8858a6b61a1aa01166f93a47f80d50b0fc19b9a0066a89d4ab9282bb7f72e.jfmhpxli.tmp from the storage
Deleted file 00/c8858a6b61a1aa01166f93a47f80d50b0fc19b9a0066a89d4ab9282bb7f72e.mmwmypst.tmp from the storage
Deleted file 00/c8858a6b61a1aa01166f93a47f80d50b0fc19b9a0066a89d4ab9282bb7f72e.upktajzb.tmp from the storage

Somewhere I read that backup creates those tmp files and then renames them – and this renaming apparently isn’t happening for SFTP (at least on my storage). I’d be happy to help debug this.

What SFTP server are you using? Is it not a Synology by any chance?

It’s an Odroid HC4 running OMV.

Ok. It’s still worth trying the same fix: use an absolute path in your SFTP URL.

E.g. sftp://you@odroid.local//share/backup. Note the path starting with //

Can I change this directly in the config file, or do I need to do it via the duplicacy CLI?

Yes, you can change that in .duplicacy/preferences.

Thank you. I just had a look and it is already configured as absolute path: sftp://user@domain.tld//mnt/hd1

Interesting.

I would then enable verbose logging in duplicacy (-d flag) and review sftp server logs for any failures.

Okay, I did the following: I manually deleted all *.tmp files on the server, to make sure there were none to start with. Then I ran a backup and checked – now there are many .tmp files again. I am running duplicacy with the -d flag; it doesn’t show any errors or anything about tmp files. Any ideas which server logs I can check? I looked at the sshd logs, but that doesn’t seem to be the right place anyway. Since the user can create and delete files, I doubt anything gets logged?

Apparently duplicacy tries to rename the file, but that fails.

Unless you have configured permissions that don’t allow renaming, the failure should show up in the SFTP logs – either in the sshd logs or the system logs.

You can run sftp to log into the storage server and try to rename a file there:

sftp user@server

Here is a tutorial: The SFTP rename command: A Comprehensive Guide - SFTPCloud
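A quick way to repeat the rename test non-interactively is sftp’s batch mode (-b). A minimal sketch, assuming OpenSSH sftp; user@server, /mnt/hd1, and the chunk.abcdefgh.tmp name are placeholders for your own storage:

```shell
# Build a batch file that uploads a dummy file and then renames it,
# mimicking duplicacy's temp-file-then-rename sequence.
echo "test data" > /tmp/chunk.tmp

cat > /tmp/rename_test.batch <<'EOF'
put /tmp/chunk.tmp /mnt/hd1/chunk.abcdefgh.tmp
rename /mnt/hd1/chunk.abcdefgh.tmp /mnt/hd1/chunk
ls -l /mnt/hd1/chunk
EOF

# Run it against the server; batch mode exits non-zero
# as soon as any command (e.g. the rename) fails:
# sftp -b /tmp/rename_test.batch user@server
```

If the rename is the problem, the batch run aborts with an error at that step instead of listing the renamed file.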


sftp rename works as expected. With that user, in the backup folder, I can rename files (verified outside of SFTP as well) – the rename does not create a copy or leave the original behind.

I checked sshd logs and syslog for “.tmp” and “rename” but there is nothing in there.

Do you have any antivirus/indexer/any other software in the NAS that could lock a file briefly right after upload?

Try renaming the file right after uploading, with a script, to better emulate what duplicacy is doing.

Other questions:

  • does only the temp file exist, or both the target file and the temp file?
  • does it happen always or occasionally?

I am running bare OMV without any plugins, so I don’t think any software is locking files – unless something built into Debian does, which I don’t think is the case?

Both files exist, but when checking I noticed that file sizes are different, and that worries me:

-rwxrwxrwx 1 root root 7.3M Jan 16 06:41 451ca409b3371f4230602c17bc807d2e81442d1d6842433b8654d090133ae5
-rwxrwxrwx 1 root root 5.0M Jan 16 06:43 451ca409b3371f4230602c17bc807d2e81442d1d6842433b8654d090133ae5.bwkwfwir.tmp

-rwxrwxrwx 1 root root 8.9M Jan 16 06:46 d3efce8d9275aaf72a32cc1e7b5746b9169c22be719b33e3f73af9a652cae2
-rwxrwxrwx 1 root root 4.0M Jan 16 06:43 d3efce8d9275aaf72a32cc1e7b5746b9169c22be719b33e3f73af9a652cae2.autprjog.tmp

I had the impression it’s happening every time, but I tried to run with just some small changes and there it didn’t happen. So it looks like it’s not happening for all chunks, but very frequently.

Looks like this works as designed.

The upload goes to a temporary file first to ensure that only complete chunks are stored. If the transfer is interrupted for any reason – such as a network hiccup or a lost connection – the default behavior of an SFTP server, unlike say S3, is to keep the partial file. If that were the actual target file, it would be a problem, as Duplicacy expects chunk files to always be consistent.

In this case, a bunch of transfers have been interrupted, leaving partially uploaded files.

The temp file can be either newer or older than the chunk file, depending on the SFTP server’s behavior on interrupted transfers and whether you back up concurrently.

You can stop duplicacy and delete those.

You can also run check -chunks to confirm that everything is still intact and consistent.

Either way, nothing to worry about.

The problem is that the partial/tmp files pile up and never get deleted unless I run prune -exhaustive. What I don’t understand: it uploads to the .tmp file, the upload gets interrupted, so it leaves the .tmp file. Then it retries – why doesn’t it overwrite the existing tmp file?

It’s inherent to the sftp behavior – if the connection is aborted, there is no way for duplicacy to delete the partial file.

Because the temporary file name contains a randomized suffix. This ensures correct concurrent behaviour: multiple duplicacy instances can upload the same chunk at the same time, but only one successful upload is allowed to become the actual chunk.
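The pattern described above can be sketched locally in shell (this is illustrative, not duplicacy’s actual Go code – the chunk name and suffix format are made up):

```shell
# Sketch of the upload pattern: write to a randomly suffixed temp file,
# then rename it to the final chunk name only once the write succeeded.
# Two concurrent writers get different temp names, so they never clobber
# each other's partial data; both renames target the same final name.
chunks=$(mktemp -d)
chunk_id="00c8858a"                                   # illustrative chunk hash
suffix=$(head -c4 /dev/urandom | od -An -tx1 | tr -d ' \n')

printf 'chunk payload' > "$chunks/$chunk_id.$suffix.tmp"

# rename() is atomic on the same filesystem: readers see either no chunk
# or a complete chunk, never a partial one.
mv "$chunks/$chunk_id.$suffix.tmp" "$chunks/$chunk_id"
```

If the process dies before the mv, the randomly named .tmp file is orphaned – which is exactly the garbage you are seeing.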

Ah, thanks a lot for the explanation, now it all makes sense! :slight_smile: I didn’t realize the suffix is randomized. I was searching for an option to turn that behaviour off in the Debian SFTP server, but couldn’t find any. I also couldn’t find an alternative SFTP server that has it turned off. What I do find is the recommendation that the uploading client should delete the tmp file. So: it would be nice if duplicacy deleted partially uploaded files after an interruption, or resumed from the same partial upload?

That would require each duplicacy client to remember which temp file it uploaded last time, or to derive the randomized portion from some stable and unique ID. That’s quite a bit of extra complexity and potential for bugs, and it still wouldn’t guarantee that these files get deleted – on a repeated attempt, a chunk with the same ID may not even be generated – all just to work around misbehaving remotes. The current behaviour is safe and guarantees that you won’t lose data, at the expense of leaving some garbage: it’s better to leave an extra file than to delete an important one. In fact, uploading to a temp file in the first place is already a workaround for the SFTP behaviour.

Instead, since you already know that the connection is flaky, you can schedule a periodic job on your OMV server to do the cleanup. Either with an exhaustive prune (I’m actually surprised that exhaustive prune deleted those; it’s pretty nice, and is the supported way to clean up the datastore), or with something simple, like a cron or systemd job running find /path/to/duplicacy/chunks -type f -regextype posix-extended ! -iregex '.*/[0-9a-f]{62}$' -mtime +10 (GNU find syntax, as on Debian). This will list those extra files; to delete them, add the -delete parameter at the end. Adjust {62} to the actual filename length if you use a non-standard nesting level, otherwise this will nuke your whole backup. A safer approach that targets only old tmp files would be find /path/to/duplicacy/chunks -type f -name "*.tmp" -mtime +10 -delete.

This deletes files that don’t match the chunk-name format and are older than 10 days – assuming your backups run more often than every 10 days, so that in-progress files are never deleted.
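A minimal cron-able sketch of the safer, tmp-only variant, assuming a Debian/GNU userland; the chunks path in the example calls is a placeholder:

```shell
# cleanup_tmp DIR [delete]: list duplicacy temp files in DIR older than
# 10 days; pass "delete" as the second argument to actually remove them.
# Touching only *.tmp names means real chunk files are never candidates.
cleanup_tmp() {
    dir=$1
    if [ "$2" = "delete" ]; then
        find "$dir" -type f -name '*.tmp' -mtime +10 -delete
    else
        find "$dir" -type f -name '*.tmp' -mtime +10
    fi
}

# Example usage (placeholder path): dry run first, then delete for real.
# cleanup_tmp /path/to/duplicacy/chunks
# cleanup_tmp /path/to/duplicacy/chunks delete
```

Running the dry run first and eyeballing the list before adding delete is the cheap insurance here.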

I would actually try to get to the bottom of your network connectivity. If this happens a lot, you probably want to find the underlying cause – the connection should not be interrupted so often that the amount of garbage data becomes noticeable.

Another alternative is to switch to a different protocol – for example S3 via Minio. This may give you much better performance, depending on your server configuration.


Thank you very much! I’ll try out Minio.