Chunk upload consistency question

Hi all,

I have been monitoring the file access logs for three backup programs: Duplicacy, Arq, and Kopia.

I noticed that Arq and Kopia always upload files to a temporary name (Kopia) or a temporary folder (Arq) and move them to their proper place only after the upload has completed.
Duplicacy, on the other hand, seems to upload directly to the final location.

Is there some specific reason Arq and Kopia need to do this but Duplicacy does not?

And how does Duplicacy guard against a partial upload remaining in place, with future backups then assuming the chunk is valid and deduplicating against it?

Thanks!
-Alex

Duplicacy does the same upload/rename on backends that don’t guarantee transfer atomicity, such as sftp.
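
For illustration, the upload-then-rename pattern looks roughly like the sketch below. It is a minimal Python example against a local filesystem, not Duplicacy's actual code; the function name `upload_atomically` and the `.tmp` suffix are hypothetical, chosen only for this example. The point is that a reader of the final path sees either nothing or the complete file.

```python
import os
import tempfile


def upload_atomically(data: bytes, final_path: str) -> None:
    """Write data under a temporary name, then rename it into place."""
    directory = os.path.dirname(final_path) or "."
    # Create the temporary file in the destination directory so the
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # os.replace swaps the file in as a single step on POSIX systems.
        os.replace(tmp_path, final_path)
    except BaseException:
        # An interrupted transfer leaves nothing under the final name.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

On a remote such as sftp the same idea applies, with the write and the rename issued as protocol operations instead of local calls.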

Other remotes provide a way to validate uploads by checksum. Duplicacy sends a checksum along with the payload, so if the transfer is interrupted, the checksum won’t match and the remote will discard the partial file.
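
As a rough sketch of the idea (not any real backend's API), here is a toy in-memory store that rejects a payload whose hash differs from the checksum declared by the client. `ToyRemote`, `ChecksumMismatch`, the object names, and the choice of SHA-256 are all assumptions made for the example; real services define their own hash algorithm and the way it is sent.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when the received bytes do not match the declared checksum."""


class ToyRemote:
    """Hypothetical in-memory store that verifies uploads before keeping them."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, name: str, received: bytes, declared_sha256: str) -> None:
        actual = hashlib.sha256(received).hexdigest()
        if actual != declared_sha256:
            # Nothing is stored, so a truncated chunk never becomes visible.
            raise ChecksumMismatch(name)
        self.objects[name] = received


chunk = b"example chunk payload"
digest = hashlib.sha256(chunk).hexdigest()
remote = ToyRemote()

# A complete transfer is accepted.
remote.put("chunks/ab/cdef", chunk, digest)

# A transfer cut off after 10 bytes is rejected and leaves no object behind.
try:
    remote.put("chunks/12/3456", chunk[:10], digest)
except ChecksumMismatch:
    pass
```

Because the store only writes after the comparison succeeds, no rename step is needed to hide partial uploads.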

Not renaming the files also lets you harden the security of your backup by giving duplicacy limited access keys, for example keys that only allow creating objects, but not modifying or deleting them.

Kopia and Arq perhaps chose the blanket approach to avoid customizing the logic for each remote and thus simplify maintenance.

Which remote were you comparing access with?

Thank you for a detailed answer.

I believe I understand the logic. And if uploads are guaranteed to be either all or nothing it makes total sense. The blanket approach explanation also makes sense.

I observed the logs from rclone serving sftp (Arq) and webdav (Duplicacy and Kopia).

Thanks!


I would warn against using WebDAV for any non-trivial amount of data. The protocol was designed for document exchange, a few files in a folder. The workload duplicacy is exerting is vastly different: hundreds of thousands of chunks in hundreds of folders. You will be seeing stability and performance issues as your backup set grows. This is not specific to duplicacy.

Treat WebDAV as a last resort, or use it only for very small backups. If your storage provider supports only WebDAV, however, it is likely ill-suited to serve as a backup destination to begin with. Any issues that result from a backup program’s usage pattern will land on support’s deaf ears. You can search this and other backup programs’ forums for WebDAV. This too is not specific to duplicacy.

It looks like Arq made a smart decision by not supporting WebDAV to begin with. My personal opinion has always been that duplicacy and other backup tools should drop it altogether. Not just WebDAV, but all other document-exchange-oriented endpoints like OneDrive, Dropbox, and pCloud. But I’m just another user; it comes down to a balance between attracting users who can only back up to WebDAV and the support volume when something inevitably goes wrong.