Backups Process Every File on Every Backup

I recently migrated my system and data (Unraid to Ubuntu/ZFS), and since the move my nightly backups have gone from taking minutes to around 6 hours. Initially I thought it might be due to permissions or ownership changes made during the transfer, but it has now been a week with no file changes at all.

Duplicacy processes every file in the filesystem, only to find that almost all chunks are already cached, and in the end it uploads only a small number of changes. From an upload perspective only the changed/incremental data is sent, but from an OS/data-processing perspective every run is effectively a full scan, causing hours of high IO activity.

My experience on the previous system is that when indexing starts, only files changed since the last backup are considered for processing, so the run takes just minutes because only a small number of files have actually changed.

I have considered a few things:

  • Time difference between the Docker instance and the host - they have the same time and timezone.
  • Permissions - the container has been run both as root and as a less privileged user who owns the files. No change.
  • Mount point - initially the mount point was ro, but it was changed to rw. No change.

Things to try

  • ZFS: for performance reasons I have set atime=off on the zpool. I would hope that ctime and mtime being on is sufficient…but maybe not? I could try backing up a non-ZFS mount point to see if the behaviour changes.
  • Docker - perhaps there are some Docker shenanigans going on, but I have always run Duplicacy from Docker before, so I'm not sure what has changed.
  • I have run a backup with “-v -d”, but didn’t see anything obvious…just millions of lines of PACK_START and PACK_END, one immediately after the other.
  • Perhaps my include pattern is somehow causing problems?

Are there any settings that would log why a file is considered changed and needs further processing? (i.e. does Duplicacy keep track of file attributes to help manage incrementals?)

Any help with getting to the bottom of why Duplicacy thinks every file needs to be processed would be very welcome.

  • Duplicacy Web Edition 1.7.2 running in Docker on Ubuntu 22

Access time does not matter in determining whether a file has changed. Keep it off.

What are all the other options to the backup command?

Modification time and size, unless you pass the -hash flag to backup.
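
Conceptually, the check looks something like the minimal Go sketch below. This is not Duplicacy’s actual code, and the path and the stored-entry structure are made up for illustration; it only shows the idea that, without -hash, a file is re-read only if its size or mtime differ from what the last snapshot recorded.

```go
package main

import (
	"fmt"
	"os"
)

// storedEntry is a hypothetical record of what the previous backup saw
// for a given file (Duplicacy keeps equivalent metadata in its snapshots).
type storedEntry struct {
	Size    int64
	ModTime int64 // Unix seconds
}

// needsRepack decides whether a file must be re-read and re-chunked.
// Without -hash, only size and modification time are compared;
// ownership, permissions and atime do not trigger reprocessing.
func needsRepack(path string, prev storedEntry) (bool, error) {
	info, err := os.Stat(path)
	if err != nil {
		return true, err
	}
	changed := info.Size() != prev.Size || info.ModTime().Unix() != prev.ModTime
	return changed, nil
}

func main() {
	// Example: compare a file against the metadata recorded last time.
	prev := storedEntry{Size: 1024, ModTime: 1700000000}
	changed, err := needsRepack("/data/example.txt", prev)
	if err != nil {
		fmt.Println("stat failed:", err)
		return
	}
	fmt.Println("needs reprocessing:", changed)
}
```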

other options

Options: [-log backup -storage main-nas-backups -threads 4 -stats -stats]

Modification time and size, unless you pass -hash

I am 100% sure the files being processed have not been touched. I did an initial -hash run to ensure the chunks were all in sync with the storage, and it completed fine in around 6 hours. I kind of hoped that would serve as the full backup and the next run would be an incremental, but the next backup also took 6 hours to complete and identified only a few chunks to upload.

I found a solution to the problem. I ended up simply rebuilding the Duplicacy backup config manually, without using any of the Docker app content that had been migrated from the previous system (i.e. I wiped out all Docker content related to Duplicacy). Once the config was rebuilt, incrementals now take less than a minute to complete.

I’m not sure why this required a from-scratch config rebuild, and I don’t fully understand what file metadata Duplicacy uses to determine whether a file has changed, but something about the original config was making it think every file had changed on every run. At least my backups are working normally again. :slight_smile:


Wow. This is very bizarre.

The Duplicacy config is just a text file describing the path to the source and the URL of the target.

There is also a folder with cached (mostly metadata) chunks. Maybe a permissions issue (e.g. if the user ID changed) caused it to see the cache but not be able to use it?
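
If that’s the suspicion, one quick check is to compare the owner of the cache directory with the user the container runs as. A minimal sketch, assuming the cache is mounted at /cache inside the container (adjust the path to your setup):

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	// Hypothetical cache path inside the container; change to your mount.
	const cacheDir = "/cache"

	info, err := os.Stat(cacheDir)
	if err != nil {
		fmt.Println("cannot stat cache dir:", err)
		return
	}
	st, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		fmt.Println("not a unix stat result")
		return
	}
	fmt.Printf("cache dir owned by uid=%d gid=%d; process runs as uid=%d gid=%d\n",
		st.Uid, st.Gid, os.Getuid(), os.Getgid())
	if int(st.Uid) != os.Getuid() {
		fmt.Println("owner mismatch - the cache may be visible but not usable")
	}
}
```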

Specifically, this is what is in the host Docker app dir for Duplicacy:

bin
cache
duplicacy.json
filters
keyring
licenses.json
logs
machine-id
restore
settings.json
stats

and I have the Docker container volumes mounted as /config, /logs and /cache.

Yeah, well, I might have spoken too soon, and a simple change described below somehow points to the problem. After running incrementals 3 or 4 times with fast completion times, I changed one setting on the Docker container, and subsequent backup runs are now back to taking 6 hours again.

The change was to how the container volume mounts the host directory used to read the NAS data (i.e. the mounted Docker volume). It was changed from “rw” (not ideal) to “ro” (ideal). Applying this change has the effect of restarting the container with the new setting applied, but that’s it.

So my suspicion at this point is that either the container itself somehow contains data that affects the backup and was lost when the config change was applied, or something about Docker/ZFS/the mount point has changed enough to make Duplicacy think each file has changed.

I am going to let this 6-hour run complete and then run it again immediately. If it needs another 6 hours to complete, it’s back to mounting the volume as “rw”. If that change doesn’t fix it, it’s somehow related to the container being rebuilt, or something in my config is being lost when the container is rebuilt.

I’m taking a closer look at saspus/duplicacy-web:mini (hey, that’s you :slight_smile: ) to see if maybe I missed something config-wise. Now that I think about it, I don’t think I was using mini on Unraid, though I’m not sure why that would matter. I am now noticing that the container’s /cache is mounted to a directory under /config on the host (i.e. on the host: /blah/config/cache). The container has /config and /cache separately, and maybe the host paths should be more like /blah/config and /blah/cache instead of nested.