Duplicacy preparing for backup relatively slow with filter applied

Hi! I’m trying to understand why some of my backups are taking a comparatively long time, so I ran them with
duplicacy -d backup -stats | ts '[%Y-%m-%d %H:%M:%S]' > /tmp/file
(ts is from moreutils and prepends a timestamp to each output line), first with a single filter saying I don’t care about Plex log files:

With filter:
[2020-04-13 14:51:06] There are 0 compiled regular expressions stored
[2020-04-13 14:51:06] Loaded 1 include/exclude pattern(s)
[2020-04-13 14:51:06] Pattern: -var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Logs/
[2020-04-13 14:51:06] Listing
[2020-04-13 15:13:04] Packing bin/journalctl
21 minutes 58 seconds with the filter applied

Then without:

[2020-04-13 15:25:14] There are 0 compiled regular expressions stored
[2020-04-13 15:25:14] Loaded 0 include/exclude pattern(s)
[2020-04-13 15:25:14] Listing
[2020-04-13 15:27:51] Packing bin/journalctl
2 minutes 37 seconds without any filters

As you can see, it took an additional 19 minutes 21 seconds to apply the most specific filter I could create.
Is there a better filter I can use? Is such a large variation in time expected? My intention is to filter at the path level, so that if a directory passes the filter, every file in it passes as well without being tested individually. Does Duplicacy have an optimization like that? The two pattern styles I considered are shown below.
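
For reference, this is my reading of the filter syntax, so I may well be misunderstanding how the patterns are applied:

# Directory-level exclude: the trailing slash marks a directory, which I
# would expect to prune the whole subtree without visiting the files in it.
-var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Logs/
# File-level exclude: the wildcard would have to be tested against every
# file path individually, which could get expensive with millions of files.
-var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Logs/*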

Can you run the test again, but this time run the backup without the exclude pattern first? If the second run is always the fast one regardless of ordering, the difference is filesystem caching rather than the filter. Matching a single pattern should not take this long even if you have millions of files; even at a microsecond per path, a million comparisons would finish in about a second.


This is likely a coincidence. Re-run each test multiple times and throw away the outliers; a rough benchmarking loop is sketched below.
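
Something along these lines, assuming Linux and root for the cache drop (note that drop_caches empties the page cache, dentries, and inodes, but not a ZFS ARC, which you would have to cold-start separately, e.g. by exporting and re-importing the pool):

# Drop the kernel caches before each run so no run gets a warm-cache
# advantage, then time the backup and keep the log for comparison.
for i in 1 2 3 4 5; do
    sync && echo 3 > /proc/sys/vm/drop_caches
    /usr/bin/time -v duplicacy backup -stats > /tmp/run-$i.log 2>&1
done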

As a reference data point, I have an extensive ruleset, most of which is regex-based, and the speed is virtually unaffected (1.4M files filtered down to about 200k in about 90 seconds on an SSD volume):

me@obsidian ~ % time caffeinate -s duplicacy backup
Storage set to sftp://me@[redacted]//Backups/duplicacy
Last backup at revision 1681 found
Indexing /Users/me
Parsing filter file /Users/me/.duplicacy/filters
Loaded 56 include/exclude pattern(s)
Backup for /Users/me at revision 1682 completed
caffeinate -s duplicacy backup  24.71s user 48.58s system 85% cpu 1:26.02 total
me@obsidian ~ % find . |wc -l
 1406709
me@obsidian ~ % duplicacy list -r 1682 --files |wc -l
 194291

I did test it both with and without the filter, but I now think that since this is a ZFS filesystem, the file attributes were still in the ARC, which is why the run without the filter was so much faster.
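
One way to check that (assuming the arcstat utility that ships with OpenZFS is available here) would be to watch the ARC hit rate while each run is in progress; a hit ratio near 100% during the fast run would support the warm-cache explanation:

# Print ARC statistics, including the hit ratio, once per second.
arcstat 1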

I’m going to have to think about it more, as I really want a backup solution that either has a filesystem watcher built in or can be integrated with one; a crude sketch of such an integration is below.
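
Purely hypothetical, using inotifywait from inotify-tools, with /path/to/repo as a placeholder for the repository root:

# Restart the watcher after each event, give the burst of changes a minute
# to settle, then run an incremental backup. Caveat: recursively watching a
# tree with millions of files has a significant setup cost of its own.
while inotifywait -r -e close_write,create,delete,move /path/to/repo >/dev/null 2>&1; do
    sleep 60
    (cd /path/to/repo && duplicacy backup -stats)
done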