Filters/Include exclude patterns

For the backup command, the include/exclude patterns are read from a file named filters under the .duplicacy directory. For the restore command, the include/exclude patterns are specified as the command line arguments.

Duplicacy offers two different methods for providing include/exclude filters, wildcard matching and regular expression matching. You may use one method exclusively or you may combine them as you deem necessary.

All paths are relative to the repository (the folder you execute duplicacy from), without a leading “/”. As the topmost folder on Windows is a drive, this means drive letters are not part of the path of a pattern. The path separator is always a “/”, even on Windows. Paths are case sensitive.

Wildcard Matching

An include pattern starts with “+”, and an exclude pattern starts with “-”. Patterns may contain the wildcard characters “*”, which matches a path string of any length, and “?”, which matches a single character. Note that both “*” and “?” will match any character including the path separator “/”.

When matching a path against a list of patterns, the path is compared with the part after “+” or “-”, one pattern at a time. Therefore, the order of the patterns is significant. If a match with an include pattern is found, the path is said to be included without further comparisons. If a match with an exclude pattern is found, the path is said to be excluded without further comparison. If a match is not found, the path will be excluded if all patterns are include patterns, but included otherwise.
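
The first-match rule above can be modelled in a few lines of Python. This is only a sketch of the behaviour described here, not Duplicacy's own code; the helper names wildcard_to_regex and is_included are made up for the illustration.

```python
import re

def wildcard_to_regex(body):
    """Translate a wildcard pattern body: '*' = any string, '?' = any one character."""
    pieces = []
    for ch in body:
        if ch == "*":
            pieces.append(".*")
        elif ch == "?":
            pieces.append(".")
        else:
            pieces.append(re.escape(ch))
    # DOTALL so that '.' matches every character; '/' is matched in any case.
    return re.compile("".join(pieces), re.DOTALL)

def is_included(path, patterns):
    """Return the verdict of the first +/- pattern that matches the whole path."""
    for pattern in patterns:
        sign, body = pattern[0], pattern[1:]
        if wildcard_to_regex(body).fullmatch(path):
            return sign == "+"
    # No pattern matched: exclude only when every pattern is an include pattern.
    return not all(p.startswith("+") for p in patterns)

print(is_included("docs/a.txt", ["+docs/a.txt", "-docs/*"]))  # True
print(is_included("docs/a.txt", ["-docs/*", "+docs/a.txt"]))  # False
print(is_included("b.jpg", ["+*.txt"]))                       # False
print(is_included("b.jpg", ["-*.txt"]))                       # True
```

The first two calls differ only in the order of the patterns, which is why the order matters. The last two show the fallback when nothing matches.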

Patterns ending with a “/” apply to directories only, and patterns not ending with a “/” apply to files only.

Patterns ending with “*” and “?”, however, apply to both directories and files. When a directory is excluded, all files and subdirectories under it will also be excluded. Therefore, to include a subdirectory, all parent directories must be explicitly included.

This is because duplicacy considers all matches and exclusions at each level in the file tree before descending into the next level (a “breadth first” search for matches). This makes the search much more efficient, but can cause confusion about the order and interpretation of filter rules.

For instance, the following pattern list doesn’t do what is intended, since the foo directory will be excluded so the foo/bar will never be visited:

+foo/bar/*
-*

This does not work because
-*
implies
-foo/

So when duplicacy examines the first level of the file tree for matches and exclusions, it excludes foo/ and everything underneath. That means that it never goes to the second level into foo/, and therefore never sees a match for foo/bar/. It also excludes all other top-level directories, producing an empty backup.

So, we have to make sure foo/ is included first, before the wildcard excludes it. Here is the correct way to include foo as well:

+foo/bar/*
+foo/
-*
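
To see why the extra +foo/ line matters, here is a small simulation of the directory walk. It is a simplified model under the assumptions stated in this guide (directory paths end with “/”, an excluded directory is never entered); the tree layout and the function names are hypothetical.

```python
import re

def matches(body, path):
    """Wildcard match: '*' = any string, '?' = any one character ('/' included)."""
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in body)
    return re.fullmatch(regex, path, re.DOTALL) is not None

def is_included(path, patterns):
    for pattern in patterns:
        if matches(pattern[1:], path):
            return pattern[0] == "+"
    return not all(p.startswith("+") for p in patterns)

def walk(tree, patterns, prefix=""):
    """Yield included files; an excluded directory is never descended into."""
    for name, child in tree.items():
        if isinstance(child, dict):          # directory: its path ends with '/'
            path = prefix + name + "/"
            if is_included(path, patterns):
                yield from walk(child, patterns, path)
        elif is_included(prefix + name, patterns):
            yield prefix + name

# Hypothetical repository layout used only for this illustration.
tree = {"foo": {"bar": {"a.txt": None}, "b.txt": None}, "other": {"c.txt": None}}

print(list(walk(tree, ["+foo/bar/*", "-*"])))           # []
print(list(walk(tree, ["+foo/bar/*", "+foo/", "-*"])))  # ['foo/bar/a.txt']
```

With the first list, foo/ is rejected by -* before foo/bar/ is ever looked at. With the second list, foo/ passes and the walk reaches foo/bar/a.txt.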

The following pattern list includes only files under the directory foo/ but not files under the subdirectory foo/bar:

-foo/bar/
+foo/*
-*

To include a directory while excluding all files under that directory, use these patterns:

+cache/
-cache/?*

To include files in a directory and exclude its subdirectories:

-folder/*/*
+folder/*
-*

Regular Expression Matching

An include pattern starts with “i:”, and an exclude pattern starts with “e:”. The part of the filter after the include/exclude prefix must be a valid regular expression. The regular expression syntax is the same general syntax used by Perl, Python, and other languages.
Full details for the supported regular expression syntax and features are available here.

When matching a path against a list of patterns, the path is compared with the part after “i:” or “e:” one pattern at a time. Therefore, the order of the patterns is significant. If a match with an include pattern is found, the path is said to be included without further comparisons. If a match with an exclude pattern is found, the path is said to be excluded without further comparison. If a match is not found, the path will be excluded if all patterns are include patterns, but included otherwise.

Some examples of regular expression filters are shown below:

# always include sqlite databases
i:\.sqlite$
# exclude sqlite temp files
e:\.sqlite-.*$
# exclude temporary file names
e:.*/?~.*$
# exclude common file types (case insensitive)
e:(?i)\.(bak|mp4|mkv|o|obj|old|tmp)$
# exclude lotus notes full text directories
e:\.ft/.*$
# exclude any cache files/directories with cache in the name (case insensitive)
e:(?i).*cache.*
# exclude lightroom previews
e:(?i).* Previews\.lrdata/.*$
# exclude Qt source
e:(?i)develop/qt[0-9]/.*$
# exclude any git stuff
e:\.git/.*$
# exclude cisco anyconnect log files: matches .cisco/log/* or .cisco/vpn/log/*, etc
e:\.cisco/.*/?log/.*$
# exclude trash bin stuff
e:\.Trash/.*$
# exclude old firefox stuff
e:Old Firefox Data/.*$
# exclude dirx stuff: excludes Documents/dir0/*, Documents/dir1/*, ...
e:Documents/dir[0-9]*/.*$
# exclude downloads
e:Downloads/.*$
# exclude duplicacy test stuff
e:DUPLICACY_TEST_ZONE/.*$
# exclude lotus notes stuff
e:Library/Application Support/IBM Notes Data/.*$
# exclude mobile backup stuff
e:Library/Application Support/MobileSync/Backup/.*$
# exclude movies
e:Movies/.*$
# exclude itunes stuff
e:Music/iTunes/iTunes Media/.*$
# include everything else
i:.*
# include Firefox profile but nothing else from Mozilla
i:(?i)/AppData/[^/]+/Mozilla/$
i:(?i)/AppData/[^/]+/Mozilla/Firefox/
e:(?i)/AppData/[^/]+/Mozilla/

Explanation of the regex above:

  • /[^/]+/: has the purpose of assuring that there is exactly 1 folder between AppData and Mozilla
  • we need to include
    • the Mozilla folder, but nothing it contains (therefore the $)
    • the Firefox folder, and everything it contains
    • exclude everything in the Mozilla folder which is not contained in the rules above
    • (important) add a $ include rule for each parent folder on the way down to the folder whose whole content we want to take; without these rules the final exclude rule would also exclude the parent folders themselves (check the Google Chrome profile below)
# include Google Chrome profile but nothing else from Google
# note that we include the whole profile, because we are unsure how many "users" are added beside the "Default" profile
i:(?i)/AppData/[^/]+/Google/$
i:(?i)/AppData/[^/]+/Google/Chrome/$
i:(?i)/AppData/[^/]+/Google/Chrome/User Data/
e:(?i)/AppData/[^/]+/Google/
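
The role of the $ rules can be checked against a few sample paths. The sketch below uses Python's re.search as a stand-in for the regex engine Duplicacy uses (an assumption that holds for these simple patterns); the user name and folder names in the sample paths are hypothetical.

```python
import re

# The four Google Chrome rules from above, in file order.
rules = [
    ("i", r"(?i)/AppData/[^/]+/Google/$"),
    ("i", r"(?i)/AppData/[^/]+/Google/Chrome/$"),
    ("i", r"(?i)/AppData/[^/]+/Google/Chrome/User Data/"),
    ("e", r"(?i)/AppData/[^/]+/Google/"),
]

def decide(path):
    """Return the verdict of the first rule whose regex is found in the path."""
    for kind, pattern in rules:
        if re.search(pattern, path):
            return "include" if kind == "i" else "exclude"
    return "include"  # nothing matched and not all rules are include rules

base = "Users/joe/AppData/Local/Google/"
for path in [base,
             base + "Chrome/",
             base + "Chrome/User Data/Default/Bookmarks",
             base + "Chrome/Application/",
             base + "Drive/"]:
    print(decide(path), path)
```

The first three paths come out as included and the last two as excluded. Without the two $ rules, Google/ and Google/Chrome/ would be caught by the final e: rule and User Data would never be reached.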

As seen in the examples above, you may add comments to your filters file by starting the line with a “#” as the first character of the line.
The entire comment line will be ignored and can be used to document the meaning of your include/exclude wildcard and regular expression filters. Completely blank lines are
also ignored and may be used to make your filters list more readable. Note that if you add # anywhere else but at the beginning of a line, it will be interpreted as part of the pattern, not as a comment.
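
As an illustration of these rules, the following sketch splits the text of a filters file into rules. It is not Duplicacy's parser: treating whitespace-only lines as blank and raising an error on unknown lines are assumptions of this sketch, and @ import lines are not handled.

```python
PREFIXES = {
    "+": ("include", "wildcard"),
    "-": ("exclude", "wildcard"),
    "i:": ("include", "regex"),
    "e:": ("exclude", "regex"),
}

def parse_filters(text):
    """Return (action, syntax, pattern) tuples, skipping comments and blank lines."""
    rules = []
    for line in text.splitlines():
        # A comment needs '#' as the very first character of the line.
        if line.strip() == "" or line.startswith("#"):
            continue
        for prefix, (action, syntax) in PREFIXES.items():
            if line.startswith(prefix):
                rules.append((action, syntax, line[len(prefix):]))
                break
        else:
            raise ValueError(f"unrecognised filter line: {line!r}")
    return rules

sample = r"""# exclude any git stuff
e:\.git/.*$

+foo/#not-a-comment
-*
"""

for rule in parse_filters(sample):
    print(rule)
```

Note that the “#” in the fourth line of the sample stays part of the pattern, because it is not at the beginning of the line.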

Testing filters

Filters can be easily tested using the backup command: duplicacy -d -log backup -enum-only. This is further explained in Backup command details.

Importing patterns from other files

You can now @import other files into the filters file by using

@/the/full/path/to/the/customised-filters-file
@/the/full/path/to/the/some-other-filters-file

other filters below

See the details in Filters just got a big upgrade: @import files.

Custom filters file location

Starting with version 2.3.0, you can specify the location of the filters file rather than using the default one at .duplicacy/filters. To do this, run the set command as:

set -storage second -filters <path>

The path can be an absolute path, or relative to the repository path.

You can also edit the .duplicacy/preferences file directly to add the filters key.

This option means that you can now use a different filters file for each storage.


Do I understand correctly that with Wildcard Matching, the entire path must be specified? For example:

/home/joe/foo/bar the directory I want to exclude
-/bar/ wrong
-*/bar/ right

But in Regular Expression Matching, the expression only needs to match part of the path? For example:

/home/joe/foo/bar the directory I want to exclude
e:/bar/$ right
e:^.*/bar/$ also right

(also, is /$ at the end the regular expression the correct way to exclude a directory?)


I did some testing, and I believe that the answer is yes: with Wildcard Matching, the entire path must be specified, but in Regular Expression Matching, the expression only needs to match part of the path. If you want a regular expression to only match the end of a path, put $ at the end of the regular expression (or /$ in the case of directories).


This is correct, mainly because Go’s regex matching function, regexp.Match, reports whether the input contains a match anywhere, so it behaves more like Search in other languages (I had been confused by this for a while). So if you need to match the entire path then the anchors ^ and $ are required.

And yes, /$ at the end matches directories only.
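
The difference between searching and matching the whole path can be tried out in Python, where re.search plays the role of the search-style matching described above and re.fullmatch requires the whole string to match. The sample path is hypothetical and written relative to the repository, without a leading “/”.

```python
import re

directory = "home/joe/foo/bar/"   # directory paths end with '/'
plain_file = "home/joe/foo/bar"   # a file that happens to be named "bar"

# Search semantics: the pattern may match anywhere in the path.
print(bool(re.search(r"/bar/$", directory)))      # True
print(bool(re.search(r"^.*/bar/$", directory)))   # True

# Whole-path semantics: the unanchored pattern alone is not enough.
print(bool(re.fullmatch(r"/bar/", directory)))    # False

# The trailing '/' before '$' keeps the rule from matching a file.
print(bool(re.search(r"/bar/$", plain_file)))     # False
```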


I like to back up my entire Linux root volume, in the hopes I can later (a) restore it fully and (b) see changes to any areas by diffing backups. This should work, right?

Has anyone made a filter file for this purpose, excluding those folders that don’t need backing up, such as /dev, /proc and log dirs?

Update

Here’s my go at it:

# This file is /.duplicacy/filters

# exclude any ".cache" folders
e:(?i)/\.cache/.*

# everything in root dir, excluding symlinks, kernel files and temporary stuff
-bin
-boot/
-dev/
-.duplicacy/
+etc/
+home/
-initrd.*
-installimage*
-lib
-lib32
-lib64
-libx32
-lost+found/
-media/
-mnt/
+opt/
-proc/
+root/
-run/
-sbin
+srv/
-sys/
-tmp/
+usr/
+var/
-vmlinuz*

I’m using Duplicacy Web and I don’t understand where the filters file goes. I don’t have a .duplicacy folder. I have .duplicacy-web in my home directory. I created ~/.duplicacy-web/filters and it’s not used. I then created ~/.duplicacy-web/repositories/localhost/0/.duplicacy/filters and that file is not used and also gets deleted somehow when I run a backup. Also tried ~/.duplicacy-web/repositories/localhost/all/.duplicacy/filters and same thing.

I also tried it through the GUI but upon saving, it says

Failed to write the pattern file: open /Users/matt/.duplicacy-web/filters/localhost/0: not a directory

Thanks for your help.

Doesn’t the temporary file pattern e:.*/?~.*$ exclude every file with a ‘~’ in it? Should this instead be: e:^(.*/)?~.*
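
A quick way to compare the two expressions is to run both against a few names. This sketch uses Python's re.search as a stand-in for the search-style matching Duplicacy applies; the file names are made up.

```python
import re

original = re.compile(r".*/?~.*$")    # pattern from the examples in the guide
proposed = re.compile(r"^(.*/)?~.*")  # pattern suggested in this post

for path in ["~lock.docx", "docs/~lock.docx", "docs/report~final.docx"]:
    print(path, bool(original.search(path)), bool(proposed.search(path)))
```

The original pattern matches all three names, including docs/report~final.docx where the “~” is in the middle of the name. The proposed pattern matches only the first two, where the name starts with “~”.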


I suggest the addition of the Thunderbird file global-messages-db.sqlite in the exclude list.

Duplicacy usually throws an error on it because it was partially blocked by another process and this makes the entire backup fail. For example:

=> 2020-11-17 08:16:12.213 ERROR CHUNK_MAKER Failed to read 0 bytes: read \?\D:\Tools\Duplicacy\diskstation\users\fbrev\AppData\Roaming\Thunderbird\Profiles\sv6rhc0t.default\global-messages-db.sqlite: O processo não pode acessar o arquivo porque outro processo bloqueou parte do arquivo.

We don’t really need to backup this file anyway, since it is only an index that is recreated if not found.

Welcome to the forum, @fbreve!

Yes, it is not necessary. And I also exclude all the msf files and caches.

I have no issue backing up my Thunderbird profile while in use, when using the -vss option flag.

I’m having the same problem. The documentation is sorely missing :(

I have no idea where to put the filters file if you use the web ui

I’m not a paid user so I have only backed up from the command line, but in case it helps I have successfully used a filters file in Windows in that situation.

I notice that the web ui does in fact generate a .duplicacy folder for each repository. They’re in .duplicacy-web\repositories\localhost\all. I speculate that a filters file there might work, but it would cost me money to find out for sure.

Use the Web GUI to create the filters file for each backup job in the first place. It’s under the Backup tab, click the Edit link under Include/Exclude column for a backup.

You can exclusively use the GUI to add, delete and rearrange lines, so you technically don’t need a text editor, although you can edit the file after it’s been created, and it’ll be in the numbered repository directory - i.e. .duplicacy-web\repositories\localhost\0\.duplicacy\filters (where 0 is the order in which backup tasks were created).

To add to this, you can use duplicacy’s GUI editor (that is really awkward to use in the first place, compared to a plain old text editor) to create a filter file that contains just a single line: import contents of another file. And that other file can be elsewhere. I like to store mine in the Documents folder, that gets replicated everywhere, can be edited from anywhere, and is available to all duplicacy installs.

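In the section about regex matching, you write that if a match with an include pattern is found, the path is said to be included without further comparisons.

This does not match my experience, given a list of patterns as follows: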

...(more patterns, some regex, some not)
i:.thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/(Archive|Drafts|Folders|INBOX|Sent|Spam|Trash).*
e:.thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/.*
...(more patterns, some regex, some not)

It does not stop at the i: pattern, but actually gives the e: pattern priority, and nothing is backed up from the 127.0.0.1 directory, even though the i: pattern appears first, and does match some files in the 127.0.0.1 directory.

2023-05-24 09:28:55.329 TRACE SNAPSHOT_PATTERN Pattern: i:.thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/(Archive|Drafts|Folders|INBOX|Sent|Spam|Trash).*
2023-05-24 09:28:55.329 TRACE SNAPSHOT_PATTERN Pattern: e:.thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/.*
2023-05-24 09:29:00.581 DEBUG PATTERN_INCLUDE .thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1.msf is included
2023-05-24 09:29:00.581 DEBUG PATTERN_EXCLUDE .thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/ is excluded by pattern e:.thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/.*

None of the examples clearly show how to do what I want to do (i.e. i: and e: both match a path, and the first matching pattern should win). This might require clarifying what you mean by the path is said to be included without further comparisons (in a regex context).
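
One possible reading of the log above, consistent with the directory behaviour described in the wildcard section, is that the 127.0.0.1/ directory itself is tested before any of its contents. The sketch below checks both patterns against the directory path with Python's re.search (a stand-in for Duplicacy's regex engine); this is an interpretation, not something confirmed in this thread.

```python
import re

include = r".thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/(Archive|Drafts|Folders|INBOX|Sent|Spam|Trash).*"
exclude = r".thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/.*"

directory = ".thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/"
mailbox = directory + "INBOX"

# The directory path ends right after the '/', so the i: pattern cannot match it...
print(bool(re.search(include, directory)))  # False
# ...but the e: pattern can, because '.*' also matches the empty string.
print(bool(re.search(exclude, directory)))  # True
# The i: pattern would match the mailbox file, if the walk ever got that far.
print(bool(re.search(include, mailbox)))    # True
```

If this reading is right, an extra include rule for the directory itself, ending in /$ like the Mozilla and Google rules in the guide, placed before the e: rule, would let the walk enter 127.0.0.1/ so that the i: rule gets a chance to match.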

I want to exclude one folder, but include one subfolder within that excluded folder – is that possible?
I currently have only exclude filters, eg. one for the new case:

-e/dontbackup/

Now I thought I could include a subfolder like this:

+e/dontbackup/pleasebackupthis/
-e/dontbackup/

But that doesn’t work?
Also this doesn’t help:

+e/
+e/dontbackup/pleasebackupthis/
-e/dontbackup/

Also no luck with:

+e/dontbackup/pleasebackupthis/*
-e/dontbackup/

Ha, I finally got it:

+e/
+e/dontbackup/
+e/dontbackup/pleasebackupthis/*
-e/dontbackup/*
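
For anyone who wants to check a pattern list like this before running a backup, here is a rough model: a path is only reached if every parent directory is itself included. It is a simplified sketch of the rules in the guide, not Duplicacy's code, and the file names are hypothetical.

```python
import re

def matches(body, path):
    """Wildcard match: '*' = any string, '?' = any one character ('/' included)."""
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c) for c in body)
    return re.fullmatch(regex, path, re.DOTALL) is not None

def is_included(path, patterns):
    for pattern in patterns:
        if matches(pattern[1:], path):
            return pattern[0] == "+"
    return not all(p.startswith("+") for p in patterns)

def is_backed_up(path, patterns):
    """True only if the path and every ancestor directory are included."""
    parts = path.rstrip("/").split("/")
    ancestors = ["/".join(parts[:i]) + "/" for i in range(1, len(parts))]
    return all(is_included(p, patterns) for p in ancestors + [path])

working = ["+e/", "+e/dontbackup/", "+e/dontbackup/pleasebackupthis/*", "-e/dontbackup/*"]
failing = ["+e/dontbackup/pleasebackupthis/", "-e/dontbackup/"]

wanted = "e/dontbackup/pleasebackupthis/file.txt"
print(is_backed_up(wanted, working))                     # True
print(is_backed_up("e/dontbackup/secret.txt", working))  # False
print(is_backed_up(wanted, failing))                     # False
```

In the failing list, e/dontbackup/ is excluded as a directory, so nothing below it is ever considered. The working list lets the directory in and then excludes its contents with -e/dontbackup/* after the include rule.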

I found it confusing that I had to include the folder and then later on exclude all files and subfolders – but it makes sense.
