I did some testing, and I believe that the answer is yes: with Wildcard Matching, the entire path must be specified, but in Regular Expression Matching, the expression only needs to match part of the path. If you want a regular expression to only match the end of a path, put $ at the end of the regular expression (or /$ in the case of directories).
This is correct, mostly because Go's regex matching functions, such as regexp.Match, are unanchored and behave like Search in other languages (this confused me for a while). So if you need to match the entire path, the anchors ^ and $ are required.
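To make the unanchored behaviour concrete, here is a minimal Go sketch. The path and the helper name `matches` are made up for illustration; only `regexp.MatchString` is from the standard library, and this is not Duplicacy's own code.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in path.
// regexp.MatchString is unanchored, so a partial match is enough.
func matches(pattern, path string) bool {
	ok, err := regexp.MatchString(pattern, path)
	if err != nil {
		panic(err) // invalid pattern
	}
	return ok
}

func main() {
	path := "home/user/docs/report.txt" // hypothetical path

	fmt.Println(matches(`docs`, path))           // true: matches in the middle
	fmt.Println(matches(`^docs`, path))          // false: path does not start with "docs"
	fmt.Println(matches(`\.txt$`, path))         // true: anchored to the end only
	fmt.Println(matches(`^home/.*\.txt$`, path)) // true: anchored at both ends
}
```

Without ^ and $, a pattern behaves like a substring search, which is why an unanchored e: or i: line can match more paths than intended.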
I like to back up my entire Linux root volume, in the hopes I can later (a) restore it fully and (b) see changes to any areas by diffing backups. This should work, right?
Has anyone made a filter file for this purpose, excluding those folders that don’t need backing up, such as /dev, /proc and log dirs?
Update
Here’s my go at it:
# This file is /.duplicacy/filters
# exclude any ".cache" folders
e:(?i)/\.cache/.*
# everything in root dir, excluding symlinks, kernel files and temporary stuff
-bin
-boot/
-dev/
-.duplicacy/
+etc/
+home/
-initrd.*
-installimage*
-lib
-lib32
-lib64
-libx32
-lost+found/
-media/
-mnt/
+opt/
-proc/
+root/
-run/
-sbin
+srv/
-sys/
-tmp/
+usr/
+var/
-vmlinuz*
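One detail of the regex line in the filters file above is worth checking: the pattern requires a slash before ".cache". The Go sketch below tests it against a few made-up paths. It assumes that Duplicacy matches against paths relative to the repository root (no leading slash) and that directories end with a slash; the helper name `excludedByCacheRule` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// cacheRule is the regex from the "e:" line, without the "e:" prefix.
var cacheRule = regexp.MustCompile(`(?i)/\.cache/.*`)

// excludedByCacheRule reports whether the regex matches the given path.
func excludedByCacheRule(path string) bool {
	return cacheRule.MatchString(path)
}

func main() {
	fmt.Println(excludedByCacheRule("home/user/.cache/pip/x")) // true
	fmt.Println(excludedByCacheRule("home/user/.CACHE/"))      // true: (?i) ignores case
	fmt.Println(excludedByCacheRule("home/user/mycache/x"))    // false: no "/.cache/" segment
	fmt.Println(excludedByCacheRule(".cache/pip/x"))           // false: no slash before ".cache"
}
```

If the relative-path assumption holds, a ".cache" folder directly in the repository root would not be caught by this line and would need its own pattern.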
I’m using Duplicacy Web and I don’t understand where the filters file goes. I don’t have a .duplicacy folder. I have .duplicacy-web in my home directory. I created ~/.duplicacy-web/filters and it’s not used. I then created ~/.duplicacy-web/repositories/localhost/0/.duplicacy/filters and that file is not used and also gets deleted somehow when I run a backup. Also tried ~/.duplicacy-web/repositories/localhost/all/.duplicacy/filters and same thing.
I also tried it through the GUI but upon saving, it says
Failed to write the pattern file: open /Users/matt/.duplicacy-web/filters/localhost/0: not a directory
I suggest the addition of the Thunderbird file global-messages-db.sqlite in the exclude list.
Duplicacy usually throws an error on it because another process has locked part of the file, and this makes the entire backup fail. For example:
=> 2020-11-17 08:16:12.213 ERROR CHUNK_MAKER Failed to read 0 bytes: read \?\D:\Tools\Duplicacy\diskstation\users\fbrev\AppData\Roaming\Thunderbird\Profiles\sv6rhc0t.default\global-messages-db.sqlite: O processo não pode acessar o arquivo porque outro processo bloqueou parte do arquivo.
We don’t really need to back up this file anyway, since it is only an index that is recreated if not found.
I’m not a paid user so I have only backed up from the command line, but in case it helps I have successfully used a filters file in Windows in that situation.
I notice that the web ui does in fact generate a .duplicacy folder for each repository. They’re in .duplicacy-web\repositories\localhost\all. I speculate that a filters file there might work, but it would cost me money to find out for sure.
Use the Web GUI to create the filters file for each backup job in the first place. It’s under the Backup tab, click the Edit link under Include/Exclude column for a backup.
You can use the GUI alone to add, delete and rearrange lines, so you technically don’t need a text editor. You can still edit the file after it’s been created; it’ll be in the numbered repository directory, i.e. .duplicacy-web\repositories\localhost\0\.duplicacy\filters (where 0 reflects the order in which backup tasks were created).
To add to this, you can use duplicacy’s GUI editor (that is really awkward to use in the first place, compared to a plain old text editor) to create a filter file that contains just a single line: import contents of another file. And that other file can be elsewhere. I like to store mine in the Documents folder, that gets replicated everywhere, can be edited from anywhere, and is available to all duplicacy installs.
This does not match my experience, given a list of patterns as follows:
...(more patterns, some regex, some not)
i:.thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/(Archive|Drafts|Folders|INBOX|Sent|Spam|Trash).*
e:.thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/.*
...(more patterns, some regex, some not)
It does not stop at the i: pattern, but actually gives the e: pattern priority, and nothing is backed up from the 127.0.0.1 directory, even though the i: pattern appears first, and does match some files in the 127.0.0.1 directory.
2023-05-24 09:28:55.329 TRACE SNAPSHOT_PATTERN Pattern: i:.thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/(Archive|Drafts|Folders|INBOX|Sent|Spam|Trash).*
2023-05-24 09:28:55.329 TRACE SNAPSHOT_PATTERN Pattern: e:.thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/.*
2023-05-24 09:29:00.581 DEBUG PATTERN_INCLUDE .thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1.msf is included
2023-05-24 09:29:00.581 DEBUG PATTERN_EXCLUDE .thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/ is excluded by pattern e:.thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/.*
None of the examples clearly shows how to do what I want (i.e. when an i: and an e: pattern both match a path, the first matching pattern should win). This might require clarifying what “the path is said to be included without further comparisons” means in a regex context.
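One possible reading of the log above: the i: pattern never matches the directory entry "127.0.0.1/" itself, because it requires one of the folder names after the slash, so the e: pattern is the first match for the directory. The Go sketch below is a simplified model of "first matching pattern wins" using the two regexes from this post; the `rule` type and `firstMatch` function are hypothetical and not Duplicacy's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a simplified stand-in for one regex filter line.
type rule struct {
	include bool // true for "i:", false for "e:"
	re      *regexp.Regexp
}

// firstMatch returns the decision of the first rule that matches path,
// and whether any rule matched at all.
func firstMatch(rules []rule, path string) (include, matched bool) {
	for _, r := range rules {
		if r.re.MatchString(path) {
			return r.include, true
		}
	}
	return false, false
}

const base = `.thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/`

// postedRules mirrors the i: and e: lines, in the same order.
var postedRules = []rule{
	{true, regexp.MustCompile(base + `(Archive|Drafts|Folders|INBOX|Sent|Spam|Trash).*`)},
	{false, regexp.MustCompile(base + `.*`)},
}

func main() {
	dir := ".thunderbird/6wxt3a2w.default/ImapMail/127.0.0.1/"
	file := dir + "INBOX"

	inc, _ := firstMatch(postedRules, dir)
	fmt.Println("directory included:", inc) // false: only the e: regex matches

	inc, _ = firstMatch(postedRules, file)
	fmt.Println("file included:", inc) // true, but only if the walk ever reaches it
}
```

In this model the file would be included, yet the directory above it is excluded first. An untested idea that follows from it is to put an extra i: line that matches only the directory itself (the same path ending in /$) before the e: line.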
I want to exclude one folder, but include one subfolder within that excluded folder – is that possible?
I currently have only exclude filters, eg. one for the new case:
-e/dontbackup/
Now I thought I could include a subfolder like this:
The key to making sense of it is that Duplicacy tries to match every line one by one, and if as a result a specific directory turns out to be “don’t include”, it won’t go inside it and therefore will never check anything inside it.
So you have to force it to include all the intermediate paths, so that it actually reaches the sub-paths matched for inclusion.
Honestly, Duplicacy should be smart enough to figure out that if I say +a/b/c/d/e/, it has to visit each intermediate path itself, rather than forcing the user to write them out explicitly. But that is not the case today.