Improving the instructions for include/exclude patterns (filters)


#1

I think the instructions for the include/exclude patterns could be improved, but I am myself too unsure of things to make any edits in the wiki. So I’m starting this thread to clarify some things first and once I (or anyone else) has a clear view of things, I/they can edit the wiki.

So, let’s start with the first example under “1. Wildcard matching”:

+foo/bar/*
-*

It is introduced by saying

the following pattern list doesn’t do what is intended, since the foo directory will be excluded so the foo/bar will never be visited

It remains unclear what actually is intended, but it seems that the aim is to include only foo/bar/* (I don’t understand what difference it makes to say foo/bar/* instead of foo/bar/). We are then told that the reason it won’t work (I assume that means: nothing at all will be backed up) is that “the foo directory will be excluded so that foo/bar will never be visited”.

But why is this so? Further up, we are told that

the order of the patterns is significant. If a match with an include pattern is found, the path is said to be included without further comparisons.

If that is the case, then that means, the first thing that duplicacy finds in the filter file is +foo/bar/*, which means, the first thing it will do is to include that path “without further comparisons”. Only then will it find the -* and exclude everything, i.e. everything else. So, it is not clear to me why “the foo directory will be excluded so that foo/bar will never be visited”.

Anyway, if I accept that there is a problem with the first example, I nevertheless don’t understand why the second example is the (best) solution:

+foo/bar/*
+foo/
-*

Looking at that example, it strikes me that the first and last lines end with an asterisk but the second one does not. My hunch is that this is because we only want to include the foo directory itself, but not everything in it (though it sounds strange to say you want a directory but not its contents). But if the idea is that foo needs to be included so that duplicacy will visit foo/bar at all, then why does +foo/ not come first, i.e. before +foo/bar/*?

Ironically, the third example makes sense to me:

-foo/bar/
+foo/*
-*

Perhaps it’s because the only thing we are told about that example is that it “includes only files under the directory foo/ but not files under the subdirectory foo/bar”, which means there is not so much room for discovering incoherences…

The way I interpret each line is like this:

Line 1: -foo/bar/ We exclude foo/bar/ before including anything else, because whatever comes first takes priority over what comes later. If foo were included first, it would be “included without further comparisons”, which means that we would not be able to exclude any of its subdirectories.

Line 2: +foo/* Now that we have made sure that foo/bar is excluded once and for all, we can include foo with all its contents (except the ones excluded earlier)

Line 3: -* Finally, we exclude “everything”, which means “everything except for what has previously been explicitly included” so that we get a backup of everything in foo/, except foo/bar/.

I’d love to hear where I’m wrong because if I continue building my filter file with my current mindset, I’d probably have to completely rework it once I learn how things really work…


#2

Duplicacy borrows the include/exclude model from rsync. First, the indexing starts at the root of the repository and does a non-recursive listing. Every file or folder is matched against the patterns in order. If an exclude pattern is matched, the file or folder will not be backed up. If it is a folder, Duplicacy will not descend into the excluded folder to list its files/subdirectories. Therefore, it is possible to exclude an entire subdirectory tree at once without wasting time checking every file/folder under it.

Note that when indexing a subdirectory, what is being matched against the patterns is the path relative to the repository root.
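
For anyone who finds code easier to follow than prose, the first-match-wins rule can be sketched in Python. This is only an illustration under assumptions, not Duplicacy's implementation: `is_included` is a made-up name, directories are written with a trailing slash, Python's `fnmatchcase` stands in for Duplicacy's wildcard matcher, and the fallback for a path that matches no pattern is a guess (none of the lists below reach it, since they all end with -*).

```python
from fnmatch import fnmatchcase


def is_included(path, patterns):
    """Return True if `path` passes the filter list; the first matching pattern decides."""
    for pattern in patterns:
        sign, glob = pattern[0], pattern[1:]
        # Here "*" matches any run of characters, "/" included (an assumption
        # about the matcher, chosen to agree with the examples in this thread).
        if fnmatchcase(path, glob):
            return sign == "+"
    # Assumption: a path that matches no pattern is kept. Not reached below.
    return True


broken = ["+foo/bar/*", "-*"]
fixed = ["+foo/bar/*", "+foo/", "-*"]

# The root listing contains the directory "foo/", not "foo/bar/...".
print(is_included("foo/", broken))           # False: only -* matches, so foo/ is never listed
print(is_included("foo/", fixed))            # True: +foo/ matches before -*
print(is_included("foo/bar/x.txt", broken))  # True, but never asked, because foo/ was skipped
```

The last line is the crux of the first post's question: +foo/bar/* would match, but the file is never compared against anything because its parent directory was excluded at the root listing.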


#3

I’m afraid I still don’t follow. Could you give a step by step description of how the matching process works? Like: duplicacy finds the file foo/bar/file.txt, what does it do next to decide whether it should be backed up?


#4

Assuming foo/bar/file.txt is the only file you want to back up, the correct patterns are:

+foo/bar/file.txt
+foo/bar/
+foo/
-*

Duplicacy lists the root of the repository. The foo/ folder matches the third pattern, while all others will match the fourth one and thus get excluded. In the next step Duplicacy lists the foo/ folder and only foo/bar/ will be included. Finally, Duplicacy lists foo/bar/ and finds foo/bar/file.txt which is the only file with a matching include pattern while all others will be excluded by -*. Overall, Duplicacy only needs to list 3 directories to locate the only file to be included without iterating through the entire directory tree.
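
The walk described above can be simulated on a small in-memory tree. This is a hedged sketch, not Duplicacy's code: the tree, the helper names `first_match` and `backup`, and the use of Python's `fnmatchcase` as the matcher are all assumptions made for illustration. Directories are nested dicts, files are `None`, and directory paths carry a trailing slash.

```python
from fnmatch import fnmatchcase


def first_match(path, patterns):
    """First matching pattern decides; '+' means include, '-' means exclude."""
    for pattern in patterns:
        if fnmatchcase(path, pattern[1:]):
            return pattern[0] == "+"
    return True  # assumption for unmatched paths; not reached when the list ends with -*


def backup(tree, patterns):
    """Return (directories that were listed, files that would be backed up)."""
    listed, kept = [], []

    def visit(node, prefix):
        listed.append(prefix or "(root)")
        for name, child in node.items():
            is_dir = isinstance(child, dict)
            path = prefix + name + ("/" if is_dir else "")
            if not first_match(path, patterns):
                continue  # excluded: not backed up and, if a directory, never listed
            if is_dir:
                visit(child, path)
            else:
                kept.append(path)

    visit(tree, "")
    return listed, kept


# Hypothetical repository layout.
tree = {
    "bar": {"foo": {"x.txt": None}},
    "foo": {
        "bar": {"file.txt": None, "other.txt": None},
        "baz": {"deep": {"y.txt": None}},
        "notes.txt": None,
    },
}
patterns = ["+foo/bar/file.txt", "+foo/bar/", "+foo/", "-*"]

listed, kept = backup(tree, patterns)
print(listed)  # ['(root)', 'foo/', 'foo/bar/']
print(kept)    # ['foo/bar/file.txt']
```

Only three directories are listed; bar/ and foo/baz/ are dropped at the level where they first appear, so nothing beneath them is ever examined.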


#5

Okay, let me try to make the process even more explicit:

  1. Duplicacy lists the root of the repository.
  2. D takes the first item on the list (let’s say it’s bar/foo) and compares it to the first pattern -> doesn’t match
  3. D compares it to the second pattern -> doesn’t match
  4. D compares it to the third pattern -> doesn’t match
  5. D compares it to the fourth pattern -> matches!
  6. Since the matching pattern is an exclude pattern, D ignores bar/foo (i.e. does not back it up, and - important in the present context: does not list its contents either).
  7. D proceeds in the same way with every item in the list until it reaches the foo/ directory:
  8. D compares it to the first pattern -> doesn’t match
  9. D compares it to the second pattern -> doesn’t match
  10. D compares it to the third pattern -> bingo!
  11. Since the matching pattern is an “include folder” pattern (and not an include-folder-with-all-its-contents pattern, which would be +foo/*), it lists the contents of the folder and proceeds as above by comparing each item with each pattern. And so on.

Right?

Bonus question: if the third pattern had been +foo/* D would have acted differently in step 11. It would simply have backed up all the content in foo/, including all subdirectories. Right? That means that there would be no point in having the first two patterns, as they are redundant. Right?


#6

Basically right, but there is a slight error in step 2:

  2. D takes the first item on the list (let’s say it’s bar/foo) and compares it to the first pattern -> doesn’t match

It would be bar instead of bar/foo, since when you list the root of the repository you’ll only see bar.

Bonus question: if the third pattern had been +foo/* D would have acted differently in step 11. It would simply have backed up all the content in foo/, including all subdirectories. Right? That means that there would be no point in having the first two patterns, as they are redundant. Right?

Exactly.
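
A quick way to convince yourself of the redundancy is to compare the two pattern lists on a handful of paths. Again, this is a sketch under assumptions rather than Duplicacy's own logic: `is_included` and the sample paths are made up, and Python's `fnmatchcase` stands in for the real wildcard matcher.

```python
from fnmatch import fnmatchcase


def is_included(path, patterns):
    """First matching pattern decides; '+' means include, '-' means exclude."""
    for pattern in patterns:
        if fnmatchcase(path, pattern[1:]):
            return pattern[0] == "+"
    return True  # assumption for unmatched paths; not reached, both lists end with -*


with_extras = ["+foo/bar/file.txt", "+foo/bar/", "+foo/*", "-*"]
short = ["+foo/*", "-*"]

# Hypothetical paths as they would appear during listing (directories end in "/").
paths = [
    "bar/",
    "foo/",
    "foo/notes.txt",
    "foo/bar/",
    "foo/bar/file.txt",
    "foo/bar/other.txt",
    "foo/baz/",
]

for path in paths:
    same = is_included(path, with_extras) == is_included(path, short)
    print(path, is_included(path, short), "same" if same else "DIFFERENT")
```

Every path gets the same verdict from both lists: with +foo/* present, anything the first two patterns would include is included anyway.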