Filter Patterns - Slashes and Root Items


#1

I’ve been having usability issues with how the filters are implemented and have some feedback.

According to gilbertchen, the filter design “borrows … model from rsync”, but there are some differences (and I realize gilbertchen said “borrows”, so it isn’t necessarily going to be completely compatible).

Wildcards match slashes

In Duplicacy filters, ‘*’ matches any character, including slashes.

However in rsync:

  • ‘*’ matches any path component, but it stops at slashes.
  • use ‘**’ to match anything, including slashes.
  • ‘?’ matches any character except a slash (/).

That means that in Duplicacy you can’t use standard patterns to match only filenames, or to stop a match from recursing indefinitely.

For example, if I wanted to exclude a certain folder under every user’s folder, this filter would not work the same as in rsync:

-home/*/exclude/

As it would inadvertently also match

home/user/documents/exclude/
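
To make the difference concrete, here is a minimal Python sketch that translates a wildcard pattern into a regular expression under both interpretations. It is only an illustration: `glob_to_regex`, `matches` and the `star_crosses_slash` flag are hypothetical names, and neither rsync nor Duplicacy is claimed to be implemented this way (rsync in particular has extra anchoring and trailing-slash rules that are ignored here). The pattern is given without the leading `-` that marks it as an exclude rule.

```python
import re


def glob_to_regex(pattern: str, star_crosses_slash: bool) -> re.Pattern:
    """Translate a simple wildcard pattern into a regex (illustrative only).

    star_crosses_slash=True : '*' and '?' match any character, slashes included
                              (the behaviour described above for Duplicacy).
    star_crosses_slash=False: rsync-like; '*' and '?' stop at '/', '**' crosses it.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if pattern.startswith("**", i):
            parts.append(".*")          # '**' always matches across slashes
            i += 2
            continue
        if ch == "*":
            parts.append(".*" if star_crosses_slash else "[^/]*")
        elif ch == "?":
            parts.append("." if star_crosses_slash else "[^/]")
        else:
            parts.append(re.escape(ch))  # everything else is literal
        i += 1
    return re.compile("".join(parts))


def matches(pattern: str, path: str, star_crosses_slash: bool) -> bool:
    # fullmatch: the whole path must be consumed by the pattern
    return glob_to_regex(pattern, star_crosses_slash).fullmatch(path) is not None


if __name__ == "__main__":
    pattern = "home/*/exclude/"
    for path in ("home/user/exclude/", "home/user/documents/exclude/"):
        print(path,
              "| '*' crosses '/':", matches(pattern, path, True),
              "| rsync-like:", matches(pattern, path, False))
```

Under the rsync-like reading only `home/user/exclude/` matches; when `*` is allowed to cross slashes, `home/user/documents/exclude/` matches as well, which is the surprise described above.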

The only way around this currently is to use a regexp. It would be great to have * and ? not match slashes, and to implement ** for matching across slashes.
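
As a rough idea of what the regexp workaround looks like, the sketch below checks a regular expression that restricts the wildcard to a single path component. It uses Python’s `re` module purely for illustration; the expression and the name `is_excluded` are my own assumptions, how the expression is written in an actual Duplicacy filter file is not shown here, and `[^/]+` requires at least one character where a wildcard `*` could also match nothing.

```python
import re

# Hypothetical regex equivalent of the rsync-style pattern home/*/exclude/
# [^/]+ matches one path component and never crosses a slash.
EXCLUDE_RE = re.compile(r"^home/[^/]+/exclude/$")


def is_excluded(path: str) -> bool:
    """Return True if the path is matched by the exclude regex."""
    return EXCLUDE_RE.search(path) is not None


if __name__ == "__main__":
    for path in ("home/user/exclude/", "home/user/documents/exclude/"):
        print(path, is_excluded(path))
```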

Hard to match against items in root

Also, the lack of a leading slash on paths when the filters are processed causes some headaches.

For example, if I wanted to exclude a folder both in the root and in any subfolder, I have to define the filter twice:

-exclude/    #match root
-*/exclude/  #match subfolders

If the root files/folders were evaluated with a slash at the beginning then only one filter would be required:

-*/exclude/ # would match both root and subfolders
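
A small Python sketch of why the leading slash would help. It uses `fnmatch.fnmatchcase` only as a stand-in matcher, because its `*` matches any character including `/`, similar to the behaviour described above; it is not Duplicacy’s matcher, and `matches_current` and `matches_proposed` are hypothetical names.

```python
from fnmatch import fnmatchcase

PATTERN = "*/exclude/"


def matches_current(path: str) -> bool:
    # Paths are evaluated as-is, so root items have no leading slash.
    return fnmatchcase(path, PATTERN)


def matches_proposed(path: str) -> bool:
    # Proposed behaviour: evaluate every path with a slash at the beginning.
    return fnmatchcase("/" + path, PATTERN)


if __name__ == "__main__":
    for path in ("exclude/", "home/exclude/", "home/user/exclude/"):
        print(path,
              "| current:", matches_current(path),
              "| proposed:", matches_proposed(path))
```

Without the leading slash, the root-level `exclude/` has no `/` in front of it for the pattern to match, so it needs its own rule; with the leading slash, the single pattern covers both cases.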

I wanted to know other people’s thoughts on the above. Both would potentially be breaking changes to how filters currently work, but they are more intuitive to me and would reduce the number of regexps and the duplication I have to use in my filters.


#2

What you say makes sense to me. The only counterargument I can see against duplicacy operating as you propose is that, rather than trying to move away from regular expressions, you should be using more of them. I appreciate that duplicacy also supports a simpler syntax, as it helps you get going, but since you’ll sooner or later need regexes anyway, adding slightly more sophistication to the basic syntax just moves the threshold a tiny bit, so one might wonder: is it worth the effort?

Oh, a second reason just came to mind: implementing the proposed syntax will affect a very large percentage of existing users and the new version will not be backwards compatible, thus causing major upgrade problems…


#3

Hi @Christoph, yes, this could be a breaking change (or a configuration option), but it would also be more in line with almost all other “globbing” designs that I’m aware of. In C, bash, Python, rsync, DOS, and even Go itself, an asterisk doesn’t match a separator character. This difference actually made me rewrite all my filters that contain asterisks as regexps (which are harder to read), because an asterisk that matches anything is too risky for me to use.

Since the usual design is so ubiquitous, duplicacy’s filters were not intuitive to me when I was learning them, and not understanding the difference could cause the list of included/excluded files to be incorrect.

In all cases the path separator character (/ on Unix or \ on Windows) will never be matched.