Question about include/exclude filters in Duplicacy Web

Hello guys

After trying to understand the include/exclude filter system of Duplicacy Web, I wanted to ask whether I understood it correctly or not:

I added the root folder “homes” as the backup source, with the following contents (example):

  • Documents
  • Docker
  • Work
  • School

If I want to include only “School” and all its contents, I simply add this filter, right? +School/*
Duplicacy automatically excludes the rest, I assume.

Now, if I want to include only the subfolder “Bachelor” of “School” and exclude everything else, I use only this filter: +School/Bachelor/*
Correct?

Finally the bonus question, probably the most difficult for me to understand:
To include only School/Bachelor, School/MBA and Work and exclude all else, I could add a filter like:

+School/Bachelor/
+School/Bachelor/*
+School/MBA/
+School/MBA/*
+Work/
+Work/*
-*

Or would I need to add here the respective parent directory as well, like:

+School/
+School/Bachelor/
+School/Bachelor/*
+School/MBA/
+School/MBA/*
+Work/
+Work/*
-*

Any help for clarification would be highly appreciated - thanks :slight_smile:

For me it was very helpful to think about how Duplicacy handles the filters file when writing the rules:

Duplicacy traverses the filesystem top down, and for each directory and file it encounters it goes through rules one by one from the top until it finds a matching one; it then ignores the rest of the rules, and moves on to the next file or directory. (If it does not find a match – a default rule will be used, which is “exclude” if all the other rules are “include”)
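To make the first-match-wins idea concrete, here is a minimal Python sketch. It is not Duplicacy's actual code: the function name `decide` is made up for illustration, Python's `fnmatchcase` is only a stand-in matcher (its `*` also matches `/` and the empty string, but it additionally understands `?` and `[...]`), and the default for a mix of include and exclude rules is an assumption, since only the all-include case is described above.

```python
from fnmatch import fnmatchcase

def decide(path, rules):
    """Return True if `path` would be included under first-match-wins rules.

    `path` is relative to the backup root; directory paths end with "/".
    `rules` is a list of strings such as "+School/" or "-*".
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(path, pattern):
            return sign == "+"  # the first matching rule wins, the rest are ignored
    # No rule matched: exclude when every rule is an include rule.
    # Including in all other cases is an assumption of this sketch.
    return not all(r.startswith("+") for r in rules)

rules = ["+School/", "+School/Bachelor/*", "+Work/*"]
for path in ["School/", "School/Bachelor/thesis.pdf", "School/MBA/", "Documents/"]:
    print(path, decide(path, rules))
```

With these rules, `School/` and the file under `School/Bachelor/` come out as included, while `School/MBA/` and `Documents/` match nothing and fall through to the default "exclude".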

So for your case,

you want it to look into the School/ folder, but only back up two specific subfolders.

# This matches only the directory itself
+School/

# This matches everything under School/Bachelor/
+School/Bachelor/*

# This matches Work/ and everything under it
+Work/*

# Because all rules are "include" the default is "exclude", so the next 
# line is not needed, but you can keep it for clarity: 
-*

Quiz. What if you want to include all files under /hi/there/hello/ but exclude everything else – including /hi/there/howareyou?

Answer:

You would need to make sure duplicacy traverses to that last one by including directories only:

+hi/
+hi/there/
+hi/there/hello/*

Note, * matches everything, including directory separator, and nothing, so doing this:

+hi/there/hello/
+hi/there/hello/* 

Would be redundant: Stuff that matches the first line is a subset of stuff that matches the second.
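The subset claim can be checked with a quick Python sketch. It uses `fnmatchcase` as a stand-in for Duplicacy's wildcard matching (an assumption: it shares the property that `*` matches `/` and the empty string), and the sample paths are invented for illustration.

```python
from fnmatch import fnmatchcase

dir_rule = "hi/there/hello/"    # pattern part of +hi/there/hello/
star_rule = "hi/there/hello/*"  # pattern part of +hi/there/hello/*

# Hypothetical paths; directory paths end with "/"
paths = [
    "hi/there/hello/",
    "hi/there/hello/a.txt",
    "hi/there/hello/sub/b.txt",
    "hi/there/howareyou/",
]

matched_by_dir_rule = {p for p in paths if fnmatchcase(p, dir_rule)}
matched_by_star_rule = {p for p in paths if fnmatchcase(p, star_rule)}

# Everything the directory-only rule matches is also matched by the "*" rule
print(matched_by_dir_rule <= matched_by_star_rule)  # prints True
```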

Now, if you have a lot of directories, listing them gets very annoying very fast. You can use something like this to visit all directories:

+*/
+hi/there/hello/*

The downside – the backup will probably contain a massive number of empty directories…

And a last bit of advice – use backup with the -dry-run flag to check your rules: it will go through all the motions without actually uploading anything, so you can verify that the right files are being picked up and/or skipped.

Ok, I’ll go on with unprompted advice.

  1. You can include other files in your filters file. For example, you can have

    @/Users/Wolfgang/Documents/duplicacy-rules.txt
    

    in the filter file and keep the actual rules in that duplicacy-rules file in your documents folder, so you can back up and replicate it.

  2. You can absolve yourself from writing rules by using metadata; on macOS duplicacy honors Time Machine exclusion attributes – everything that is excluded by Time Machine will be skipped by duplicacy when that feature is enabled. Or you can put special marker files into the folders you want skipped. See -no-backup-file on the “set” page of the gilbertchen/duplicacy Wiki on GitHub.

Edit 2: missed the question

It should now be more or less evident that this can be simplified further to

+School/
+School/Bachelor/*
+School/MBA/*
+Work/*

Edit 3: I edited your post to add formatting to configuration files and prevent the forum from misinterpreting things. See more here: Formatting posts using markdown, BBCode, and HTML - Using Discourse - Discourse Meta

Hello and thank you very much for your extensive reply!

Definitely makes more sense now.

+School/
+School/Bachelor/*
+School/MBA/*
+Work/*

Couldn’t this be simplified even further to:

+School/Bachelor/*
+School/MBA/*
+Work/*

Thanks again :slight_smile:

No problem!

No, and that’s a key point:

Because duplicacy traverses the filesystem from the top down, it will first encounter the School/ directory itself and will attempt to find a match for it in the file. It won’t find any – because none of the other lines matches the School/ directory itself (note, if the rule ends with / it matches directories, else - files), so it will assume it’s excluded. Since it’s excluded – it won’t bother checking anything inside it, so all your rules about stuff under School/ will not have a chance to work.
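Here is a small Python simulation of that pruning behaviour, as a sketch under assumptions rather than Duplicacy's real implementation: `included` and `walk` are hypothetical helpers, `fnmatchcase` stands in for the real wildcard matcher, and the directory tree is made up.

```python
from fnmatch import fnmatchcase

def included(path, rules):
    """First matching rule wins; with only include rules, no match means exclude."""
    for rule in rules:
        if fnmatchcase(path, rule[1:]):
            return rule[0] == "+"
    return not all(r[0] == "+" for r in rules)

def walk(tree, rules, prefix=""):
    """Yield the files that would be backed up from a nested dict `tree`.

    Dict values are sub-dicts for directories and None for files.
    """
    for name, child in sorted(tree.items()):
        if child is None:                    # a file
            path = prefix + name
            if included(path, rules):
                yield path
        else:                                # a directory, matched with a trailing "/"
            path = prefix + name + "/"
            if included(path, rules):        # excluded directories are never entered
                yield from walk(child, rules, path)

tree = {
    "School": {"Bachelor": {"thesis.pdf": None}, "MBA": {"notes.txt": None}},
    "Work": {"report.docx": None},
    "Documents": {"tax.pdf": None},
}

without_parent = ["+School/Bachelor/*", "+School/MBA/*", "+Work/*"]
with_parent = ["+School/"] + without_parent

print(list(walk(tree, without_parent)))  # only the file under Work/
print(list(walk(tree, with_parent)))     # Bachelor, MBA and Work files
```

Without `+School/`, the walk never descends into School/, so the two rules below it are unreachable.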

But do play and experiment with it with -dry-run :slight_smile:

In that case, if there are many intermediate directories in the middle, would we have to list them all as well? Or would it be better in that case to put some symlinks in the root?

+School/
+School/many/
+School/many/other/
+School/many/other/intermediate/
+School/many/other/intermediate/directories/
+School/many/other/intermediate/directories/Bachelor/*
+School/many/other/intermediate/directories/MBA/*

Yes. Which I agree duplicacy should have figured out on its own — if the user specified +/hi/there/hello/my/crazy/data/*, then it’s duplicacy’s job to figure out how to make it happen, and it should not require the user to spoon-feed and hand-hold it through the process. I complained about it over 5 years ago: Warn about [and perhaps, autocorrect] unreachable filters statements

Symlinks in the root are one approach. Another approach is my comment above: use +*/ to visit all directories.

Third (my preferred approach) is to have a default of “include all” and instead selectively exclude data that you are absolutely sure you don’t need. It’s always better to err on the side of backing up more data than necessary, not less. Storage is cheap, data is expensive.
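If you do stay with include-only rules for a deeply nested path like the one asked about above, the parent-directory rules can be generated mechanically instead of typed by hand. A minimal Python sketch follows; `with_parent_rules` is a hypothetical helper, not part of Duplicacy, and it assumes every pattern starts with `+`, ends in `/*`, and has no wildcards in its directory components.

```python
def with_parent_rules(include_patterns):
    """Add a "+dir/" rule for every ancestor directory of each include pattern."""
    rules = []
    seen = set()
    for pattern in include_patterns:
        body = pattern[1:]                 # strip the leading "+"
        parts = body.split("/")[:-1]       # directory components, without the final "*"
        # Ancestors only: the deepest directory is already matched by the pattern itself
        for depth in range(1, len(parts)):
            parent = "+" + "/".join(parts[:depth]) + "/"
            if parent not in seen:
                seen.add(parent)
                rules.append(parent)
        rules.append(pattern)
    return rules

patterns = [
    "+School/many/other/intermediate/directories/Bachelor/*",
    "+School/many/other/intermediate/directories/MBA/*",
]
print("\n".join(with_parent_rules(patterns)))
```

The output is the same seven-line rule list as in the question. Paste it into the filters and check the result with -dry-run before relying on it.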
