Guidance on filters

I am a Computer Engineering major and work with software development, but I can’t seem to wrap my head around the filters. Possibly getting stupider by the day.

I have a Windows desktop that I want to back up. Most of it is in my user directory, so I want to back up Users/Name and its contents, but without some heavy stuff.

I want:
Users/Name/ - everything here with some exceptions
Users/Name/AppData/Roaming/ImportantApplication/ - need the files here, but nothing else

I do not want:
Users/Name/AppData/ - except for the path listed above
Users/Name/Dropbox/* - because that is synced

Could anyone help me translate this? Something’s not connecting for me.

My ideal backup result would then be:

Users/Name/AppData/Roaming/ImportantApplication/ImportantFile.txt
Users/Name/ImportantFile.txt
Users/Name/ImportantDirectory/ImportantFile.txt

Thank you in advance!

Here is how to think about it.

  • Each file and directory is matched against the filter list, line by line. If the outcome is “include”, it gets included, and if it is a directory, it is traversed further. If the outcome is “exclude”, it gets skipped, and if it is a directory, it is not traversed further.
  • Rules that end with / match directories. Rules that don’t end with / match files.
  • If the filter file contains only “include” rules, everything else is excluded.
  • If the filter file contains only “exclude” rules, everything else is included.
  • If there is a mix – I don’t remember.
  • Rules are relative to the root of where you initialized the repository.

The important takeaway here is that paths are matched line by line, and once a match is found, the process stops. Therefore, a given line is only tried if all previous lines failed to match.
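To make the first-match rule concrete, here is a small Python sketch that evaluates a filter list the way the bullets above describe. It is a simulation, not Duplicacy's code: the function name `is_included` is hypothetical, wildcards are approximated with `fnmatch.fnmatchcase` (whose `*` also matches `/`), and because the mixed case is not covered above, the sketch raises an error when nothing matches in a mixed list instead of guessing.

```python
from fnmatch import fnmatchcase


def is_included(path, patterns):
    """Hypothetical helper: True if `path` is included by the filter list.

    `path` is relative to the repository root; directories end with "/".
    fnmatchcase's "*" also matches "/" (an approximation of the real matcher).
    """
    for pattern in patterns:
        sign, glob = pattern[0], pattern[1:]
        if fnmatchcase(path, glob):
            return sign == "+"  # first match wins; later lines are never tried
    # Nothing matched: apply the defaults from the list above.
    signs = {p[0] for p in patterns}
    if signs == {"+"}:
        return False  # only include rules: the rest is excluded
    if signs <= {"-"}:
        return True   # only exclude rules (or no rules at all): the rest is included
    raise ValueError(f"no rule matched {path!r} in a mixed filter list")


# Order matters: the same two rules give opposite results for Dropbox/.
print(is_included("Dropbox/", ["-Dropbox/", "+*"]))  # False
print(is_included("Dropbox/", ["+*", "-Dropbox/"]))  # True: "+*" matched first
```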

I’ll assume you initialized the repository in /Users/Name.

One approach is like so:

```
# 1. Exclude the subdirectories of ImportantApplication. Because * also
# matches the / character, this matches every subdirectory at any depth.
-AppData/Roaming/ImportantApplication/*/

# 2. Allow traversal down to that folder by explicitly including
# each directory on the path.
+AppData/
+AppData/Roaming/
+AppData/Roaming/ImportantApplication/

# 3. Include the files in that folder. Anything under it that reaches
# this rule passed rule 1, so it is not a directory; include it.
+AppData/Roaming/ImportantApplication/*

# 4. Exclude everything else under AppData: every file and folder
# there that has not matched so far matches this rule.
-AppData/*

# 5. Exclude Dropbox: the Dropbox folder matches this rule
# and won't be traversed.
-Dropbox/

# 6. Include everything else: anything that has not matched
# so far is included.
+*
```
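As a sanity check, the sketch below runs the six rules above against a small, made-up tree under /Users/Name (the file names in `TREE` are hypothetical). For each file it checks every ancestor directory first, which is equivalent to a top-down traversal that never enters an excluded directory. It assumes the same simplified wildcard semantics (`fnmatchcase`, with `*` matching `/`) rather than Duplicacy's own matcher.

```python
from fnmatch import fnmatchcase

# First filter set from above; "*" is assumed to match "/" as well.
FILTERS = [
    "-AppData/Roaming/ImportantApplication/*/",
    "+AppData/",
    "+AppData/Roaming/",
    "+AppData/Roaming/ImportantApplication/",
    "+AppData/Roaming/ImportantApplication/*",
    "-AppData/*",
    "-Dropbox/",
    "+*",
]

# Hypothetical contents of /Users/Name, relative to the repository root.
TREE = [
    "ImportantFile.txt",
    "ImportantDirectory/ImportantFile.txt",
    "AppData/Local/Cache/blob.bin",
    "AppData/Roaming/OtherApp/settings.ini",
    "AppData/Roaming/ImportantApplication/ImportantFile.txt",
    "AppData/Roaming/ImportantApplication/logs/old.log",
    "Dropbox/synced.txt",
]


def is_included(path, patterns):
    # First match wins; the list ends with "+*", so something always matches.
    for pattern in patterns:
        if fnmatchcase(path, pattern[1:]):
            return pattern[0] == "+"
    raise ValueError(f"no rule matched {path!r}")


def backed_up(tree, patterns):
    """Return the files that survive, never entering an excluded directory."""
    result = []
    for file_path in tree:
        parts = file_path.split("/")
        # Ancestor directories: "AppData/", "AppData/Roaming/", ...
        dirs = ["/".join(parts[:i]) + "/" for i in range(1, len(parts))]
        if all(is_included(d, patterns) for d in dirs) and is_included(file_path, patterns):
            result.append(file_path)
    return sorted(result)


for path in backed_up(TREE, FILTERS):
    print(path)
# AppData/Roaming/ImportantApplication/ImportantFile.txt
# ImportantDirectory/ImportantFile.txt
# ImportantFile.txt
```

The three surviving paths are the ideal backup result from the question.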

You can cheat a bit and add `+*/` near the beginning. This forces traversal of all directories, and then you only need to write rules about files.

In this case, rules might look like so:

```
## Directory traversal

# Exclude the directories under ImportantApplication
-AppData/Roaming/ImportantApplication/*/

# Allow traversing all other directories
+*/

## File rules
# Exclude all files under Dropbox
-Dropbox/*

# But include the files in ImportantApplication
+AppData/Roaming/ImportantApplication/*

# Exclude the rest of AppData
-AppData/*

# Include all remaining files
+*
```

This may be easier to understand and write, but it results in a lot of empty directories being backed up (Dropbox and most of AppData, for example), which might be misleading.
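The same kind of simulation, assuming the same hypothetical tree and simplified wildcard matching (`fnmatchcase`, `*` matching `/`), shows where those empty directories come from: with `+*/` every directory except the ImportantApplication subdirectories is traversed, so Dropbox and the unwanted parts of AppData are visited even though none of their files are kept.

```python
from fnmatch import fnmatchcase

# Second filter set from above; "*" is assumed to match "/" as well.
FILTERS = [
    "-AppData/Roaming/ImportantApplication/*/",
    "+*/",
    "-Dropbox/*",
    "+AppData/Roaming/ImportantApplication/*",
    "-AppData/*",
    "+*",
]

# Hypothetical contents of /Users/Name, relative to the repository root.
TREE = [
    "ImportantFile.txt",
    "ImportantDirectory/ImportantFile.txt",
    "AppData/Local/Cache/blob.bin",
    "AppData/Roaming/OtherApp/settings.ini",
    "AppData/Roaming/ImportantApplication/ImportantFile.txt",
    "AppData/Roaming/ImportantApplication/logs/old.log",
    "Dropbox/synced.txt",
]


def is_included(path, patterns):
    # First match wins; the list ends with "+*", so something always matches.
    for pattern in patterns:
        if fnmatchcase(path, pattern[1:]):
            return pattern[0] == "+"
    raise ValueError(f"no rule matched {path!r}")


def walk(tree, patterns):
    """Return (traversed directories, backed-up files), never entering excluded directories."""
    dirs, files = set(), []
    for file_path in tree:
        parts = file_path.split("/")
        reachable = True
        for i in range(1, len(parts)):
            directory = "/".join(parts[:i]) + "/"
            if not is_included(directory, patterns):
                reachable = False  # excluded directory: nothing below it is visited
                break
            dirs.add(directory)
        if reachable and is_included(file_path, patterns):
            files.append(file_path)
    return sorted(dirs), sorted(files)


dirs, files = walk(TREE, FILTERS)
# Traversed directories that end up holding no backed-up file.
empty = [d for d in dirs if not any(f.startswith(d) for f in files)]
print(files)  # the same three files as with the first filter set
print(empty)  # ['AppData/Local/', 'AppData/Local/Cache/', 'AppData/Roaming/OtherApp/', 'Dropbox/']
```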

There is a third option: use the -nobackup-file option and place a file with that name in each folder you want skipped. Then your filters can include everything, or you can omit filters altogether, and your skip markers live alongside your data, which may make them easier to manage.

This is my preferred approach. I don’t use Windows, but macOS already has a similar built-in mechanism that Time Machine uses to let app and system developers mark data that should be skipped. Instead of a file it uses a special extended attribute, but the idea is the same: the exclusion information lives right alongside the data it describes, not in a separate, hard-to-maintain list.
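The marker-file approach can be sketched with `os.walk`, pruning any directory that contains the marker. This only illustrates the behavior described above and is not Duplicacy's implementation. The marker name `.nobackup` is a hypothetical choice, since the name is whatever you pass to -nobackup-file.

```python
import os
import tempfile

# Hypothetical marker name; use whatever name you give to -nobackup-file.
MARKER = ".nobackup"


def files_to_back_up(root, marker=MARKER):
    """Yield file paths (relative to root), skipping every directory that holds the marker."""
    for dirpath, dirnames, filenames in os.walk(root):  # top-down by default
        if marker in filenames:
            dirnames[:] = []  # prune: do not descend into this directory
            continue          # and skip the files directly inside it too
        for name in filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)


# Build a throwaway tree and mark Dropbox as "do not back up".
with tempfile.TemporaryDirectory() as root:
    for rel in ["ImportantFile.txt", "Dropbox/synced.txt", "Dropbox/sub/deep.txt"]:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    open(os.path.join(root, "Dropbox", MARKER), "w").close()
    result = sorted(files_to_back_up(root))

print(result)  # ['ImportantFile.txt']
```

Moving or renaming a folder carries its marker along, which is the maintenance advantage described above.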