Do filters require every directory to be explicitly referenced?

In order to back up dir1/dir2/dir3/dir4/ recursively, but nothing else under dir1 through dir3, it appears I have to use a filter like:

+dir1/dir2/dir3/dir4/*
+dir1/dir2/dir3/
+dir1/dir2/
+dir1/

If I’m understanding this correctly, every upper-level dir has to be explicitly mentioned unless it matches a wildcard. (Is there a reason the filter logic doesn’t auto-include the parent dirs? I can’t think of any scenario where you’d want to include a sub-dir but not its parent.)

Before I go and make my rather painfully large filter file, I wanted to ask: am I missing a potentially more manageable way to tackle the scenario where I have a pretty large FS hierarchy and want to back up only specific folders within it? I know of the symlink option, but I don’t think it works well here.

Would regex maybe work better for you?

Regex has the same behavior.

Did you see this topic:

And here is more on #filters.

OK, so as I suspected, I need to explicitly reference every parent dir. Somewhat of a pain, but I guess I’ll script around it. Thanks for confirming.
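
For anyone else in the same boat, here’s a minimal Python sketch of what I mean (untested; the targets list is a placeholder, and paths are assumed to be relative to the repository root):

# Expand a list of target directories into Duplicacy include patterns,
# adding the explicit parent-directory lines that the filter logic requires.
from pathlib import PurePosixPath

targets = ["dir1/dir2/dir3/dir4"]  # placeholder

rules = []
for target in targets:
    p = PurePosixPath(target)
    rules.append(f"+{p}/*")  # the target directory's contents
    for parent in p.parents:
        if str(parent) != ".":
            rules.append(f"+{parent}/")  # each parent dir, explicitly

# Drop duplicates (shared parents) while keeping order, then emit the lines.
seen = set()
for rule in rules:
    if rule not in seen:
        seen.add(rule)
        print(rule)

For the example above, this prints exactly the four lines from my first post.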

Just a future suggestion to the devs, then: maybe just auto-include the parent dirs. I really can’t think of any situation where that would be undesirable. If I’m including x/y/z/*, then obviously I want to include what I just said to include…

@gchen, what do you say about this?

I don’t particularly like this idea because there may be unforeseen circumstances where Duplicacy decides to look at a filter pattern and do extra things that the user can’t then choose to undo.

This is typically a mode of operation that assumes everything is excluded unless specified as included (presumably with absolute pathnames as opposed to complex regular expressions). Duplicacy isn’t the only tool that does include/exclude filter patterns in this way. IIRC rsync works very similarly - there’s likely a good reason they haven’t auto-included parent directories for this scenario.

For rsync, I would guess (just a guess) that it exists for historical reasons. I can see how their approach could make sense on older hardware and largely static filesystems, since the code can scan the dirs one depth at a time and thus minimize unnecessary scanning.
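
To illustrate the parallel (syntax from memory, so double-check the rsync man page), an rsync filter file for the same goal needs the same explicit parent chain:

+ dir1/
+ dir1/dir2/
+ dir1/dir2/dir3/
+ dir1/dir2/dir3/dir4/***
- *

where the trailing *** (in newer rsync versions) matches the directory itself plus everything beneath it, and - * excludes the rest.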

I think the established matching approaches today are basically first-match or best-match, where best-match means the more specific pattern trumps the more general one.

Tomato, potato - but anyway, a compromise solution could be to introduce another wildcard that auto-includes leading dirs. rsync added ** and *** to work around its limitations; this would be similar.


Yes, automation of this kind should be optional, e.g. using ++ instead of + or something like that.
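
A rough sketch of what that expansion might look like, in Python (the ++ prefix is the hypothetical syntax proposed here, not anything Duplicacy currently supports):

def expand(pattern):
    # Pass ordinary patterns through untouched; only '++' gets expanded.
    if not pattern.startswith("++"):
        return [pattern]
    path = pattern[2:]
    expanded = ["+" + path]
    # Emit an explicit '+parent/' include for every leading directory.
    parts = path.rstrip("/*").split("/")
    for depth in range(len(parts) - 1, 0, -1):
        expanded.append("+" + "/".join(parts[:depth]) + "/")
    return expanded

print(expand("++dir1/dir2/dir3/dir4/*"))
# ['+dir1/dir2/dir3/dir4/*', '+dir1/dir2/dir3/', '+dir1/dir2/', '+dir1/']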

Another solution could be for the web-ui to help users create the correct filter file. I haven’t looked at the web-ui yet, so maybe this is already how it works, but I’m thinking of an expandable file tree like in the CrashPlan client, where you select checkboxes next to every file and folder.


With regular expressions you can achieve the same effect within a single statement, using nested optional groups - something like dir1/(dir2/(dir3/(dir4/.*)?)?)?
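
Concretely (untested, and assuming Duplicacy’s i: prefix for regex include patterns), that could collapse the whole filter file into a single anchored line:

i:^dir1/(dir2/(dir3/(dir4/.*)?)?)?$

Each nested group is optional, so every parent prefix matches on its own while everything else under dir1-3 stays excluded.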

Still, I agree that if something is mandatory for the current statement to make sense, the user shouldn’t be forced to spell it out. The software should be smart enough to automatically do (hand wave) whatever minimally sufficient things are needed for the user’s statement to make sense.
