File exclusion based on file attributes, folder exclusion based on contained file name?

I find excluding files and folders quite cumbersome in duplicacy, both in the GUI and the CLI. So I would like to suggest the following additional ways of excluding files and folders:

  1. Exclusion of files based on file attributes (esp. the archive bit, which could be used for that purpose since hardly any software uses it anymore). In other words, if I want to exclude one or more files while I’m looking at them in Windows Explorer, I just set their archive bit and I’m done.

  2. Exclusion of folders that contain a file with a specific name. For example, I could tell duplicacy: if a folder contains a file named dontbackupthisfolder, don’t back up this folder. So, again, I could easily exclude folders while browsing them, simply by adding a file with that name.


I think 1 is ok – we can add an option to the backup command which will skip files with the archive bit set (or not set?).
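A backup option like that boils down to a bit test on the file's attribute mask. The Go sketch below is only an illustration: shouldSkip and the skipIfSet parameter are hypothetical, and skipIfSet exists precisely because the polarity ("set (or not set?)") is still open. The constant mirrors the Windows FILE_ATTRIBUTE_ARCHIVE flag (0x20); on Windows, the mask should be available from os.FileInfo.Sys() as *syscall.Win32FileAttributeData.

```go
package main

// fileAttributeArchive mirrors the Windows FILE_ATTRIBUTE_ARCHIVE flag.
const fileAttributeArchive uint32 = 0x20

// shouldSkip is a hypothetical check for an archive-bit backup option.
// With skipIfSet == true, files whose archive bit is set are skipped;
// with skipIfSet == false, files whose archive bit is clear are skipped.
func shouldSkip(attrs uint32, skipIfSet bool) bool {
	isSet := attrs&fileAttributeArchive != 0
	return isSet == skipIfSet
}
```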

As for 2, can’t you also set the archive bit of the folder to exclude it from backup?

I think 1 is ok – we can add an option to the backup command which will skip files with the archive bit set (or not set?).

Well, that’s nice to hear, but I now realize that this smart idea of mine doesn’t seem to work, because the archive bit is set (by Windows) whenever the file is written. The file doesn’t even need to be modified: moving it is apparently enough to set the archive bit. You can clear it manually, but as soon as the file is written again, it will be set again. So there might be some very specific use cases for using this as an exclude trigger, but it’s not what I had in mind.

As for 2, can’t you also set the archive bit of the folder to exclude it from backup?

I wasn’t even aware that folders also have archive bits. What is strange, though, is that, in contrast to files, the archive bit of folders is cleared by default. And, also in contrast to files, moving the folder or even adding files to it doesn’t set the bit again. The only way I can make sense of this is that the original purpose of setting the archive bit on a folder is to say “back up this folder even when it is empty”. But regardless of its intention, this actually makes it usable for exclusion. The only problem is that its use would be counterintuitive, because you would have to check “Folder is ready for archiving” to indicate the exact opposite (I’m ignoring the fact that backups are not really archives, obviously):

[screenshot]

So, while I would personally be fine with this, I’m not sure it’s the most elegant solution.

IMHO, the “exclude if folder contains file named xyz” approach is much more elegant and fits well with duplicacy’s overall approach of storing all information locally and transparently. Besides, this solution works on all systems, not just Windows.

And, finally, it is also more generic, i.e. it can be used not only for manually excluding a specific folder while browsing it, but also for automatic exclusion in a scenario where, for example, some program produces files with particular names and (for whatever reason) you don’t want folders containing those files to be backed up. (Admittedly, I can’t think of a concrete use case, but I believe this kind of flexibility is always a good thing.)

Why are you hesitant about suggestion 2 above? Is it difficult to implement? Or will it slow down duplicacy if it has to get a complete list of the folder contents in order to determine whether to back it up or not? If the latter is the case, I guess the resource use could be significantly reduced by not allowing regular expressions, only fixed file names.

If the archive bit can be used to exclude files and folders, then suggestion 2 isn’t necessary. But if archive bit can’t be used, then suggestion 2 deserves some serious consideration. My only concern is that having two ways to exclude folders may cause some confusion if a user checks one way but doesn’t realize (or forgets) there is another way. Moreover, if we decide to implement this feature, I would like to see a different file name, such as .duplicacy_no_backup, so that it is clear that this file is used by Duplicacy only.

My only concern is that having two ways to exclude folders may cause some confusion if a user checks one way but doesn’t realize (or forgets) there is another way.

Very good point! I would be a candidate for precisely that kind of mistake (if I hadn’t come up with the idea myself). One way to avoid that would be to activate “exclude folders containing x” via the filters file, i.e. by introducing a third type of line in that file. So in addition to +/- and e:, there could be something like | .duplicacy_no_backup, which would mean: exclude folders containing a file called .duplicacy_no_backup.
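Here is a rough Go sketch of how a filters-file parser might recognize the proposed line type next to the existing ones. Only +, -, i: and e: are existing duplicacy syntax; the | marker line is the hypothetical proposal from this post, and parseFilters, Rule and RuleKind are made-up names. Unknown line types are reported as errors rather than guessed at.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// RuleKind classifies one line of a filters file.
type RuleKind int

const (
	Include      RuleKind = iota // "+pattern"
	Exclude                      // "-pattern"
	IncludeRegex                 // "i:regex"
	ExcludeRegex                 // "e:regex"
	MarkerFile                   // "| name" (hypothetical, proposed here)
)

// Rule is one parsed filter line.
type Rule struct {
	Kind  RuleKind
	Value string
}

// parseFilters turns filters-file text into rules, skipping blank lines.
// Pattern text is kept verbatim; only the marker name is trimmed, so that
// "| .duplicacy_no_backup" yields ".duplicacy_no_backup".
func parseFilters(text string) ([]Rule, error) {
	var rules []Rule
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.TrimSpace(line) == "":
			continue
		case strings.HasPrefix(line, "i:"):
			rules = append(rules, Rule{IncludeRegex, line[2:]})
		case strings.HasPrefix(line, "e:"):
			rules = append(rules, Rule{ExcludeRegex, line[2:]})
		case strings.HasPrefix(line, "+"):
			rules = append(rules, Rule{Include, line[1:]})
		case strings.HasPrefix(line, "-"):
			rules = append(rules, Rule{Exclude, line[1:]})
		case strings.HasPrefix(line, "|"):
			rules = append(rules, Rule{MarkerFile, strings.TrimSpace(line[1:])})
		default:
			return nil, fmt.Errorf("unrecognized filter line: %q", line)
		}
	}
	return rules, sc.Err()
}
```

Keeping the marker rule in the same file as the other filters addresses the concern above: a user inspecting the filters file would see every way in which folders are excluded.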

I’m introducing a new symbol for starting a line in order to keep this type of exclusion clearly distinct from the others, but maybe that is not necessary.

I would like to see a different file name, such as .duplicacy_no_backup, so that it is clear that this file is used by Duplicacy only.

You mean so that users don’t forget why this file is there? I see the point, but I’m not sure it is necessary to enforce that technically by limiting the file names that duplicacy accepts. I’d say it’s sufficient to advise users in the wiki to choose their file name wisely (and/or to write a note into that file explaining what it is). I think it would be better not to fix the file name, so as to allow, for example, different backup jobs to use different ones.

Can you submit a github issue to continue the discussion?

Here it is: https://github.com/gilbertchen/duplicacy/issues/337

So, are you planning to implement this?

On macOS Time Machine already uses extended attribute com.apple.metadata:com_apple_backup_excludeItem to exclude stuff. I think duplicacy should honor those. CrashPlan does.


That is a different question which has been discussed here for Windows. This topic is about a new way of manually excluding individual folders.

Hmmm. I disagree.

The topic is about excluding files and folders by attributes, and I’m pointing out that this is how it’s already done on macOS, and that approach is supported by multiple backup tools, including Time Machine, CrashPlan and Arq. I use it myself to mark transient data I generate so that those tools skip it.

The other topic you linked discusses the global exclusion list approach, which is not sustainable: it keeps the exclusion information separate from the files themselves. Time Machine had one too, but it was abandoned long ago in favor of extended attributes.

Edit: Ah, I get it. This topic discussed two approaches and resulted in a feature request for the second one (file-based).

We’ll need to create a separate feature request for honoring the no-backup attribute on macOS.


Implemented and merged:


Oh wow, this is a fantastic little change that has so far gone completely unnoticed (because the GitHub issue is still open). So I can now add a .nobackup file to any folder and that folder (including subfolders?) will not be backed up, regardless of what is in the filters file?

Yes, sorry, I had also not quite grasped your proposal. But yes, it’s probably better to create a separate topic, just to keep things focused as much as possible.