Honoring com_apple_backup_excludeItem on macOS

On macOS, the extended attribute com.apple.metadata:com_apple_backup_excludeItem tells backup software that a particular file does not need to be backed up. This is more robust than a separately maintained global exclusion list because the metadata is kept physically close to the data it describes. That way applications, which have a much better idea of how to manage their files, can decide what gets backed up and what is transient and should be skipped.

Example:

myimac:~ me$ touch testfile
myimac:~ me$ ls -alt@ testfile
-rw-r--r--  1 me  staff  0 Jul 10 12:38 testfile

myimac:~ me$ tmutil addexclusion testfile

myimac:~ me$ ls -alt@ testfile
-rw-r--r--@ 1 me  staff  0 Jul 10 12:38 testfile
	com.apple.metadata:com_apple_backup_excludeItem	61

myimac:~ me$ tmutil removeexclusion testfile

myimac:~ me$ ls -alt@ testfile
-rw-r--r--  1 me  staff  0 Jul 10 12:38 testfile

myimac:~ me$

For more info see man tmutil
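
A script can test for the flag the same way. Here is a minimal sketch, assuming the stock macOS `xattr` tool is on the PATH; the helper name `is_backup_excluded` is made up for illustration.

```shell
#!/bin/sh
# Attribute name as shown by `ls -l@` in the example above.
ATTR="com.apple.metadata:com_apple_backup_excludeItem"

# is_backup_excluded (hypothetical helper): succeeds when the given path
# carries the exclusion attribute. `xattr -p` prints the attribute value and
# exits non-zero when the attribute is missing, so only the exit status is used.
is_backup_excluded() {
    xattr -p "$ATTR" "$1" >/dev/null 2>&1
}

# Usage:
#   is_backup_excluded testfile && echo "excluded" || echo "included"
```

`tmutil isexcluded testfile` gives a human-readable answer as well, though as far as I know it also accounts for exclusions that are not stored on the file itself.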

Time Machine honors that flag, as do CrashPlan and even Arq.

Duplicacy should too.

Did anything come of this? Having an option to not back up Time Machine excluded files/folders would be extremely helpful for macOS users.

…Two years and change later… we still need this.

For macOS users this would mean no more redundant work maintaining a separate filter file that mirrors the Time Machine exclusions. The work has already been done by app developers and OS developers to identify and tag junk and transient data. It’s already available.

All Duplicacy needs to do is take advantage of it. This will result in faster backups, storage savings, and less frustration for users.

Discussion of a relevant PR: Add exclude_by_attribute preference to exclude files based on xattr by plasticrake · Pull Request #498 · gilbertchen/duplicacy · GitHub

I’ll try to get this into the coming CLI release.

Hi Gilbert,

I’m very happy to read on the GitHub issue that this has been merged in. My question is, do we need to do anything for this to take effect? E.g. do we have to add a line to the filters file? Or is it automatic?

Thanks!
Shiraz

P.S. I have cross-posted this question to GitHub and the Duplicacy forum.

You would need to turn exclude by attribute on:

duplicacy set -exclude-by-attribute=true

Wow, that was fast! Thank you!

Hi Saspus,

That command is not working for me:

❯ duplicacy set -exclude-by-attribute=true
Incorrect Usage.

NAME:
   duplicacy set - Change the options for the default or specified storage

USAGE:
   duplicacy set [command options]

OPTIONS:
   -encrypt, e[=true]		encrypt the storage with a password
   -no-backup[=true]		backup to this storage is prohibited
   -no-restore[=true]		restore from this storage is prohibited
   -no-save-password[=true]	don't save password or access keys to keychain/keyring
   -nobackup-file <file name> 	Directories containing a file with this name will not be backed up
   -key  			add a key/password whose value is supplied by the -value option
   -value  			the value of the key/password
   -storage <storage name> 	use the specified storage instead of the default one

I installed Duplicacy via the WebGUI download. Do I need to separately update the command line version from GitHub perhaps?

In the Web Version > Setting > Command Line Version, it does in fact show Current Version as 2.7.2, which is the latest version on GitHub.

This is what I see:

% duplicacy set help
The set command takes no arguments.

NAME:
   duplicacy set - Change the options for the default or specified storage

USAGE:
   duplicacy set [command options]

OPTIONS:
   -encrypt, e[=true]           encrypt the storage with a password
   -no-backup[=true]            backup to this storage is prohibited
   -no-restore[=true]           restore from this storage is prohibited
   -no-save-password[=true]     don't save password or access keys to keychain/keyring
   -nobackup-file <file name>   Directories containing a file with this name will not be backed up
   -exclude-by-attribute[=true] Exclude files based on file attributes. (macOS only, com_apple_backup_excludeItem)
   -key                         add a key/password whose value is supplied by the -value option
   -value                       the value of the key/password
   -storage <storage name>      use the specified storage instead of the default one
   -filters <file path>         specify the path of the filters file containing include/exclude patterns

(built from source last week. Maybe the change is not in 2.7.2?).

Okay I’m talking to myself at this point, but I didn’t want to bother you with chasing something down that I’ve figured out – I had previously installed Duplicacy via homebrew cask, and the command line version was actually older, even though the web version was saying something else. So I’m trying to get that sorted out.

As far as the actual issue in this post is concerned, I think I’m good.

Thanks again.

Confirmed this works (I had to separately install the command line client).

My initial backup time dropped from 4 days to 17 hours after setting this. To confirm, I unset it and it went back to 4 days.

Though for the life of me I can’t figure out what these gigantic files are that Time Machine is excluding and that I hadn’t already excluded.

This change is in 2.7.2:

$ ~/.duplicacy-web/bin/duplicacy_osx_x64_2.7.2 set help
The set command takes no arguments.

NAME:
   duplicacy set - Change the options for the default or specified storage

USAGE:
   duplicacy set [command options]  

OPTIONS:
   -encrypt, e[=true]		encrypt the storage with a password
   -no-backup[=true]		backup to this storage is prohibited
   -no-restore[=true]		restore from this storage is prohibited
   -no-save-password[=true]	don't save password or access keys to keychain/keyring
   -nobackup-file <file name> 	Directories containing a file with this name will not be backed up
   -exclude-by-attribute[=true]	Exclude files based on file attributes. (macOS only, com_apple_backup_excludeItem)
   -key  			add a key/password whose value is supplied by the -value option
   -value  			the value of the key/password
   -storage <storage name> 	use the specified storage instead of the default one
   -filters <file path> 	specify the path of the filters file containing include/exclude patterns

To see everything that will get excluded run this from the root of your Duplicacy repository:

find . -xattrname "com.apple.metadata:com_apple_backup_excludeItem" -exec ls -dp {} \;
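
To find out which of those items are the big ones, the same `find` can be fed into `du`. This is only a sketch: the function names are made up, and the `-prune` assumes that a tagged directory is skipped as a whole, so there is no point descending into it.

```shell
#!/bin/sh
ATTR="com.apple.metadata:com_apple_backup_excludeItem"

# list_excluded_kb (hypothetical helper): print "<size in KiB><TAB><path>" for
# every tagged item below the current directory, largest first.
# Relies on the macOS find primary -xattrname used in the command above.
list_excluded_kb() {
    find . -xattrname "$ATTR" -prune -print0 | xargs -0 du -sk | sort -rn
}

# total_kb (hypothetical helper): sum the first column of `du -sk` output
# read from stdin and print the total in KiB.
total_kb() {
    awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Usage, from the root of the repository:
#   list_excluded_kb | head -20     # the twenty largest excluded items
#   list_excluded_kb | total_kb     # total excluded size in KiB
```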

Sorry if this should be obvious, but can this be set for the Web UI/scheduled jobs to use? I’m familiar with usage at the command line, but I just started exploring the Web version and not sure how to hook it in. When just running the duplicacy set ... command, where does that get persisted?

Update: I just looked through the docs again, and after running the set command from the storage location, it looks like the setting gets saved in the storage preferences file, .duplicacy/preferences (answering my own question). Still, posting this anyway in case others have the same question.
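
If you want to confirm that the setting actually landed in that file, something like the following should do. It is a sketch only: the key name `exclude_by_attribute` is my assumption based on the title of PR #498, and the helper name is made up, so check your own preferences file if the match fails.

```shell
#!/bin/sh
# prefs_exclude_enabled (hypothetical helper): succeeds when the given
# preferences file contains "exclude_by_attribute": true.
# The key name is an assumption taken from the PR title, not from the docs.
prefs_exclude_enabled() {
    grep -Eq '"exclude_by_attribute"[[:space:]]*:[[:space:]]*true' "$1"
}

# Usage, from the repository root:
#   prefs_exclude_enabled .duplicacy/preferences && echo "on" || echo "off"
```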

You would run duplicacy set under ~/.duplicacy-web/repositories/localhost/<n>

Unfortunately this would not work as the web GUI will recreate the preferences file every time it runs a backup job (or other jobs). In the coming web GUI version this option will be enabled by default.

Hi folks,

This was a couple months ago but I want to come back to it because I don’t actually think it’s working.

I am able to run:

duplicacy set -exclude-by-attribute=true

without issue.

But when I run:

find . -xattrname "com.apple.metadata:com_apple_backup_excludeItem" -exec ls -dp {} \;

I see no output.

One thing I am confused about is what the actual root of my repository is. I am using the web version of Duplicacy. In ~/.duplicacy-web/repositories/localhost, I have: 0, all, and restore. I’m not sure which of 0 or all is my root repository. They both seem recently updated. Anyhow, I ran both the above commands in both 0 and all, and in both cases I was able to set the setting but received nothing back from the find command.

What brought me back to this task is that every day when the backup runs, it runs again for a long time, far longer than anything I’ve just changed locally would explain. So I think macOS system files are still being backed up.

This won’t matter because

From which directory do you run this?

This would be what you have configured in the web GUI. Often it’s the user’s home directory.

You can see what has been backed up in the log files.