Do not filter out .duplicacy/ and change transient data locations

#1

Duplicacy has hardcoded behavior that skips backing up the .duplicacy directory and its contents, in ListEntries:

  func ListEntries(top string, path string, fileList *[]*Entry, patterns []string, nobackupFile string, discardAttributes bool) (directoryList []*Entry,
      skippedFiles []string, err error) {
      ...
      for _, f := range files {
          if f.Name() == DUPLICACY_DIRECTORY {
              continue
          }
      ...

I sort of understand why it is done, but I think it is ill-advised, and I disagree with it. So, step by step:

Why should this directory be backed up?

Easy: it contains user-created content such as filters, preferences, schedules, scripts, and even private keys for the destination storage. No user-created content should ever be lost. Right now I have to do stupid things like keeping those files elsewhere and symlinking them into .duplicacy just to get them backed up.

Why do I think it is actively skipped now?

I think it’s because it also contains the cache and logs.

So, would it be great if we skipped only .duplicacy/logs and .duplicacy/cache instead?

Not really. This would be better, but not ideal. The ideal behavior would be as follows:

  1. Place cache files where they belong: under ~/Library/Caches on macOS, and the equivalent location on Windows.
  2. Place log files where they belong: under ~/Library/Logs.
  3. Set the excludeItem extended attribute on caches, and honor it :slight_smile:
  4. Do not set that attribute on logs: let users decide whether they want to back up logs. I would.
  5. Remove the explicit check for the .duplicacy directory from the source.

#2

Wait, are you saying that my filters and preferences files are not being backed up even though I explicitly included them? :hushed:

One disadvantage of putting large cache folders there (on Windows I suppose that would be under AppData or ProgramData) is that they can quickly fill up the C: drive, which is often a small SSD…


#3

I don’t think this is a disadvantage. Those locations are designed for, and specifically designated to hold, caches.

If the user’s system drive is too small, they are free to buy a bigger SSD, redirect those folders to another location (in the registry), or even mount an NFS volume there. Whatever they do is, and should be, outside of duplicacy’s concern; it applies to every application that creates temporary caches, so duplicacy should not deviate from the expected behavior.

Deviating makes things worse. For well-behaved apps there is only one folder to redirect, but with something like duplicacy the user first has to hunt down the non-standard cache locations and then redirect each of them individually. And what if the user actually did all of that, and now duplicacy is the only app that does not comply? Or what if the user runs another backup solution in parallel? Then they have to manually exclude duplicacy’s cache from it. And guess what: if that other solution is also duplicacy, running on a parent folder, well, I had to do precisely that :). Rather annoying.

What I’m trying to say is that duplicacy should behave in a standard, predictable way, be a good OS citizen, and store data where intended.


#4

Hi,

I have made some changes to duplicacy in my fork to change this behavior. I will make a pull request soon :slight_smile:

With the new preference “nobackup_file”, it is possible to specify a trigger file to exclude a directory.
If this preference is used, duplicacy will not exclude the .duplicacy folder any more.

You can use the following filters to exclude internal data like cache and logs:

# ============================ exclude the internal files and directories in all ".duplicacy" subfolders
e:(?i)(^|/)\.duplicacy/cache/
e:(?i)(^|/)\.duplicacy/temporary$
# ============================ exclude the internal files and directories in toplevel ".duplicacy" subfolder
e:(?i)^\.duplicacy/incomplete$
e:(?i)^\.duplicacy/logs/
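
To see which paths these four patterns catch, they can be run through Go's regexp package, which accepts the same (?i) flag syntax. This sketch assumes that each "e:" regex is matched unanchored against a repository-relative path using "/" separators, with a trailing "/" on directories; the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// The exclude patterns from the post above.
var excludePatterns = []string{
	`e:(?i)(^|/)\.duplicacy/cache/`,
	`e:(?i)(^|/)\.duplicacy/temporary$`,
	`e:(?i)^\.duplicacy/incomplete$`,
	`e:(?i)^\.duplicacy/logs/`,
}

// compileExcludes strips the "e:" prefix and compiles each regex.
func compileExcludes(patterns []string) []*regexp.Regexp {
	var res []*regexp.Regexp
	for _, p := range patterns {
		res = append(res, regexp.MustCompile(strings.TrimPrefix(p, "e:")))
	}
	return res
}

// excluded reports whether any exclude regex matches the path.
func excluded(path string, res []*regexp.Regexp) bool {
	for _, re := range res {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

func main() {
	res := compileExcludes(excludePatterns)
	for _, p := range []string{
		".duplicacy/preferences",
		".duplicacy/filters",
		"sub/.duplicacy/cache/chunks/ab",
		".duplicacy/logs/backup.log",
		"sub/.duplicacy/logs/backup.log",
		".Duplicacy/Incomplete",
	} {
		fmt.Printf("%-32s excluded=%v\n", p, excluded(p, res))
	}
}
```

Under these assumptions preferences and filters stay in the backup, cache and temporary are dropped at any depth because of the (^|/) prefix, and logs and incomplete are dropped only at the top level because those patterns are anchored with ^.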

#5

I like where you’re heading with this, but I’m skeptical about tying the .duplicacy exclusion to the nobackup_file preference:

Isn’t that rather complicated and potentially confusing logic? Not that it doesn’t make sense, but I think too much serious stuff is going on, with a high risk that the user is not aware of it. For example: an unsuspecting user who wants to use the nobackup_file feature turns it on. At the next backup a large amount of cache data is backed up, possibly without the user even noticing.

I think these two things (a switch to back up .duplicacy, and the nobackup_file preference) should be kept separate, because there is no inherent connection between them. At a minimum, I think the cache folder should always be excluded, no matter what.
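
The decoupling proposed here can be expressed as two independent settings, with the cache excluded unconditionally. This is a hypothetical sketch: the Options type, its field names, and skipDuplicacyPath are invented for illustration, and only the top-level .duplicacy directory is considered.

```go
package main

import (
	"fmt"
	"strings"
)

// Options holds two independent settings (hypothetical names).
type Options struct {
	NobackupFile       string // trigger file name; "" disables the feature
	BackupDuplicacyDir bool   // explicit opt-in to back up .duplicacy
}

// skipDuplicacyPath decides whether a repository-relative path under the
// top-level .duplicacy directory is skipped. The cache is always skipped;
// the rest is skipped unless the user opted in. NobackupFile plays no
// part in this decision.
func skipDuplicacyPath(relPath string, opts Options) bool {
	if !strings.HasPrefix(relPath, ".duplicacy/") {
		return false
	}
	if strings.HasPrefix(relPath, ".duplicacy/cache/") {
		return true
	}
	return !opts.BackupDuplicacyDir
}

func main() {
	nobackupOnly := Options{NobackupFile: ".nobackup"}
	optedIn := Options{BackupDuplicacyDir: true}
	for _, p := range []string{".duplicacy/preferences", ".duplicacy/cache/chunks/ab", "docs/report.txt"} {
		fmt.Printf("%-28s nobackupOnly=%v optedIn=%v\n", p,
			skipDuplicacyPath(p, nobackupOnly), skipDuplicacyPath(p, optedIn))
	}
}
```

Turning on the trigger file alone leaves .duplicacy skipped, so the scenario above (cache suddenly backed up) cannot happen; opting in includes preferences and filters but still never the cache.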


#6

Regarding these issues, I think a fixed set of exclude patterns added by duplicacy itself, instead of excluding the whole folder, would make sense. I have adjusted my solution accordingly.
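
One way to read "a fixed set of exclude patterns added by duplicacy itself" is to prepend built-in rules to the user's filters. Assuming first-match-wins evaluation, the built-ins then take precedence, so a broad user include such as i:^\.duplicacy/ still cannot pull the cache into the backup. This is a sketch only: builtinExcludes, compileRules and included are hypothetical names, only regex rules (i:/e:) are handled, and treating unmatched paths as included is a simplification for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// builtinExcludes is a hypothetical fixed list that duplicacy would add
// itself; it reuses the patterns suggested earlier in the thread.
var builtinExcludes = []string{
	`e:(?i)(^|/)\.duplicacy/cache/`,
	`e:(?i)(^|/)\.duplicacy/temporary$`,
	`e:(?i)^\.duplicacy/incomplete$`,
	`e:(?i)^\.duplicacy/logs/`,
}

type rule struct {
	include bool
	re      *regexp.Regexp
}

// compileRules puts the built-in excludes in front of the user's patterns
// so they are evaluated first. Only "i:" and "e:" regex rules are handled.
func compileRules(user []string) ([]rule, error) {
	var rules []rule
	for _, p := range append(append([]string{}, builtinExcludes...), user...) {
		var r rule
		switch {
		case strings.HasPrefix(p, "i:"):
			r.include = true
		case strings.HasPrefix(p, "e:"):
			r.include = false
		default:
			return nil, fmt.Errorf("unsupported pattern %q", p)
		}
		re, err := regexp.Compile(p[2:])
		if err != nil {
			return nil, err
		}
		r.re = re
		rules = append(rules, r)
	}
	return rules, nil
}

// included applies first-match-wins; unmatched paths are included here.
func included(path string, rules []rule) bool {
	for _, r := range rules {
		if r.re.MatchString(path) {
			return r.include
		}
	}
	return true
}

func main() {
	rules, err := compileRules([]string{`i:^\.duplicacy/`})
	if err != nil {
		panic(err)
	}
	for _, p := range []string{".duplicacy/filters", ".duplicacy/cache/chunks/ab"} {
		fmt.Println(p, included(p, rules))
	}
}
```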