Proposal: following symlinks by pattern

This is to address issues brought up by recent posts such as "Expanded Symlink Support" and "Excluded symlinks are still included in backup".

My proposal is to add a new field to the preferences file:

   ...
   "symlink_follow": ["^dir1/link.*", "^dir2/[^/]+$"]
   ...

The value is a list of regex patterns. If a symlink matches any regex pattern in the list, then the symlink will be followed. If the target is a file, the target file will be backed up. If the target is a directory, the target directory will be listed.

If a symlink doesn’t match any pattern in the list, the symlink is backed up as a symlink (if it is not excluded by the filters file).
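The matching rule described above can be sketched roughly as follows. This is only an illustration in Python, not the actual implementation: the function name should_follow is hypothetical, and it assumes paths are relative to the repository root, use forward slashes, and are tested with unanchored search semantics (so a pattern needs an explicit ^ to anchor at the start).

```python
import re

def should_follow(symlink_path, patterns):
    """Return True if the repository-relative symlink path matches any pattern.

    Hypothetical helper: the name and the search semantics are assumptions.
    """
    return any(re.search(p, symlink_path) for p in patterns)

# The example list from the proposal.
patterns = ["^dir1/link.*", "^dir2/[^/]+$"]

print(should_follow("dir1/link_to_docs", patterns))  # True: matches the first pattern
print(should_follow("dir2/link", patterns))          # True: directly under dir2
print(should_follow("dir2/sub/link", patterns))      # False: nested below dir2
```

A symlink for which this returns False would be backed up as a symlink, subject to the filters file.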

Another change to be made is, when a symlink is to be followed and the target turns out to be a directory, the source directory (basically the path of the symlink with a trailing / added) is checked against include/exclude patterns in the filters file to determine if the source directory is excluded. Note that it is the source directory, not the target directory, that will be checked.
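To make the "source directory, not target directory" point concrete, here is a rough Python sketch. The filter model is a deliberately simplified stand-in, not the real filters file syntax or semantics: rules are assumed to be (sign, regex) pairs, the first matching rule decides, and a path that matches no rule is treated as included. The names is_excluded and should_list_target are hypothetical.

```python
import re

def is_excluded(path, rules):
    """Simplified stand-in for the filters file (an assumption, not the real logic).

    rules is a list of (sign, regex) pairs; the first rule whose regex matches
    decides, and a path that matches nothing counts as included.
    """
    for sign, pattern in rules:
        if re.search(pattern, path):
            return sign == "-"
    return False

def should_list_target(symlink_path, rules):
    """Decide whether a followed directory symlink should be listed.

    Only the source directory (the symlink path plus a trailing /) is checked;
    the path the symlink points to is never consulted.
    """
    source_dir = symlink_path.rstrip("/") + "/"
    return not is_excluded(source_dir, rules)

rules = [("-", r"^dir1/link_cache/"), ("+", r"^dir1/")]

print(should_list_target("dir1/link_cache", rules))  # False: source directory is excluded
print(should_list_target("dir1/link_docs", rules))   # True: source directory is included
```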

If symlink_follow is null or an empty list, then a default pattern ^[^/]+$ is used. This pattern means only the symlinks in the root of the repository, not any subdirectories, will be followed, which is the current behavior.
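A small Python sketch of the fallback, again only as an illustration: the names DEFAULT_SYMLINK_FOLLOW, effective_patterns and should_follow are hypothetical, and repository-relative paths with forward slashes are assumed.

```python
import re

# Default from the proposal: a path with no "/" in it, i.e. a root-level entry.
DEFAULT_SYMLINK_FOLLOW = ["^[^/]+$"]

def effective_patterns(symlink_follow):
    """Fall back to the default when the field is null (None) or an empty list."""
    return symlink_follow if symlink_follow else DEFAULT_SYMLINK_FOLLOW

def should_follow(symlink_path, symlink_follow=None):
    """Return True if the path matches any of the effective patterns."""
    return any(re.search(p, symlink_path) for p in effective_patterns(symlink_follow))

print(should_follow("link"))            # True: symlink in the repository root
print(should_follow("dir1/link"))       # False: symlink inside a subdirectory
print(should_follow("dir1/link", []))   # False: an empty list also means the default
```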

I don’t want to add symlink_follow as an option to the backup command, because that would be error-prone. For instance, you may normally run the backup only from a script with carefully selected symlink_follow patterns, but one day you accidentally type duplicacy backup by hand, and your backup will be screwed.

These symlink_follow patterns will be stored in the snapshot files so you’ll be able to go back and check what patterns were used in the past.

I have mixed feelings about this #feature request.

Another change to be made is, when a symlink is to be followed and the target turns out to be a directory, the source directory (basically the path of the symlink with a trailing / added) is checked against include/exclude patterns in the filters file to determine if the source directory is excluded. Note that it is the source directory, not the target directory, that will be checked. (emphasis @TheBestPessimist)

Yes, i think this could be the solution for "Excluded symlinks are still included in backup" and all the related issues.

To me this sounds a lot like the already existing Filters/Include exclude patterns, which just need more tweaking. Duplicacy already lists everything and matches it against the filters file, choosing what to include and what to exclude.

By adding the symlink_follow feature, it seems like we are duplicating a lot of what the include patterns are supposed to do.

A different suggestion i’m making is to tweak/empower the include patterns (i:regex or +filepath) to follow symlinks nested more deeply than the repository root, but only if the current “to be checked” path (file, folder, or symlink to anything) matches an include pattern.

I thought about that, by adding actions after the patterns. For instance, +dir1/*.lnk: follow means to follow any symlinks with the suffix .lnk under dir1. This would serve as the basis for other extensions, but I didn’t go this route because 1) it is a much bigger change and 2) backward compatibility (with the current behavior of following first-level symlinks) is hard to enforce.

What i proposed is simpler than this (i.e. simpler than adding a : follow, which implies the user already knows that this is a symlink to a file or folder):
If the user types

+dir1/*

then nothing really changes: no symlinks are followed if they are not in the repo root (which in this case they aren’t).
However if there is a symlink

dir1/some/path/to/a/symlink_folder

(since symlinks currently appear without a / at the end in Duplicacy, i have also written it like that) and the user has the following lines in filters

  • option 1
+dir1/some/path/to/a/symlink_folder/

(here i have written with a / at the end since both in windows and in macos i see folder symlinks as normal folders! there’s no indication this is a symlink unless i test it in cli)

  • option 2
+dir1/*
+dir1/some/path/to/a/symlink_folder/

then the symlink is followed since the folder is explicitly included.

There’s the same option for symlink file as well, we add the following to filters:

+dir1/some/path/to/a/symlink_file

[later edit]: In case the include pattern is written like a file (as in the example just above), a folder should also be included, since

+dir1/some/path/to/a/a_folder

and

+dir1/some/path/to/a/a_folder/

currently behave similarly (afaik, please correct me if i’m wrong here, as i have only used regexp filters so far).


Basically the way i imagine this feature is that users write the path as they see it in their file explorer: symlinks to either files or folders appear just like regular files/folders to the user, so Duplicacy should work things out the same way the user sees them.
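One possible reading of this alternative, sketched in Python purely for illustration: a nested symlink is followed only when an include line names its path literally, with or without a trailing /, while a wildcard line such as +dir1/* does not trigger following on its own. The function name follows_by_explicit_include and the literal-comparison rule are assumptions, not existing Duplicacy behavior.

```python
def follows_by_explicit_include(symlink_path, filter_lines):
    """Return True if some include line names the symlink path literally.

    Hypothetical rule: "+path" and "+path/" are treated the same, and lines
    containing wildcards never compare equal to a concrete path.
    """
    for line in filter_lines:
        if not line.startswith("+"):
            continue  # only include lines can request following
        pattern = line[1:].rstrip("/")
        if pattern == symlink_path:
            return True
    return False

link = "dir1/some/path/to/a/symlink_folder"

print(follows_by_explicit_include(link, ["+dir1/*"]))                  # False
print(follows_by_explicit_include(link, ["+dir1/*", "+" + link + "/"]))  # True
print(follows_by_explicit_include(link, ["+" + link]))                 # True
```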

This might be over my head a bit, so apologies if I’m missing something, but this seems to only let users specify the traversal of specific instances of symlinks, which, though theoretically more configurable, seems unnecessarily complicated. It also adds another layer of configuration outside of -options and filters, which would create user experience issues around discoverability and a lack of clarity on how all these different points of control will interact (as mentioned by @TheBestPessimist).

Good UI could solve all of these problems by clearly presenting options and preventing bad decisions, but as long as the CLI is around (and prominent) this seems tricky, at least for my level of Duplicacy expertise. Is there a plan for how this would be implemented in the WebUI?

If you have hundreds of symlinks with no underlying similarities, how would this help? Could we simply use a pattern that covers the entire backup for symlink_follow, so any symlink Duplicacy encounters will be traversed and treated as a “normal” folder/file? At that point, wouldn’t all symlinks follow filter rules, in which case why wouldn’t you just enable all symlinks and leave the job of including and excluding content to the filters which people already use and understand?

Is making it a backup option to flip on or off as needed (eg. -symlink-follow) really that perilous? Is there no other way to enable or disable it per repository so it would travel with the snapshots and future backups and restores would be aware of it (to avoid catastrophe from user error)?

I obviously don’t understand enough about the inner workings to make helpful suggestions, but I know that for this to be successful it needs to be implemented in a way that makes clear to users how to control it and at what level it is acting, so that with even just a basic understanding of Duplicacy a user can find and use the feature successfully, including when combined with other features.

Thanks @gchen for spending the time to look into this and create a solution. Hopefully I’m lending some perspective from a more average user on how to implement this.

Why can’t -symlink-follow be false (default) or true?
In case of default, for backward compatibility, use current rules, if true - follow symlinks/mount points but check them against include/exclude patterns as appropriate…

Related: Filters and patterns - include specific folders

Did anything end up happening with this? Have any changes / updates regarding symlinks, hardlinks, junctions, etc. happened since this was first proposed?

I’m starting to get desperate and would take any solution that allows a user to tell Duplicacy to directly access the actual folders and files symlinks are pointing to, beyond just the root of the repository.

As far as i remember, nothing was done on this aspect.

:sob:

Guess I’ll continue to use additional software (Hardlink Backup) to unravel my crazy hard/symlinked data structures and create a cleaned up duplicate of my data just for Duplicacy.

I’m an enabler!

Not many users showed interest in this feature after I pushed the PR, so I’ll leave it there for now.