Multiple Drives to the Same Backup ID?

Wesley · 9 February 2020 20:34

Here’s how I have things set up at the moment. I have two hard drives in my computer designated for my data. The first is my basic data drive, which contains all files except video. The second is my video drive, reserved for my ever-increasing video footage.

When I first set up Duplicacy, I installed a third, large drive as my local backup. Using the web GUI, I set up two backups, one for the data drive (D:) with the backup ID of COMPUTERID-Data and one for the video drive (E:) with the backup ID of COMPUTERID-Video. These are both stored on the backup drive (Y:). (This is then copied to Backblaze, although I don’t think that’s really relevant to my question.)

My video drive is nearing full capacity, so I’ve installed another for further video files (F:). Of course, I want to add this to my backup, but I’m unsure if I should create a new backup ID for F: or if I should add it to the existing COMPUTERID-Video. (Click the +, select F: for the directory, click the green icon next to “Backup ID,” and select an existing backup ID.)

Considerations:

I may be moving files between drives E: and F:.
I will eventually add additional drives for video. Alternatively, I may move all files from E: and F: to an expandable RAID or similar storage solution.

It seems like using one backup ID for all video is the right solution, since it may all be consolidated to one storage device eventually, anyway, but I confess I don’t totally understand how Duplicacy handles such things. Naturally, I want to avoid files being needlessly backed up twice when I move them. Advice is appreciated.

Droolio · 10 February 2020 01:40

You should pretty much always use a unique backup ID for each repository - especially if they’re going to the same backup storage…

If you don’t, Duplicacy will see the contents as new each time and won’t be able to do proper incremental backups. You won’t lose de-duplication, but it will take a very long time to run the backup as it scans each file afresh - just like a backup run with -hash. Then, when backup B is complete and backup A (which has the same ID as B) runs, it’ll do the same rescan of your entire drive. You’ll have even-numbered revisions with metadata from one repository, and odd-numbered from another repo. That’s not good.

Incremental backups work by comparing the metadata (timestamps, pathnames, sizes, etc.) of your last backup snapshot for that ID to the current repository. It’ll only back up files that have changed, been moved (and it’ll de-duplicate most of the moved chunks anyway), or been added. This saves time as only new or modified files are hashed, chunked, and committed to backup.
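To picture the metadata comparison, here is a minimal Python sketch. It is a simplified model, not Duplicacy's actual code: the `Snapshot` type, the `files_to_process` function, and the sample paths are all hypothetical, and a snapshot is reduced to a mapping of path to (size, modification time).

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot model: path -> (size in bytes, modification time).
Snapshot = Dict[str, Tuple[int, int]]

def files_to_process(previous: Snapshot, current: Snapshot) -> List[str]:
    """Return paths that are new or whose metadata differs from the previous snapshot."""
    changed = []
    for path, meta in current.items():
        if previous.get(path) != meta:
            changed.append(path)
    return sorted(changed)

last_video_snapshot = {
    "clips/a.mp4": (1000, 1581280000),
    "clips/b.mp4": (2000, 1581280100),
}
video_drive_now = {
    "clips/a.mp4": (1000, 1581280000),  # unchanged, so it is skipped
    "clips/b.mp4": (2500, 1581290000),  # size and timestamp changed
    "clips/c.mp4": (3000, 1581290500),  # new file
}

# Same ID, same repository: only the changed and new files need work.
print(files_to_process(last_video_snapshot, video_drive_now))

# Shared ID, different repository: the last snapshot describes another drive,
# so nothing matches and every file looks new.
last_snapshot_from_other_drive = {"docs/report.txt": (500, 1581270000)}
print(files_to_process(last_snapshot_from_other_drive, video_drive_now))
```

In this model the second call returns every file on the drive, which is the "rescan everything" behaviour described above for two repositories sharing one ID.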

Use an ID like COMPUTERID-Video2 or COMPUTERID-Video_F or whatever. You’ll still benefit from de-duplication if you need to move files between the two locations.
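Why de-duplication survives across backup IDs can be sketched with a toy content-addressed store. This is an illustration under stated assumptions, not Duplicacy's implementation: the `backup` function is hypothetical, and the tiny fixed-size chunks stand in for whatever chunking the real tool uses.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, purely for illustration

def backup(data: bytes, storage: dict) -> int:
    """Store any chunks not already present and return how many were uploaded."""
    uploaded = 0
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        key = hashlib.sha256(chunk).hexdigest()  # chunks are addressed by content hash
        if key not in storage:
            storage[key] = chunk
            uploaded += 1
    return uploaded

storage = {}  # one shared storage, used by every backup ID
video = b"AAAABBBBCCCCDDDD"

uploaded_under_first_id = backup(video, storage)   # file first backed up from E:
uploaded_under_second_id = backup(video, storage)  # same file later backed up from F:

print(uploaded_under_first_id, uploaded_under_second_id)  # 4 0
```

Because chunks are keyed by their content rather than by backup ID or path, the second backup in this model finds every chunk already in storage and uploads nothing.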

Wesley · 12 February 2020 03:47

Ah, thanks very much! Unique IDs it’ll be.

Also, is there any way to change an existing backup ID? Say, if I wanted to change COMPUTERID-Video to COMPUTERID-Video_E for consistency?

Droolio · 13 February 2020 04:35

Honestly, the easiest way would be to just remove the backup ID and re-add it with the new ID.

Your first backup with the new ID will take longer - similar to a backup with the -hash option - but it’ll still de-duplicate most of the chunks and won’t have to re-upload everything.

Compeek · 11 January 2021 21:40

Forgive me for reviving a somewhat old thread, but I want to share a solution to this problem that’s been working well for me.

I created a folder called Root that contains a symlink (technically a junction) to the top level of each drive. For me, it looks like:

Root
\- C
\- D

I have the Duplicacy backup configured to use the Root folder as the backup directory, and then I use the include/exclude filters to back up only the things I actually want. This works because Duplicacy follows symlinks at the top level of the backup directory.

It’s a little bit of a hack, and it took some tinkering to get the filters right, but now I have a single backup that contains everything on my computer that I want to back up. If I move things between drives, it’s no different than moving it between any other folders. Most importantly to me, it keeps things simple and leaves me less room for error in configuring things as long as I’m careful with the filters.

IMPORTANT: Wherever you end up putting it (I just put it in my user folder), make sure you exclude the Root folder itself from the backup, or otherwise it might try to do some sort of recursive thing.

klayhamn · 31 August 2022 19:58

@Compeek thanks! that helped, not sure why Duplicacy doesn’t support this out of the box?
why treat different drives any different than different folders?

this is especially useful to me because I’d like to separate my backups based on some personally-defined categories that span multiple drives, so I’d like each category (which possibly contains files and folders from 3 different drives) to be bunched up together in the same logical backup/repository —

am i missing something?
seems like an arbitrary enforcement of logical separation just because of a physical one

====

btw, i just tested it and i’m getting this error:

ERROR SNAPSHOT_EMPTY No files under the repository to be backed up
No files under the repository to be backed up

why? i created the symlinks as junctions, should i have done something else…? when i try to follow them with explorer it works just fine

Compeek · 1 September 2022 04:48

If I remember, Duplicacy only follows symlinks at the top of the backup directory, so make sure you set the backup directory as that Root folder itself (if you followed my example) and then immediately in that folder have the symlinks. I currently have mine at C:\Root and so then I have C:\Root\C and C:\Root\D as my symlinks.

Assuming you did that, then I would guess maybe you don’t have the filters set up right? It’s a little confusing because you have to make sure each level of the path(s) you want to back up is matched. Definitely read through this if you haven’t: Filters/Include exclude patterns - How-to - Duplicacy Forum

It’s been a while since I set this all up, but here’s an example of my filters:

-*/node_modules/*
-*/venv/*

-C/Users/Compeek/AppData/*
-C/Users/Compeek/Downloads/*

+C/Users/Compeek/*
+C/Users/
+C/
+C

+D/Compeek/*
+D/
+D

-*

The first two globally exclude any folder named node_modules or venv at any level. Then I exclude a couple folders in my user directory (AppData and Downloads). Next I include my two main user folders on the two drives, including each level of the path separately. Finally I exclude anything that hasn’t matched a filter so far.

There’s probably more than one way to accomplish the same result, but those filters have been working well for me.

klayhamn · 1 September 2022 08:42

Exactly what i did

That sounds very strange…
but i will definitely try it out tonight

UPDATE: it worked, thanks

Compeek · 1 September 2022 14:28

Sure thing! I’m glad you got it working.

gadget · 1 September 2022 20:36

If you had to create a filters file with includes, then there likely was a link loop.

In Windows there are actually three types of links: hard links, soft links (aka., “junctions”) and symbolic links. They behave differently and also have different constraints.

klayhamn · 1 September 2022 20:40

I didn’t - i work with the Web UI
I had to follow the tip of the comrade above me: adding each of the path components (of each of the paths) to the includes

To me this seems a bit redundant, seems to me like there must be a simpler way to make a backup scheme that spans multiple drives than this non-straightforward adventure

There also most definitely is no loop, since the “root” folder is on C, but the only thing i include from C is “my documents” basically which is in a totally different place.

How is the difference between them relevant to Duplicacy? I found a post in this forum suggesting to use a symlink and this thread suggested to use a junction… so which is it?

gadget · 1 September 2022 22:00

That’s fine. Under the hood it works the same way. The Web UI saves the filter rules to a file so that the CLI can use it during the backups.

This is the part that makes me think there’s a link loop. I created a folder layout just as you and @Compeek described and couldn’t replicate the problem you two had. Duplicacy had no issues backing up without any filter rules involved.

The part I’m unsure of is if your concept of “backup scheme that spans multiple drives” is the same as mine.

Do you mean you want to back up multiple unrelated drives using only a single backup configuration to a single shared target destination?

If you don’t mind sharing, what was the exact command used when creating the soft links?

And was your final repository layout something like this?..

C:\Root
C:\Root\C
C:\Root\D

From Duplicacy’s (CLI and Web Edition) point of view, all three types are just links (more on this in a moment). Duplicacy follows – “dereferences” – links only in the root of a repository. All subsequent links found below the root are backed up as-is – i.e., during a restore they are recreated as links.

So, which link type to use? – It all depends on the use case and environment. Under the hood each link type is treated differently by Windows. There are also restrictions depending on the version of Windows, but for the sake of our sanity we’ll assume the minimum is Windows 7:

If you have Administrator privileges on the computer all three can be used (by default, only Administrators can create symbolic links).
Hard links are analogous to a piece of pipe. Just as a pipe cannot directly link to an object that’s separated by a solid wall, a hard link cannot span drives (e.g., link C:\link.txt to D:\file.txt), can only be used for files, and a target file must already exist.
Soft links, aka. “junctions”, are almost the opposite of hard links. They can span drives (like a piece of string winding around a wall), link only to folders, and a target folder doesn’t have to exist.
Symbolic links were a newer addition in NTFS. They work pretty much like they do on Unix/Linux, linking to files and/or folders that span drives, but one big difference is that Windows allows the target to be a UNC – e.g., MKLINK /D C:\Music \\192.168.1.2\Music.

Whether it’s soft links or symbolic links, if one or more drives are external (like the scenario that the OP asked about), I wouldn’t recommend using drive letters for the links if there are going to be multiple external drives.

klayhamn · 1 September 2022 22:24

I can give you my exact scenario without revealing any personal data. Assuming my drives are “X”, “Y” and “Z” i have:

X:\cool
Y:\awesome1
Y:\awesome2
Y:\awesome3
Z:\go\deep\into\subfolder1
Z:\go\deep\into\subfolder2
Z:\go\deep\into\subfolder3

As the raw locations that i wish to backup under a singular backup plan

that’s it
the “root” folder itself is in:

Z:\root

Which is exactly why there’s nothing circular here

By the way, the official documentation of Duplicacy explicitly states you must include all the components of the path as well

In fact, let me quote it:

Patterns ending with “*” and “?”, however, apply to both directories and files. When a directory is excluded, all files and subdirectories under it will also be excluded. Therefore, to include a subdirectory, all parent directories must be explicitly included.

You can find this doc in the link shared above by our other comrade

My repository itself was based in root (at least that’s how it appears from the Web UI) - so that all the include paths were relative to root, and not phrased in absolute terms.

So the include paths were actually (i had to add spaces for this to render correctly):

X\Cool \ *

For example, rather than

X:\Cool \ *

Sure, i went back and forth between symlink and junction trying to make this work - i just went to check and at the moment it seems like it’s a symlink

So the command i used is:

mklink /D Z:\root\X X:
mklink /D Z:\root\Y Y:
mklink /D Z:\root\Z Z:\

None of the drives are external, these are 3 internal HDD’s - one of them the main one on which the OS itself is installed (and also happens to host the “root” of the repository)

Yes.
What does it matter if my data is spread across different folders or different drives?
For the software these are all just locations to access and pour into the backup, right?

Why not imagine that my entire computer is one big “virtual” root and that my different “drives” are actually folders who happen to have one letter names?

Why is it important to give special status to the physical drives when choosing sources for backup?

The reason i want my backups to span multiple drives is that I organize them by importance (and by how many copies of the data in how many destinations i want them to have) — not by their original location, which to me is completely arbitrary

I ran out of space on “Y” so i bought another HDD and named it “Z” -
but they both might contain videos that I want to backup under the same plan

I don’t want to have more backup plans just because i have multiple drives

gadget · 2 September 2022 02:56

Got it. My setup is similar with multiple drives under a single Duplicacy repository and backup configuration. The storage destination is also shared by multiple hosts (around a dozen bare metal and virtual machines).

Agreed, so far so good…

Yes, the quote above is true, but the rule only applies when a filter explicitly excludes a directory.

@Compeek 's filter file posted earlier ends with -*, which effectively blanket excludes everything under the repository root, so the only way to counteract it is to explicitly include files and/or subdirectories that are to be added to the backups.

Here’s an example… let’s assume we’re going to back up the C: drive and it’s the root of a Duplicacy repository:

C:\

Within C:\ there’s just a single folder and two files:

C:\Music\song.mp3
C:\playlist.m3u

Without a backup filter, Duplicacy will grab both C:\Music\song.mp3 and C:\playlist.m3u.

But now let’s assume that we add a backup filter with just a single rule:

-*

(The pattern above translates to “exclude everything”.)

If we run a backup, Duplicacy will walk the filesystem matching what it finds against our filter. Because our filter matches everything, and we’re telling Duplicacy to exclude all matches, there will be nothing to back up.

So in order to capture both files, we need to explicitly countermand the blanket exclude:

+playlist.m3u
-*

In the revised filter above, C:\Music\song.mp3 will still be ignored because it still matches the blanket -* pattern so we also have to explicitly include it (note that path separators in the filter patterns are always forward slashes, i.e. ‘/’, regardless of the host operating system):

+Music/song.mp3
+playlist.m3u
-*

However, song.mp3 is again still ignored because the blanket -* pattern prevents Duplicacy from descending into C:\Music. What we also need is to tell Duplicacy it’s okay to search the C:\Music folder:

+Music/
+Music/song.mp3
+playlist.m3u
-*

Of course, the example above is an extreme case because the backup results with and without the filter file are identical, making the filter file unnecessary.
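The walk-through above can be replayed with a small Python model. It is a sketch only, with assumptions worth stating: `is_included` and `walk` are hypothetical helpers, rules are checked top to bottom with the first match deciding, `*` is allowed to match `/`, directories are matched with a trailing `/`, and a path that matches nothing is excluded only when every rule is an include. Duplicacy's real matcher may differ in details.

```python
from fnmatch import fnmatchcase

def is_included(path, rules):
    """First matching rule wins; '+' includes, '-' excludes."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(path, pattern):
            return sign == "+"
    # Nothing matched: assumed excluded only when all rules are includes.
    only_includes = bool(rules) and all(r.startswith("+") for r in rules)
    return not only_includes

def walk(tree, rules, prefix=""):
    """tree maps a name to None (file) or to a nested dict (directory)."""
    selected = []
    for name, child in sorted(tree.items()):
        if child is None:
            if is_included(prefix + name, rules):
                selected.append(prefix + name)
        elif is_included(prefix + name + "/", rules):
            # Only descend into directories that pass the filter themselves.
            selected.extend(walk(child, rules, prefix + name + "/"))
    return selected

tree = {"Music": {"song.mp3": None}, "playlist.m3u": None}

for rules in (
    [],
    ["-*"],
    ["+playlist.m3u", "-*"],
    ["+Music/song.mp3", "+playlist.m3u", "-*"],
    ["+Music/", "+Music/song.mp3", "+playlist.m3u", "-*"],
):
    print(rules, "->", walk(tree, rules))
```

In this model the fourth rule set still yields only playlist.m3u, because the directory entry Music/ falls through to -* and is never entered; adding +Music/ is what lets the walk reach song.mp3.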

Yep, same starting point for both the CLI and web UI.

Yes, it’s because the filter is referencing the symlinks rather than the abstract drive letters Windows assigns and all the paths are relative to the repository root (Z:\root).

(Windows tip: The command DIR /A:L Z:\ lists the junctions and symbolic links contained in the specified folder, including each link type found.)

hmmm… so… Z:\root contains a symlink named ‘Z’ (Z:\root\Z) that points to Z:\, and within Z:\ is a subdirectory named ‘root’ (Z:\root) which contains a symlink named ‘Z’ (Z:\root\Z) that points to Z:\, and within Z:\ is a subdirectory named ‘root’ (Z:\root) which contains a symlink named ‘Z’ (Z:\root\Z) that points to Z:\, and within Z:\ is a subdirectory named ‘root’ (Z:\root) which contains a symlink named ‘Z’… [channeling William Shakespeare]… methinks thou doth have a link loop.

Try the following little experiment…

Find a spare USB flash drive that can be reformatted.
Reformat it with a NTFS filesystem (I’ll assume the drive letter is F:, but adjust accordingly).
Add a folder named “root” (F:\root\).
Add a link that points to F:\ using the following command: mklink /J F:\root\F F:\
You now have a link F:\root\F just like in your example above (Z:\root\Z).
Finally, cd F:\root\ and issue the tree command.

How many levels deep does the tree output go?
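For anyone who would rather check a planned layout on paper before creating links, the loop condition can be written as a pure path comparison. This is a rough sketch: `link_creates_loop` is a hypothetical helper that only compares path strings, so it does not inspect a real filesystem or resolve other links along the way.

```python
from pathlib import PureWindowsPath

def link_creates_loop(link, target):
    """True if the target is the folder holding the link or one of that folder's ancestors."""
    holder = PureWindowsPath(link).parent
    target_path = PureWindowsPath(target)
    return target_path == holder or target_path in holder.parents

# The link Z:\root\Z points back at the drive that contains Z:\root.
print(link_creates_loop("Z:\\root\\Z", "Z:\\"))  # True

# The link Z:\root\X points at a different drive.
print(link_creates_loop("Z:\\root\\X", "X:\\"))  # False
```

A True result only says the directory tree contains a cycle if something walks it without limits; whether a given backup run ever reaches that cycle depends on the filters in use.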

That’s good. It’s less risky than shuffling external drives but still subject to quirks in Windows. As you already know, drive letters in Windows are generally assigned in alphabetical order starting with C (for historical reasons). A drive letter can be assigned to a volume, but it’s not permanent. If a drive is temporarily disconnected and a new one is added the new drive will claim the freed up drive letter.

So the longer-term solution is to refer to the GUID of a volume (similar to what Linux systems do). There’s more than one way to find out what the GUID is for a formatted volume, but the quickest is the command fsutil volume list, which lists the GUIDs for all attached volumes (including any USB drives):

C:\>fsutil volume list
Possible volumes and current mount points are:

\\?\Volume{6651cade-a3f8-11ec-876e-d89ef3105a25}\
C:\

In the example above, 6651cade-a3f8-11ec-876e-d89ef3105a25 is the GUID for the formatted volume, and the UNC \\?\Volume{6651cade-a3f8-11ec-876e-d89ef3105a25}\ refers to the formatted volume that’s currently assigned the drive letter C:.

To create a junction/soft link…

MKLINK  /J  Z:\root\C  \\?\Volume{6651cade-a3f8-11ec-876e-d89ef3105a25}\

… or a symbolic link:

MKLINK  /D  Z:\root\C  \\?\Volume{6651cade-a3f8-11ec-876e-d89ef3105a25}\

The major advantage is that the GUID is drive letter agnostic. It remains the same for the life of the volume until it’s reformatted.
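If there are many volumes, the link commands could be generated from the fsutil output instead of typed by hand. The Python sketch below is based only on the sample output shown above; `parse_volume_list` is a hypothetical helper, and the real output layout may differ between Windows versions, so treat the parsing rules as assumptions.

```python
def parse_volume_list(output):
    """Map each mount point (for example C:\\) to its volume GUID path."""
    mounts = {}
    volume = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if not line:
            volume = None            # a blank line ends the current volume block
        elif line.startswith("\\\\?\\Volume{"):
            volume = line            # a volume GUID path starts a new block
        elif volume:
            mounts[line] = volume    # lines under a volume are its mount points
    return mounts

sample = r"""Possible volumes and current mount points are:

\\?\Volume{6651cade-a3f8-11ec-876e-d89ef3105a25}\
    C:\
"""

mounts = parse_volume_list(sample)
guid_path = mounts["C:\\"]
# Print the junction command for this volume; nothing is executed here.
print("MKLINK /J Z:\\root\\C " + guid_path)
```

The script only prints the command, so it can be reviewed before anything is created on disk.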

Let’s assume we’ve got a junction/symlink currently pointing to a Kingston USB flash drive and Windows assigns it the drive letter K:. If we eject it → plug in a mix of other different USB drives (and one of them claims the drive letter K: that the Kingston flash drive was using seconds ago) → then reinsert the Kingston flash drive and it’s now assigned the drive letter U:, the junction/symlink will still point to the Kingston USB flash drive even if we daisy chain five new USB hubs before plugging the flash drive into a random port on the last hub.

It also means things aren’t limited by the 26 letters of the English alphabet (nowadays with HDD, SSD, NVMe, USB, drive partitions, etc. it’s quite possible to exceed 26 volumes). Your junctions/symlinks can even be more descriptive and could include external drives that aren’t always plugged in:

MKLINK  /J  Z:\root\Kingston_USB_flash_drive  \\?\Volume{6651cade-a3f8-11ec-876e-d89ef3105a25}\

Got it. My setup isn’t all that different from yours, including what you described in your updated post.

While it’d be a pain to re-rip and re-curate hundreds of music CDs I’ve got safely packed away in totes, they’re part of my “3-2-1 backup” (lossless compression in FLAC format followed by MP3/OGG files depending on the playback device).

It doesn’t, and Duplicacy doesn’t care either. It’s just a matter of pointing Duplicacy in the right direction.

Yep. Mix and match to heart’s content.

That “virtual” root can also contain links/maps to remote storage. As long as Duplicacy sees a filesystem it can access, the sky’s the limit.

One of my servers, for legacy reasons, is running RHEL 4 (circa February 2005). The hardware is almost 13 years old and the 32-bit OS is too ancient to run even Duplicacy (as portable as Go apps typically are, they still rely on the OS).

So in a VM running CentOS 8, I’ve got a directory (/mnt/RHEL4) mounted via SFTP (sshfs module for FUSE) to / on the RHEL 4 server which Duplicacy happily crawls with nary a complaint.

Duplicacy doesn’t give special status to physical drives – any accessible path to a filesystem is perfectly fine with Duplicacy.

One of my Linux machines has 5 SATA drives + 1 eSATA drive dock + 2 USB HDDs + an evolving collection of high-capacity USB flash drives. I’ve got a set of small weather-resistant plastic ammo cases used for storing bare HDDs and SSDs that are occasionally plugged into the eSATA dock. Everything is mapped to /media and then selected directories are bind mounted read-only to /srv/Duplicacy.

If needed, in a pinch I could easily add a directory from a remote server that accepts nothing but FTP connections and is only accessible over a VPN passing thru a SpaceX Starlink satellite before it relays through a smartphone used as a Wi-Fi hotspot by a laptop in a tent in the middle of Yosemite National Park.

It makes zero difference to Duplicacy and is all under a single backup plan. I also use Duplicacy’s tagging feature to label snapshot revisions so I can easily identify backups that were done by automated backup, triggered manually, special revisions that are to be preserved, etc.

So the issue isn’t Duplicacy, it’s with Microsoft Windows relying on drive letter assignments.

klayhamn · 2 September 2022 10:39

according to the docs, if the pattern has only include statements, it would effectively exclude everything else, which is my working hypothesis as to why it behaves as if i had the same rule as our comrade

right, but why do we have to engineer all these symlinks etc. ? why not just have the software allow us to pick the drives we want?
When i select folders to sync in “Google Drive” i don’t go around creating “fake folders” or symlinks — even if those folders happen to span multiple drives

…but it does.

I cannot select folders from multiple drives without this symlink trick.
yet i can select any folders i want from the same drive (without symlinks).

Hence the different treatment.

Sure but any other kind of software that functions inside windows and allows to pick folders across multiple drives, doesn’t have this problem.
That would be like going to Britain, driving on the wrong side of the road, and then saying that the British history of transportation is the problem.

Well, maybe — but if you want to drive in Britain, you better get adapted to how it works.

Similarly, if you choose to work in Windows, you need to take into account that it might work differently than linux.

As i said, no one could suspect that Google is Microsoft’s “ally” in any way, yet its “Google Drive” program effortlessly copes with folders across multiple drives, whether they have “letters” or not - in Windows.

That is of course not a loop in practice, because I don’t actually try to backup Z:\ or Z:\root themselves.

It’s true that Z:\root itself contains a loop, since it points to the same drive the “root” is contained in,
but as long as i’m not trying to back up Z:\ itself or Z:\root itself — what is the problem?
there’s no loop in my declared backup source –
I tell it to go directly to:

Z:\root\Z\potato

and back it up to some destination.
at no point should it ever try to go and “discover” or “run into” Z:\root itself on its way to “Z:\root\Z\potato”

i.e. - the path:

Z:\root\Z\root

For example, should never come up - since i don’t try to back up root itself
I am telling it where to go inside “Z:\root\Z” - and that is: “Z:\root\Z\potato” - a path that contains no loops.

Anyway, if the loop was the problem, why would adding the intermediate paths solve the problem?

gchen · 2 September 2022 12:06

Add -d as a global option to the backup job. Then you’ll see in the log why the directories pointed to by the junctions are not included.

klayhamn · 2 September 2022 12:57

@gchen as expected, this is what my test yielded:

2022-09-02 15:56:04.644 DEBUG LIST_ENTRIES Listing 
2022-09-02 15:56:04.644 DEBUG PATTERN_EXCLUDE Z is excluded
2022-09-02 15:56:04.644 DEBUG PATTERN_EXCLUDE Y is excluded
2022-09-02 15:56:04.644 DEBUG PATTERN_EXCLUDE X is excluded
2022-09-02 15:56:04.644 ERROR SNAPSHOT_EMPTY No files under the repository to be backed up
No files under the repository to be backed up

My test pattern contained a single include entry for a folder in “Z” that is 3 “layers” deep, e.g.:

Z:\my\cool\sub\folder

After some more testing I discovered what I already know by now - that for this folder to be included I have to add:

+Z\my\cool\sub\
+Z\my\cool\
+Z\my\
+Z\
+Z

to the list of includes

Talk about redundancy…
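Until something simpler exists, the repetitive part can at least be generated. The Python sketch below builds the include rules for one folder and every level above it, following the pattern used in this thread. `include_rules` is a hypothetical helper, it writes forward slashes (the separator the filter documentation describes), and the bare top-level entry mirrors the filters posted earlier rather than anything confirmed about Duplicacy's internals.

```python
def include_rules(path):
    """Build include rules for a folder's contents and each parent level above it."""
    parts = [p for p in path.replace("\\", "/").split("/") if p]
    if not parts:
        raise ValueError("path must contain at least one component")
    rules = ["+" + "/".join(parts) + "/*"]           # the folder's contents
    for depth in range(len(parts) - 1, 0, -1):
        rules.append("+" + "/".join(parts[:depth]) + "/")  # each parent directory
    rules.append("+" + parts[0])                      # bare top-level link name
    return rules

for rule in include_rules("Z/my/cool/sub/folder"):
    print(rule)
```

For the example path this prints +Z/my/cool/sub/folder/* followed by +Z/my/cool/sub/, +Z/my/cool/, +Z/my/, +Z/ and +Z, which can be pasted into the filter list ahead of any excludes.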

saspus · 2 September 2022 17:37

Relevant discussion

klayhamn · 2 September 2022 17:49

@saspus Doesn’t seem like any headway was made there toward a solution

I think it’s perfectly fine to support complex regex-based, rule-based or voodoo-based filtering,
but the most basic case is that of simple inclusion and exclusion of specified folders, across drives – that would (i would venture a guess here) probably cover 95% of actual use-cases in the wild.

in fact, the most basic case doesn’t even involve exclusions (Google Drive for example doesn’t have them, nor does Acronis iirc) - that is - imho - a “bonus” feature,

So way-to-go duplicacy for having exclusions unlike most software, but nay for not supporting multi-drive backup plans out of the box, or even - for that matter, deeply nested folder selections without complex (and truly redundant) manual configurations

saspus · 2 September 2022 18:42

Technically multi-drive backup is supported via first level symlinks; not ideal — but it’s something.

But I agree, simple file inclusion and exclusion is a very basic requirement that has benefit of being

Simple to understand, and therefore used successfully
Sufficient for most users.

And therefore should be a no-brainer to support.

Then there is the whole separate discussion on merits of having dedicated standalone exclusion list in the first place: it will always require maintenance as more data is written to disk, it’s not portable, and just a hassle.

The alternative is to keep metadata (a “don’t back up” flag) close to the data; for example, an application that creates temporary files is in the best position to mark those files for exclusion. This approach is used on macOS, where a special extended attribute is used to exclude files from backup. Duplicacy supports it. Now my exclusion list is empty. And it “just works”.

Windows does not have that standard way of excluding files. But IIRC Duplicacy supports a special .nobackup file that can be placed in folders that need to be excluded. This may be an alternative to managing the unwieldy list.
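The marker-file idea can be sketched in a few lines of Python. This only illustrates the concept and is not how Duplicacy implements it: `files_to_back_up` is a hypothetical helper, the marker name is taken from the post above, and the exact name and behaviour in Duplicacy should be checked against its documentation.

```python
import os
import tempfile

MARKER = ".nobackup"  # assumed marker name, as mentioned in the post above

def files_to_back_up(root):
    """List files under root, skipping any folder that contains the marker file."""
    selected = []
    for folder, subdirs, files in os.walk(root):
        if MARKER in files:
            subdirs[:] = []  # prune: do not descend below a marked folder
            continue
        subdirs.sort()
        for name in sorted(files):
            selected.append(os.path.relpath(os.path.join(folder, name), root))
    return selected

# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "keep"))
    os.makedirs(os.path.join(root, "cache", "deep"))
    for relative in ("keep/a.txt", "cache/.nobackup", "cache/tmp.bin", "cache/deep/x.bin"):
        with open(os.path.join(root, *relative.split("/")), "w") as handle:
            handle.write("data")
    demo_result = files_to_back_up(root)

print(demo_result)
```

The exclusion travels with the folder: moving or copying the cache folder elsewhere keeps it excluded without touching any central filter list.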