Mount duplicacy snapshot as a virtual filesystem

Continuing the discussion from Use standard UI controls whenever possible:

What is a Duplicacy snapshot? Effectively, it’s a virtual immutable filesystem, with its own internal API and a lot of code around it to access it, all to serve just a handful of useful scenarios:

  1. copy files out of there (aka restore)
  2. copy files into a new leaf with deduplication (aka backup)
  3. delete the leaf (aka prune)

So why not take it one step further and implement an actual user-mode VFS?

This opens up tons of opportunities and elegant workflows:

For example,

  1. During restore, instead of browsing in an awkward custom control, the user can use the native OS file browser (see linked topic).
  2. Or, after selecting a revision, the user just clicks a button and the snapshot is mounted as a read-only filesystem, either as a drive letter or at /Volumes/mybackup_r139. The user can then use all sorts of tools such as diff, “Find Duplicates”, find, grep, etc. (otherwise the user would need to restore the entire repository into a temp folder and then do all of that, which wastes bandwidth and time).
  3. I can imagine that this would subsume a lot of the code in the restore workflow and make the design simpler.

Now, I’m not an expert in user-mode filesystem design, or in user-mode software development for that matter (and while I can see how this could be done fairly easily as a kernel extension, I would not recommend it; it should stay in userland if possible), but a quick Google search yields a few VFS support libraries in Go… so maybe there is hope :slight_smile:

Perhaps something to consider for version 3.xx.xx?

(It makes me happy to even think about the possibility of mounting a backed-up snapshot as a filesystem; it’s super cool.)

This would be sort of like macOS’s implementation of “Enter Time Machine”!

Interesting. This would be similar in functionality to rclone mount.

Users there try to use that feature to mount cloud volumes and serve them with Plex etc. I think that is very heavy use, and problems with latency and caching are often reported.

But for the simple use proposed above (directly accessing the content of the revisions), it is undoubtedly interesting.

I’m thinking of a command in the CLI version :yum:

duplicacy mount P: storage:stor_name snapshot:snap_id revision:134
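
The proposed syntax is not an existing duplicacy command, but it can be parsed unambiguously as long as the mount point comes first, since a Windows drive letter like P: also contains a colon. A minimal Go sketch under that assumption; mountArgs and parseMountArgs are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// mountArgs holds the arguments of the proposed (hypothetical) command
//   duplicacy mount <mountpoint> storage:<name> snapshot:<id> revision:<n>
type mountArgs struct {
	MountPoint string
	Storage    string
	SnapshotID string
	Revision   int
}

// parseMountArgs takes the mount point verbatim as the first argument,
// because a drive letter such as "P:" contains a colon, then parses the
// remaining key:value pairs, splitting each at its first colon.
func parseMountArgs(args []string) (mountArgs, error) {
	var m mountArgs
	if len(args) == 0 {
		return m, fmt.Errorf("missing mount point")
	}
	m.MountPoint = args[0]
	for _, a := range args[1:] {
		key, val, ok := strings.Cut(a, ":")
		if !ok || val == "" {
			return m, fmt.Errorf("expected key:value, got %q", a)
		}
		switch key {
		case "storage":
			m.Storage = val
		case "snapshot":
			m.SnapshotID = val
		case "revision":
			n, err := strconv.Atoi(val)
			if err != nil || n <= 0 {
				return m, fmt.Errorf("invalid revision %q", val)
			}
			m.Revision = n
		default:
			return m, fmt.Errorf("unknown key %q", key)
		}
	}
	return m, nil
}

func main() {
	m, err := parseMountArgs([]string{"P:", "storage:stor_name", "snapshot:snap_id", "revision:134"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", m)
}
```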

I would absolutely love this functionality. Obviously it sounds like a lot of work, but IMO it’s doable.

Duplicacy already has a form of chunk cache, which includes metadata, so a FUSE-style mount system could perhaps be built around that.
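
A mount mostly has to turn “read this part of this file” into “get these chunks”, and keep hot chunks around so repeated reads don’t go back to storage. A minimal in-memory LRU sketch in Go; fetch is a hypothetical placeholder for downloading and decrypting a chunk, and this is not a description of how Duplicacy’s existing cache works.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// cacheEntry is one cached chunk, stored in the LRU list.
type cacheEntry struct {
	id   string
	data []byte
}

// chunkCache is a minimal in-memory LRU cache of chunks. Capacity counts
// chunks, not bytes. fetch is a hypothetical stand-in for storage access.
type chunkCache struct {
	mu       sync.Mutex
	capacity int
	fetch    func(id string) ([]byte, error)
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // chunk id -> list element
}

func newChunkCache(capacity int, fetch func(id string) ([]byte, error)) *chunkCache {
	if capacity < 1 {
		capacity = 1
	}
	return &chunkCache{
		capacity: capacity,
		fetch:    fetch,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns a chunk, fetching it only on a miss and evicting the least
// recently used chunk when full. For simplicity the fetch runs while the
// lock is held, which serializes concurrent misses.
func (c *chunkCache) Get(id string) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[id]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*cacheEntry).data, nil
	}
	data, err := c.fetch(id)
	if err != nil {
		return nil, err
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*cacheEntry).id)
	}
	c.items[id] = c.order.PushFront(&cacheEntry{id: id, data: data})
	return data, nil
}

func main() {
	fetches := 0
	cache := newChunkCache(2, func(id string) ([]byte, error) {
		fetches++
		return []byte("contents of " + id), nil
	})
	for _, id := range []string{"a", "b", "a", "c", "b"} {
		if _, err := cache.Get(id); err != nil {
			panic(err)
		}
	}
	fmt.Println("fetches:", fetches)
}
```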

This is one of the most requested features. It is on my to-do list but I don’t have a timeline for it.

I just want to +1 this feature request

This would be killer. Another vote from me.

Even if the mount were read-only, it would be great for me. I’d then have access to my 4 TB music collection wherever I am, while still having a good, encrypted backup…

I think it has to be read-only to protect the integrity of the snapshots. I suspect something like libfuse (libfuse/libfuse on GitHub, the reference implementation of the Linux FUSE (Filesystem in Userspace) interface) would be the way to go, working on Linux, macOS (BSD) and Windows (using WSL).

I just wanted to ask whether there is an update on this. Anything on the horizon?

There is a FUSE implementation by @andrew.heberle: Prototype FUSE implementation

2nding this. This would greatly enhance the utility of Duplicacy. Lots of other packages do this, or at least have “Explorer” software that lets you select by version/date and also search by file name or contents.

Also +1 for this feature. I hope this can be integrated into the main duplicacy binary, and it would be even better if there were a button in the GUI to mount the snapshot!

Any update on this feature? It would be a huge addition to the already amazing capabilities of this tool. Also, I was going to try the prototype implementation, but it no longer builds and seems abandoned.

Just an FYI on this one: my implementation is unfortunately not very robust, and the code (looking back on it now) is very messy.

I haven’t had an opportunity to clean it up into any sort of shape that I would deem “ready”, but I really do hope to find time to concentrate on getting this working well, so hopefully I’ll have some good news within the next 6 months or so.

I really hope you get some time/motivation to continue working on your project! Being able to mount a Duplicacy snapshot as a user-mode filesystem would be fantastic. As a developer myself, I would love to help develop the project further, but unfortunately I have literally zero experience with Go. I have experience with a wide variety of languages, including C#, C++, Python, Ruby and Java, so I’m sure learning Go wouldn’t be too overwhelming, but I would likely just be a nuisance/hindrance until I gained more experience with it. It looks like the project is dead at this point, not having been worked on in over 2 years. Have you pretty much decided to abandon it, or is there any chance you might revisit it later? Just curious…

There is a new PR that just popped up on GitHub; it looks very promising, though I haven’t tried it yet:

Someone with a large number of revisions (thousands+?) should try it out to see how it scales with the number of revisions.

I published a couple of binaries for testing here: Release v3.0.1-mount · davidrios/duplicacy · GitHub
USE IT AT YOUR OWN RISK.

This seems like a very promising mount implementation, and I especially like the rationale behind the folder structure, if indeed it can handle a large number of revisions…

My first thought is that it should perhaps allow revisions to be specified, so that if it doesn’t scale well you could at least skip the overhead. After all, you might only be interested in specific revisions: a list or a range, say. Anyway, I tested your binary…

The repository is 52 GB (366K files) in the latest snapshot revision. Total repository size is 585 GB across 316 revisions (a fair bit larger than normal due to inefficient deduplication of Thunderbird’s monolithic mailbox files, which I’m totally fine with).

Backups run every 2 hours (so 12 per day) and are pruned on this schedule: -keep 30:365 -keep 7:90 -keep 1:14, going back to June 2018.

Upon opening the first folder level near my latest revision, it takes a good few minutes just to list the 10 snapshots there. Memory balloons to 4 GB before I even open a snapshot’s root (which takes a further few minutes to finish).

Opening other snapshots gets progressively quicker, but not by much, and by the time I’ve located a couple of snapshots to compare (with a tool such as Beyond Compare), we’re up to 7.5 GB of memory usage and quite a lot of waiting for Windows Explorer to become responsive.

This is certainly a lot better than restoring revisions, but IMO the memory usage is an issue, especially if it is to scale up. It’s just about usable.

In fact, I’m not sure the number of revisions is the cause; more likely it’s the number of files in each snapshot. The only thing I do know is that the debug messages in the console suggest it’s trying to load all snapshots at the bottom of the tree (i.e. all 10 of my revisions from today). So honestly, I don’t know how well this would scale with a lot more files and a lot more revisions backed up more frequently.
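
If every snapshot under a folder is loaded up front, memory grows with files × revisions even when only one revision is opened. One common alternative is to load each revision’s file tree lazily, only when its folder is actually opened. A minimal Go sketch of that pattern with sync.Once; revision and its load function are hypothetical and not taken from the PR’s code.

```go
package main

import (
	"fmt"
	"sync"
)

// revision is a hypothetical directory entry for one snapshot revision in a
// mount. Listing revisions only needs the number; the potentially huge file
// tree is loaded on first access and then reused.
type revision struct {
	number int
	load   func(rev int) ([]string, error) // hypothetical loader for the revision's file list

	once  sync.Once
	files []string
	err   error
}

// Files loads the file list exactly once, even with concurrent callers.
// A failed load is also remembered and not retried in this sketch.
func (r *revision) Files() ([]string, error) {
	r.once.Do(func() { r.files, r.err = r.load(r.number) })
	return r.files, r.err
}

func main() {
	loads := 0
	loader := func(rev int) ([]string, error) {
		loads++
		return []string{fmt.Sprintf("file-in-r%d.txt", rev)}, nil
	}
	revs := []*revision{{number: 138, load: loader}, {number: 139, load: loader}}

	// Listing the revisions touches no snapshot contents.
	for _, r := range revs {
		fmt.Println("revision", r.number)
	}
	// Opening revision 139 twice loads its file list once; 138 is never loaded.
	for i := 0; i < 2; i++ {
		if _, err := revs[1].Files(); err != nil {
			panic(err)
		}
	}
	fmt.Println("loads:", loads)
}
```

Memory would still grow as more revisions are opened, so a real mount would probably pair this with some eviction, much like a chunk cache.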

Good work, though. Will keep an eye on progress. :slight_smile:

Is this storage local or remote? And how long does the list command take by itself? (Mount won’t be faster than that :wink:)