Mount duplicacy snapshot as a virtual filesystem

This would be killer. Another vote from me.

1 Like

Even if the mount were read-only it would be great for me. I’d then have access to my 4T music collection wherever I am while still having a good, encrypted backup…

2 Likes

I think it has to be read-only to protect the integrity of the snapshots. I suspect something like libfuse (GitHub - libfuse/libfuse: The reference implementation of the Linux FUSE (Filesystem in Userspace) interface) would be the way to go; it works on Linux and macOS, and on Windows via WSL.

2 Likes

I just wanted to ask if there is an update to this? Anything on the horizon?

1 Like

There is a FUSE implementation by @andrew.heberle: Prototype FUSE implementation

2 Likes

Seconding this. It would greatly enhance the utility of Duplicacy. Many other packages do this, or at least provide an “Explorer” tool that lets you browse by version/date and search by file name or contents.

1 Like

Also +1 for this feature. I hope this can be integrated into the main duplicacy binary, and it would be even better if there were a button in the GUI to mount a snapshot!

1 Like

Any update on this feature? It would be a huge addition to the already amazing capabilities of this tool. I was also going to try the prototype implementation, but it no longer builds and seems abandoned.

1 Like

Just an FYI on this one…my implementation is really not very robust unfortunately, and the code (looking back on it now) is very messy.

I haven’t had an opportunity to clean it up into any sort of shape I would deem “ready”, but I really do hope to find time to get this working well, so hopefully I’ll have some good news over the next six months or so.

6 Likes

I really hope you find the time and motivation to continue working on your project! Being able to mount a Duplicacy snapshot as a user-mode file system would be fantastic.

As a developer myself, I would love to help develop it further, but unfortunately I have literally zero experience with Go. I've worked with a wide variety of languages (C#, C++, Python, Ruby, Java, etc.), so learning Go wouldn't be too overwhelming, but I would likely just be a hindrance until I gained more experience with it.

It looks like the project is dead at this point, not having been worked on in over two years. Have you pretty much decided to abandon it, or is there any chance you might revisit it at a later date? Just curious…

1 Like

There is a new PR that just popped up on GitHub; it looks very promising, though I haven’t tried it yet:

Someone with a large number of revisions (thousands+?) should try it out to see how it scales.

Published a couple of binaries here for testing: Release v3.0.1-mount · davidrios/duplicacy · GitHub
USE IT AT YOUR OWN RISK.

1 Like

This seems like a very promising mount implementation, and I especially like the rationale behind the folder structure - if indeed it can handle a large number of revisions…

My first thought is that it should perhaps allow specific revisions to be specified - a list or a range, say - in case it doesn't scale so well; you could then at least skip that overhead. You might only be interested in particular revisions, after all. Anyway, I tested your binary…

Repository of 52GB (366K files) in the latest snapshot revision; 585GB total across 316 revisions (a fair bit larger than normal due to inefficient de-duplication of Thunderbird's monolithic mailbox files, which I'm totally fine with).

Backups run every 2 hours (so 12 per day), pruned on this schedule: -keep 30:365 -keep 7:90 -keep 1:14 - going back to June 2018.

Upon opening the first folder level near my latest revision, it takes a good few minutes just to list the 10 snapshots there. Memory balloons to 4GB before I even open a snapshot's root (which takes a further few minutes).

Opening other snapshots gets progressively quicker, but not by much; by the time I've located a couple of snapshots to compare (with a tool such as Beyond Compare), we're up to 7.5GB of memory usage and quite a lot of waiting for Windows Explorer to become responsive.

This is certainly a lot better than restoring revisions, but IMO the memory usage is an issue, especially if it's to scale up. It's just about usable.

In fact, I'm not sure the number of revisions is the cause - more likely the number of files in each snapshot. The only thing I do know is that the debug messages in the console suggest it's trying to load all the snapshots at the bottom of the tree (i.e. all 10 of my revisions from today). So honestly, I don't know how well this would scale with many more files and many more revisions backed up more regularly.

Good work, though. Will keep an eye on progress. :slight_smile:

Is this storage local or remote? And how long does the list command take by itself? (mount won't be faster than that :wink:)

Windows Server network share (UNC), and list is pretty quick. It just hangs Windows Explorer while listing the snapshots, and then the console refers to revision numbers I didn't even click on.

Thanks for the valuable insights.

An option to specify a single revision makes sense to optimize access on large backups, will add that.

Some issues are due to the way Windows Explorer works. If you open a day folder, for instance, Explorer tries to list the contents of all its children, so if you have 12 revisions in a day, the program initializes all 12 revisions. At 366K files each, that's 4M+ file descriptions loaded into memory already. It may be useful to add another folder level to prevent that, as an optimization.

Some of the high memory usage comes from the internal APIs. When I was testing with my modest repository, memory spikes occurred while calling internal APIs to download chunks and do other things. I'm probably doing some things wrong, as I'm not familiar with the code base or the internal APIs.

I’ll try a couple of things and provide another test binary as soon as I can.

Unfortunately I don't have any real personal backups to test with yet; I first heard of this tool five days ago, while looking for a backup solution for my sister :joy:. While evaluating it, I found the clunky restore interface and the lack of live mounting to be deal-breakers, but luckily it's (kind of) open source…

It's also basically my first ever Go code, so I'm definitely doing some things wrong performance-wise :grimacing:

OK, so I tried the latest build on Linux, and here are my observations/suggestions, in no particular order:

  • It mostly worked! :wink:
  • The tested storage was remote (OneDrive), running in parallel with something else, so I did hit some retry-able 429 (rate limit) errors, as expected. As such, while the initial listing did take some time, it was about the same as running the list command
  • Caching of directories seems to be working fine; subsequent access was basically instantaneous
  • Access to individual files in snapshots worked, which is great! However, I suspect there is no support for threaded loads, which is not ideal for remote storages (see below)
  • I played around with a 12TB repository, and as I suspected there are no issues with large storages per se; a large number of revisions and/or a huge number of files in a single directory might be a different story (not tested)
  • I did not notice excessive memory consumption; perhaps 1-1.5GB at most, certainly not more than :d: running against the same storage/snapshots. Having said that, I did not run extensive tests, just poked around for a bit
  • I'm not sure whether the mount command is supposed to fork into the background and exit (it didn't) or keep running while the mount is active. But when I killed it with Ctrl-C, it did not unmount; I had to run umount separately

So based on the (very) limited testing, I’d say that core functionality works well. However, there are several things that can be improved, primarily on the customization/parametrization part.

  • Support multiple snapshots in the mounted tree. Instead of the top-level folders being the years of a particular snapshot, make the top level the names of all snapshots in the storage, each with the existing folder structure underneath.
  • Support multiple storages (-storage flag). I believe mount currently always picks the first storage from the preferences file.
  • Support multithreaded chunk download (-threads flag). This is important for remote storages, as single-threaded downloads could significantly underutilize the available bandwidth.
  • Some optimizations to reduce the scope of the mount (a nice-to-have): instead of using all snapshots and all revisions, the ability to mount only one snapshot with all revisions, or even one snapshot at one revision. This could matter for large/slow storages and for applications that pre-load more than is needed (e.g. Windows Explorer).

Again, it worked well for me so far. I hope more people can test it so we can see if there are any edge cases that are not covered.

Sorry, I am not a developer. I am on Windows 10, and my backed-up files are in C:\SourceDuplicacy1.
I installed WinFsp and replaced duplicacy.exe with davidrios's version.
Command:

CD /D C:\SourceDuplicacy1
duplicacy mount Z:

This shows “mounting travail on Z:” and “Found xxx revisions”, but Z: doesn't appear in Explorer.

PS: I wrote a French tutorial on using duplicacy and sending mail notifications: Sauvegarder avec duplicacy et le stockage en ligne Storj – PCsoleil Informatique