Mount duplicacy snapshot as a virtual filesystem

This would be killer. Another vote from me.

1 Like

Even if the mount were read-only it would be great for me. I’d then have access to my 4T music collection wherever I am while still having a good, encrypted backup…

2 Likes

I think it has to be read-only to protect the integrity of the snapshots. I suspect something like libfuse (GitHub - libfuse/libfuse: The reference implementation of the Linux FUSE (Filesystem in Userspace) interface) would be the way to go; it works on Linux and macOS, and on Windows via WSL.

2 Likes

I just wanted to ask if there is an update to this? Anything on the horizon?

1 Like

There is a FUSE implementation by @andrew.heberle: Prototype FUSE implementation

2 Likes

Seconding this. It would greatly enhance the utility of Duplicacy. Many other packages do this, or at least provide an “Explorer” tool that lets you browse by version/date and search by file name or contents.

1 Like

Also +1 for this feature. I hope this can be integrated into the main duplicacy binary, and it would be even better if there were a button in the GUI to mount a snapshot!

1 Like

Any update on this feature? It would be a huge addition to the already amazing capabilities of this tool. I was also going to try the prototype implementation, but it no longer builds and seems abandoned.

1 Like

Just an FYI on this one…my implementation is really not very robust unfortunately, and the code (looking back on it now) is very messy.

I haven’t had an opportunity to clean it up into any sort of shape I would deem “ready”, but I really do hope to find time to get this working well, so hopefully I’ll have some good news over the next six months or so.

6 Likes

I really hope you find the time and motivation to continue working on your project! Being able to mount a Duplicacy snapshot as a user-mode file system would be fantastic.

As a developer myself, I would love to help develop it further, but unfortunately I have literally zero experience with Go. I've worked with a wide variety of languages (C#, C++, Python, Ruby, Java, etc.), so learning Go wouldn't be too overwhelming, but I would likely just be a hindrance until I gained more experience with it.

It looks like the project is dead at this point, not having been worked on in over two years. Have you pretty much decided to abandon it, or is there any chance you might revisit it at a later date? Just curious…

1 Like

There is a new PR that just popped up on GitHub; it looks very promising, though I haven’t tried it yet:

Someone with a large number of revisions (thousands+?) should try it out to see how it scales.

Published a couple of binaries here for testing: Release v3.0.1-mount · davidrios/duplicacy · GitHub
USE IT AT YOUR OWN RISK.

1 Like

This seems like a very promising mount implementation, and I especially like the rationale behind the folder structure - if indeed it can handle a large number of revisions…

My first thought is that it should perhaps allow specific revisions to be specified - a list or a range, say - in case it doesn't scale so well; you could then at least skip that overhead. You might only be interested in particular revisions, after all. Anyway, I tested your binary…

Repository of 52GB (366K files) in the latest snapshot revision; 585GB total across 316 revisions (a fair bit larger than normal due to inefficient de-duplication of Thunderbird's monolithic mailbox files, which I'm totally fine with).

Backups run every 2 hours (so 12 per day), pruned on this schedule: -keep 30:365 -keep 7:90 -keep 1:14 - going back to June 2018.

Upon opening the first folder level near my latest revision, it takes a good few minutes just to list the 10 snapshots there. Memory balloons to 4GB before I even open a snapshot's root (which takes a further few minutes).

Opening other snapshots gets progressively quicker, but not by much; by the time I've located a couple of snapshots to compare (with a tool such as Beyond Compare), we're up to 7.5GB of memory usage and quite a lot of waiting for Windows Explorer to become responsive.

This is certainly a lot better than restoring revisions, but IMO the memory usage is an issue, especially if it's to scale up. It's just about usable.

In fact, I'm not sure the number of revisions is the cause - more likely the number of files in each snapshot. The only thing I do know is that the debug messages in the console suggest it's trying to load all the snapshots at the bottom of the tree (i.e. all 10 of my revisions from today). So honestly, I don't know how well this would scale with many more files and many more revisions backed up more regularly.

Good work, though. Will keep an eye on progress. :slight_smile:

Is this storage local or remote? And how long does the list command take by itself? (mount won't be faster than that :wink:)

Windows Server network share (UNC), and list is pretty quick. It just hangs Windows Explorer while listing the snapshots, and then the console refers to revision numbers I didn't even click on.

Thanks for the valuable insights.

An option to specify a single revision makes sense to optimize access on large backups, will add that.

Some issues are due to the way Windows Explorer works. If you open a day folder, for instance, Explorer tries to list the contents of all its children, so if you have 12 revisions in a day, the program initializes all 12 revisions. At 366K files each, that's 4M+ file descriptions loaded into memory already. It may be useful to add another folder level to prevent that, as an optimization.

Some of the high memory usage comes from the internal APIs. When I was testing with my modest repository, memory spikes occurred while calling internal APIs to download chunks and do other things. I'm probably doing some things wrong, as I'm not familiar with the code base or the internal APIs.

I’ll try a couple of things and provide another test binary as soon as I can.

Unfortunately I don't have any real personal backups to test with yet; I first heard of this tool five days ago, while looking for a backup solution for my sister :joy:. While evaluating it, I found the clunky restore interface and the lack of live mounting to be deal-breakers, but luckily it's (kind of) open source…

It's also basically my first ever Go code, so I'm definitely doing some things wrong performance-wise :grimacing:

OK, so I tried the latest build on Linux, and here are my observations/suggestions, in no particular order:

  • It mostly worked! :wink:
  • The tested storage was remote (OneDrive), running in parallel with something else, so I did hit some retry-able 429 (rate limit) errors, as expected. As such, while the initial listing did take some time, it was about the same as running the list command
  • Caching of directories seems to be working fine; subsequent access was basically instantaneous
  • Access to individual files in snapshots worked, which is great! However, I suspect there is no support for threaded loads, which is not ideal for remote storages (see below)
  • I played around with a 12TB repository, and as I suspected there are no issues with large storages per se; a large number of revisions and/or a huge number of files in a single directory might be a different story (not tested)
  • I did not notice excessive memory consumption; perhaps 1-1.5GB at most, certainly not more than :d: running against the same storage/snapshots. Having said that, I did not run extensive tests, just poked around for a bit
  • I'm not sure whether the mount command is supposed to fork into the background and exit (it didn't) or keep running while the mount is active. But when I killed it with Ctrl-C, it did not unmount; I had to run umount separately

So based on the (very) limited testing, I’d say that core functionality works well. However, there are several things that can be improved, primarily on the customization/parametrization part.

  • Support multiple snapshots in the mounted tree. Instead of the top-level folders being the years of a particular snapshot, make the top level the names of all snapshots in the storage, each with the existing folder structure underneath.
  • Support multiple storages (-storage flag). I believe mount currently always picks the first storage from the preferences file.
  • Support multithreaded chunk download (-threads flag). This is important for remote storages, as single-threaded downloads could significantly underutilize the available bandwidth.
  • Some optimizations to reduce the scope of the mount (a nice-to-have): instead of using all snapshots and all revisions, the ability to mount only one snapshot with all revisions, or even one snapshot at one revision. This could matter for large/slow storages and for applications that pre-load more than is needed (e.g. Windows Explorer).

Again, it worked well for me so far. I hope more people can test it so we can see if there are any edge cases that are not covered.

Sorry, I am not a developer. I am on Windows 10, and my backed-up files are in C:\SourceDuplicacy1.
I installed WinFsp and replaced duplicacy.exe with davidrios's version.
Command:

CD /D C:\SourceDuplicacy1
duplicacy mount Z:

This shows “mounting travail on Z:” and “Found xxx revisions”, but Z: doesn't appear in Explorer.

PS: I wrote a French tutorial on using duplicacy and sending mail notifications: Sauvegarder avec duplicacy et le stockage en ligne Storj – PCsoleil Informatique