Mount duplicacy snapshot as a virtual filesystem

OK, this one did not quite work. Even without the -disk-cache option, it still looks for sqinn and crashes with a nil pointer dereference when it cannot find it:

Failed to init: exec: "sqinn": executable file not found in $PATH
runtime error: invalid memory address or nil pointer dereference
goroutine 1 [running]:
runtime/debug.Stack(0x0, 0x0, 0x0)
/home/david/go/src/runtime/debug/stack.go:24 +0xa5
runtime/debug.PrintStack()
/home/david/go/src/runtime/debug/stack.go:16 +0x25
github.com/gilbertchen/duplicacy/src.CatchLogException()
/home/david/work/duplicacy/src/duplicacy_log.go:233 +0x225
panic(0x1619ae0, 0x1f693f0)
/home/david/go/src/runtime/panic.go:971 +0x4c7
github.com/cvilsmeier/sqinn-go/sqinn.(*Sqinn).Terminate(0x0, 0x0, 0x0)
/home/david/go/pkg/mod/github.com/cvilsmeier/sqinn-go@v1.1.2/sqinn/sqinn.go:694 +0x42
github.com/gilbertchen/duplicacy/src.(*BackupFS).cleanup(0xc0003b0d90)
/home/david/work/duplicacy/src/duplicacy_mount.go:888 +0x36
github.com/gilbertchen/duplicacy/src.MountFileSystem(0x7fff7c3d1869, 0x7, 0xc0003b0cb0, 0xc00057b480)
/home/david/work/duplicacy/src/duplicacy_mount.go:1014 +0x3a7
main.mountBackupFS(0xc0003c25a0)
/home/david/work/duplicacy/duplicacy/duplicacy_main.go:1468 +0x11d0
github.com/gilbertchen/cli.Command.Run(0x17c81aa, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x17ee7f1, 0x22, 0x0, …)
/home/david/go/pkg/mod/github.com/gilbertchen/cli@v1.2.1-0.20160223210219-1de0a1836ce9/command.go:160 +0xd76
github.com/gilbertchen/cli.(*App).Run(0xc0003c2120, 0xc0000c4000, 0x5, 0x5, 0x0, 0x0)
/home/david/go/pkg/mod/github.com/gilbertchen/cli@v1.2.1-0.20160223210219-1de0a1836ce9/app.go:179 +0x1145
main.main()
/home/david/work/duplicacy/duplicacy/duplicacy_main.go:2335 +0x7da6
Command exited with non-zero status 101

EDIT: Actually, it fails the same way even if I have a working sqinn in PATH. I have tried placing it in PATH, in the current repository folder, and in the same folder as the duplicacy executable; it still can’t find it for some reason.

That’s strange, it should work with the binary in the PATH at least. I’ll fix it later today.

Ok, new fixed binaries uploaded.

OK, so this improved things quite a bit. sqinn was not found because I was running Duplicacy under sudo with a different environment. Now it works without sqinn when -disk-cache is not specified, and with sqinn in the PATH when -disk-cache is specified. However, it still crashes when -disk-cache is specified but sqinn is not found. It really should fail more gracefully, but that’s an easy fix.
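For illustration, here is a minimal sketch of the kind of guard that would avoid this crash, assuming (as the trace suggests) that the cleanup path calls Terminate on a helper handle that was never initialized. The `helper` and `backupFS` types below are hypothetical stand-ins, not the sqinn-go or duplicacy types; only `exec.LookPath` is a real standard-library call.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// helper is a hypothetical stand-in for an external helper process handle.
type helper struct{ path string }

// Terminate reads a field of h, so calling it through a nil pointer panics.
func (h *helper) Terminate() error {
	if h.path == "" {
		return errors.New("helper was never started")
	}
	return nil
}

// backupFS is a hypothetical stand-in for the mounted filesystem state.
type backupFS struct {
	db *helper // nil unless the disk cache was requested and started
}

// initDiskCache looks the helper binary up first and returns an error
// instead of leaving a half-initialized state behind.
func (fs *backupFS) initDiskCache(binary string) error {
	path, err := exec.LookPath(binary)
	if err != nil {
		return fmt.Errorf("disk cache unavailable: %w", err)
	}
	fs.db = &helper{path: path}
	return nil
}

// cleanup is safe to call whether or not the helper was ever started.
func (fs *backupFS) cleanup() error {
	if fs.db == nil {
		return nil
	}
	err := fs.db.Terminate()
	fs.db = nil
	return err
}

func main() {
	fs := &backupFS{}
	if err := fs.initDiskCache("sqinn"); err != nil {
		fmt.Println("Failed to init:", err)
	}
	// No panic here, even when the helper binary was not found.
	if err := fs.cleanup(); err != nil {
		fmt.Println("cleanup:", err)
	}
}
```

The point is only that the lookup error is reported once and the cleanup path tolerates a nil handle; how the real code is structured may differ.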

By the way, how does -disk-cache work? Where/how/what is cached?

Filtering by revision numbers seems to work. I haven’t done any stress testing on that (e.g. non-existent or malformed revision lists), but it works for the core cases. It seems that it doesn’t do extensive reading (if any) of filtered-out revisions, as it feels faster with a single revision specified. Can you confirm what happens with revisions that are not displayed?

-flat is nice. I’d probably use purely numerical revision names (without timestamps in front) for cases where you’re looking for a specific revision number, but there is no end to how these lists could be presented or customized. It is certainly workable, as the core functionality is there.

-storage seems to work as well, cool!

This leaves the inability to (easily) access multiple snapshot names in the same storage, and threaded downloads. Multiple snapshot names probably have a workaround (creating separate repositories that reference different snapshots; I haven’t tried it but it should work, though it is kinda ugly). Single-threaded download doesn’t affect functionality per se, but can make actual restores via mount unbearably slow on some storages.

All in all, great work so far!

Fixed crash when sqinn not on path and -disk-cache used.

-disk-cache creates SQLite databases in the OS temp folder with file information from each revision and reads them from there. Chunks are still cached in memory. Duplicacy already has an internal chunk cache, but it doesn’t work fast enough for these use cases. The databases are deleted when the program ends, but not while it is running, so if you open thousands of revisions with millions of files, you could potentially run out of disk space, but I’m not worried about that TBH.

Handling of the specified revisions is mostly graceful, with useful messages for invalid specifications. It is indeed faster because it only downloads metadata from revisions that will be listed.
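As a rough sketch of what graceful handling of a revision list can look like: the thread does not spell out the accepted syntax, so the comma-separated form with inclusive ranges used here ("1,3,5-7") and the `parseRevisions` helper are assumptions for illustration, not the actual mount code.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseRevisions turns a spec such as "1,3,5-7" into a sorted, de-duplicated
// list of revision numbers. The syntax is an assumption for illustration.
func parseRevisions(spec string) ([]int, error) {
	seen := make(map[int]bool)
	for _, part := range strings.Split(spec, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			return nil, fmt.Errorf("empty entry in revision list %q", spec)
		}
		lo, hi, isRange := strings.Cut(part, "-")
		first, err := strconv.Atoi(lo)
		if err != nil || first < 1 {
			return nil, fmt.Errorf("invalid revision %q", part)
		}
		last := first
		if isRange {
			last, err = strconv.Atoi(hi)
			if err != nil || last < first {
				return nil, fmt.Errorf("invalid revision range %q", part)
			}
		}
		for r := first; r <= last; r++ {
			seen[r] = true
		}
	}
	revisions := make([]int, 0, len(seen))
	for r := range seen {
		revisions = append(revisions, r)
	}
	sort.Ints(revisions)
	return revisions, nil
}

func main() {
	revs, err := parseRevisions("1,3,5-7")
	fmt.Println(revs, err)

	_, err = parseRevisions("7-5")
	fmt.Println(err)
}
```

With a validated list in hand, only the metadata of the listed revisions needs to be downloaded, which matches the speed-up described above.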

I feel like the formatting of -flat is the right one. Revision numbers are still appended, so they are visible at a glance, and revisions are chronological in nature, so they will always be shown in revision number order, provided you sort files by name in your browsing tool. To me it’s also what makes the most sense when you’re just mounting without specifying a revision or listing them first, because you’re usually more interested in when the revision was created.

The chunk downloader’s internal API does have an option to specify how many threads to use, and I increased it to 3, so in theory it’s running 3 downloads in parallel.

I’m planning on adding a mount-storage command for the last remaining use case.

Automatic unmounting on Linux doesn’t work no matter what I do, for some reason. I need to investigate that further.

I am somewhat worried about disk space, as I have run into problems like that before. In many cases, Duplicacy would be running on some appliance device, or even a server with a minimal primary drive; VMs would also usually have minimal disk space allocated. VFS caching can easily clog /tmp, which may crash the whole system. A real-life example: I was running rclone on a mini server with a couple of GB available on a small primary SSD, and rclone VFS caching tried to place a 5GB file there, which did not end well. There are reasons why rclone allows the VFS cache to be configured both in terms of location and maximum size. I am not familiar with sqinn, but hopefully there is a way to specify a destination and/or a maximum capacity for the disk cache.

On -flat naming, I am thinking more about script access, where you don’t look at the list manually :wink: But as I said, this is a non-issue, as a script can certainly find the right revision in the list; it will simply take marginally more work on filtering. If anything, I’d prefer to have some special marker for the revision list (e.g. -revisions last) that always selects the single most recent revision. Again, nice-to-have, can certainly be worked around.

I really think you should just expose a -threads argument to mount, the same way it works with other commands. A fixed number of threads is not ideal: for some storages it will still not be enough, and for others it might be too much and you’ll be bumping into rate limiters. But if the code already supports multithreaded downloads, this should be a fairly trivial thing to implement.

Is there a reason to split mount-storage into a separate command? Not that I see a particular problem from a user perspective, but it sounds more like another parameter to mount than its own command. But I haven’t looked at the implementation; perhaps it makes more sense that way.

Yeah, -threads is already done, and also rate limiting, those will be in the next release.

Adding more features is not something I’m willing to put time into right now. Maybe in the future, or maybe the maintainers can add them to make it feature complete.

A new command is needed because it has some semantic differences on arguments. It uses the same underlying code, it’s just a matter of parameter parsing.

Latest binaries uploaded.

  • New command: mount-storage. You can now mount a storage directly without being in a repository directory. The root level contains one folder per snapshot id in the storage, followed by one empty level to prevent Windows Explorer from loading too early, and then the same structure as the regular mount. You may need to specify a repository dir anyway depending on your storage backend, for instance to validate SSH hosts.
  • New options added
  • Fix unmount on Linux

Unless there’s a breaking bug, this is probably my last release, as it’s already working well enough for me. The PR remains open; the official maintainers can pick up from there if they want to improve upon the base implementation.

Your efforts are appreciated and I hope some form of mount eventually gets into a full release… I’ve yet to test your latest version but I’m hoping the memory use is improved.

However, I would point out that the choice of parametrisation will probably have to change to match the rest of Duplicacy’s CLI interface - i.e. working from the current repository. I agree with @sevimo that a mount-storage is probably unnecessary, when mount -all, and -storage, -id and -r flags already exist for other commands that might operate on the non-default storage, and matching that nomenclature would be preferred. I’m also unsure if a separate disk cache, stored in a completely separate location, is desirable.

(TBH, Duplicacy is a bit lacking in regards to allowing multiple revisions and/or IDs to be referenced in a single command; it’d be nice if -r supported ranges or a list, for example.)

But that’s something for @gchen to ultimately decide…

The parametrization already matches the standard ones. mount follows list and related commands, and mount-storage follows info, which is the other command that works directly on storages.

That’s not quite true - most operations, like list, operate on just the current storage unless you use -all or -id. There is no list-storage, for example, because it’s handled by just list. That’s the way it should work with mount IMO - mount on just the current storage, with mount -all doing all.

info is different because it was meant to be used by the GUI only.

Well, regardless, it’s there and it makes sense in this context.

Thanks, I managed to make it work; it took a bit of time to figure out how to set up the parameters. I’d also say that it works well enough for me, and any other improvements can easily be made on top of this work if necessary. One last thing I noticed is that although unmount now seems to work fine on exit, I still get this in the console:

/bin/fusermount: failed to unmount /mnt/OD: Invalid argument

I am running Debian if that matters.

Wow, I am kind of shocked by how quickly you managed to write up an example implementation of a mount command, @david.rios.gomes. Really impressive stuff. I’m going to try to download the build and test it out later this weekend if I have time, but this is obviously an excellent start to a complete, official mount implementation. Given that you have familiarized yourself with the Duplicacy code and with implementing a FUSE file system for it, perhaps you could give a bit of insight into a couple of questions I have? If not, no big deal.

How practical would it be to use something like this (either in its current form or a future, optimized form) in a semi-permanent fashion? For example, many people currently use rclone’s mount command for mounting cloud storage to a local file system for the purposes of running a Plex server. Do you foresee something like that being practical using this instead of rclone? That is, mounting a media folder found in a specific snapshot/revision and using that mount for Plex or something similar?

How difficult and/or practical do you think it would be to implement some type of write support into a future implementation of this mount command? For example, mount a specific revision and then any writes on top of that revision are written back to the online storage every x number of min, possibly with each accumulated write session having its own revision #? That doesn’t sound like too elegant of a solution but I’m not sure how else you would do it… Perhaps there is another approach I’m not considering?

Anyways, thanks for all the hard work and effort you put into creating this FUSE/mount implementation. I believe this alone is a huge step forward for Duplicacy!

Yeah, I tried but couldn’t get rid of that message. It seems like it either tries to unmount twice or not at all, so twice it is.
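The thread does not say why the unmount ends up running twice. As a general illustration only, one common Go pattern for making a cleanup step run exactly once when it can be triggered from several places (say, a signal handler and the normal exit path) is `sync.Once`. The `mount` type and `unmountFn` field below are hypothetical and not taken from the actual implementation or any FUSE library.

```go
package main

import (
	"fmt"
	"sync"
)

// mount is a hypothetical handle; unmountFn stands in for the real unmount call.
type mount struct {
	once      sync.Once
	unmountFn func() error
	err       error
}

// Unmount runs unmountFn at most once and returns its result on every call.
func (m *mount) Unmount() error {
	m.once.Do(func() { m.err = m.unmountFn() })
	return m.err
}

func main() {
	calls := 0
	m := &mount{unmountFn: func() error { calls++; return nil }}
	m.Unmount() // e.g. from a signal handler
	m.Unmount() // e.g. from a deferred cleanup on normal exit
	fmt.Println("unmount calls:", calls)
}
```

Whether this applies depends on where the second unmount actually comes from; if it is issued by the FUSE library itself rather than by the program, a guard like this would not help.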

Hey.

I think that’s probably already OK with the current implementation. It just doesn’t automatically refresh the repository, so if new revisions are added you need to stop and start the mount command again, but overall I think the current implementation is already stable enough for that.

Write support is very unlikely given the nature of the tool itself. Probably never going to happen.

:slightly_smiling_face: :pray:

I see this wasn’t included in the newest release, though everything looks good in the pull request, as far as I can tell. Do we know anything about the status?

IMO, the implementation is far from stable and properly tested, and the command line usage needs a lot of rework to make it consistent with current syntax. The memory usage was insane in my testing, when compared to Rclone mount, for example.

Hi,
Has this been worked on since the last reply? I’m using the web UI and cannot see this option.

Hello, no, still not implemented in the GUI, and not in the latest CLI version either.