Windows Server - Backup directly from rclone mount or Google File Stream to local disk & Google Drive?

So I had my data on my Windows Server available as a network file share, and I was backing it up with Duplicacy to both a local disk & Google Drive; this has worked relatively fine. But I decided I wanted to migrate the live data to my Google Drive for convenience.

So now I want to do an active backup of the live data in my Google Drive to both a local disk & Google Drive (a separate gdrive).

I have tried installing the native Google Drive client on my Windows Server and set it to mirror my data; the client runs there & syncs, and Duplicacy backs up any changes in gdrive to local disk/gdrive. But I don't really like this approach & think it's a bit janky; it seems a little unstable.

I tried playing with Google File Stream mode for a bit and it seems I couldn’t target the drives in there with Duplicacy because they were virtual volumes and Duplicacy lacked permission or something? Even if I change it to ‘mount’ to a particular folder, I run into the same issue.

Then I have tried rclone mount (haven't played with the VFS cache yet) - here I could point Duplicacy at the rclone mount, and then I imagine I can ask Duplicacy to back up from there to local disk & gdrive. Is this something people are using reliably? I would greatly appreciate it if someone has some good settings for the rclone mount; the disk it runs on is a fast SSD with over 100GB of available space.
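For illustration, what I'm testing with so far is roughly this minimal mount (nothing tuned yet, flags very much open to suggestions):

rclone mount "gdrive:" /mnt/gdrive \
    --allow-other \
    --dir-cache-time 1h \
    --buffer-size 32M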

If I could get either Google File Stream or rclone mount to work reliably, then I could have 2 local backups (or remove 1 of them), as the live data would not need to be on the server at all, only backups. I also wonder how well restores would work in this configuration. When restoring, the restored files would be written to the mount's cache and I'd obviously have to wait for the upload. But that would be the case with a mirror sync on the server as well anyway.

Any feedback appreciated.

I have played with backing up directly from my gdrive rclone mount and it works pretty well. I use VFS cache.

The worst performance is packing small files into chunks for backup; imagine loads of 1 KB files with a 32 MB chunk size, it takes quite a while. Does anyone have experience with handling this? Maybe some clever flags for the rclone mount? It performs quite nicely with big files.

I think it's an inherent property of remote object storage: each object access carries a fixed latency, so access to lots of small objects will perform poorly.
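To put rough numbers on it, using the figures you mentioned: a 32 MB chunk built from ~1 KB files means on the order of 32 MB / 1 KB ≈ 32,000 object reads, and even at a few hundred milliseconds each that's several hours for a single chunk, so a slow crawl is exactly what you'd expect.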

It would be interesting to try cunoFS in place of rclone.

Yes, looking at the log, the 'chunker' or however we refer to it fetches one of these tiny files for packing every second or so, so it takes a while to make a single chunk. Although this rclone mount allows for 200 requests per second, it's receiving roughly 1 per second.
I have these settings on the mount, which should allow that request rate:

--drive-pacer-burst=1000
--drive-pacer-min-sleep=10ms

I don't know if it's how Duplicacy interacts with the mount; it should be able to make many more requests, I would think.
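Since the mount has the remote control interface enabled (see the unit file below), one way to check the request rate rclone itself is seeing is something like this - core/stats is a standard rc command, and the port just has to match the mount's --rc-addr:

rclone rc --url http://localhost:5573 core/stats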

By the way, I'm seeing messages in the log that it's packing chunks, and then messages like this in between:
2025-01-19 23:03:12.945 DEBUG CHUNK_CACHE Skipped chunk a2040bd4642e7b3afbd5d090f1e8eeed57778b7cd5fd6b31bf8ef450d86aeb29 in cache

So it skips them. Since I already uploaded about 80% of this data from another machine, does it still pack the data into chunks and only afterwards realise that the chunk already exists in the repository, then skip along to the next? And where is this cached? Just thinking ahead to my next backup, when this initial one finally completes.

I will share my full rclone mount config (the systemd service); maybe someone can see some glaring mistakes in there that, if removed, could improve performance. Anyway, after this first initial backup I doubt I will be dealing with this many changing small files, so I'm not unhappy with it as it is.

[Service]
User=user
Group=user
Type=notify
ExecStartPre=/bin/sleep 10
ExecStart=/usr/bin/rclone mount \
    --allow-other \
    --config=/home/user/.config/rclone/rclone.conf \
    --buffer-size=32M \
    --dir-cache-time=8760h \
    --transfers=8 \
    --drive-pacer-burst=1000 \
    --drive-pacer-min-sleep=10ms \
    --drive-skip-gdocs \
    --poll-interval=15s \
    --rc \
    --rc-addr=localhost:5573 \
    --rc-no-auth \
    --syslog \
    --timeout=10m \
    --umask=002 \
    --use-mmap \
    --user-agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36' \
    --vfs-cache-min-free-space=off \
    --vfs-cache-max-age=2h \
    --vfs-cache-max-size=50G \
    --vfs-cache-mode=full \
    --vfs-cache-poll-interval=1m0s \
    --vfs-fast-fingerprint \
    --vfs-read-ahead=128M \
    --vfs-read-chunk-size-limit=2G \
    --vfs-read-chunk-size=32M \
    -v \
    "gdrive:" "/mnt/gdrive"
ExecStop=/bin/fusermount3 -uz "/mnt/gdrive"
Restart=on-abort
RestartSec=5
StartLimitInterval=60s
StartLimitBurst=3
TimeoutSec=21600
LimitNOFILE=infinity
LimitMEMLOCK=infinity

Having read quite a bit about it, I think I realistically can't expect latency better than ~500 ms when accessing/reading files, which murders performance on a bunch of small files.

The VFS cache adds a little overhead on top of that compared to not having it, maybe 100 ms. Also, the VFS cache doesn't help much here, because Duplicacy only reads each file once from the cloud. If I could ask the cache to preload all files smaller than X & keep them in the cache, then maybe it could make a big difference… But I don't believe there are any options like that. I just hope that after this first upload, it won't take ages again on run two. It shouldn't. :smiley:

So rclone can only get about 2 files/second; that's documented here & matches this experience: Google drive

Out of curiosity I asked Google File Stream on my laptop to cache a directory of over 11,000 small objects, and it did it in 10 minutes. That means it managed nearly 20 files per second, ten times the throughput of the rclone mount. Unfortunately that official Google client does not exist on Linux, & I couldn't get it to work anyway when pointing Duplicacy on my Windows Server at Google File Stream; Duplicacy would simply say Access Denied because it's a virtual volume.

Wow, that’s quite a lot of parameters!

I’ve been successfully backing up for years from rclone mounts with ease, both with mounts pointing to Google Drive and OneDrive.

I simply use the following:

rclone mount \
rclone_remote_name:path \
/mnt/onedrive \
--read-only \
--allow-other \
--fast-list  

I clearly remember during my initial tests that --vfs-cache-mode caused several intermittent issues. Once I stopped using it, those problems disappeared. Another parameter that made a significant difference was using --read-only.

Hey thanks, I will try out --read-only with no cache when my initial backup is done, to see the difference. Although I think that no matter what mount parameters I use, I'm constrained by the 2-files-per-second limit.

Since the data set contains over 400,000 objects, this is bound to take a lot of time on the initial backup. I did 80% of the upload from a regular Windows file system to gdrive, but since it didn't complete fully, Duplicacy has to go through all the files even though it's skipping loads of chunks for upload (because they are already in the repository). I should've thought of that and done 100% of the upload so I had at least 1 snapshot; I think then it would've compared the current directory to that snapshot and wouldn't have had to go through all the files again.

Sadly, this is the Google Drive API limit, and since Duplicacy processes files sequentially during backup (despite using multiple threads), it’s never gonna be any faster than that.

What you might be able to do is to pre-cache with some kinda script that utilises rclone cat remote:/path > /dev/null and the mounted cache. You could probably run a duplicacy diff, parse the results and pipe it into the cat (perhaps with --files-from). Limit the max age / total cache size so you don’t run out of space. :slight_smile:
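A rough, untested sketch of that idea, reading files under a size cutoff through the mount so they land in the VFS cache (the 1M cutoff and the remote/mount paths are just placeholders):

rclone lsf -R --files-only --max-size 1M "gdrive:" | while IFS= read -r f; do
    cat "/mnt/gdrive/$f" > /dev/null
done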

Seems like a hassle for just an initial backup, but it could be a very handy script if you incrementally back up more than a handful of files.

(Even though it might not help much, you missed a flag: --vfs-refresh :stuck_out_tongue: )

Incidentally, if you just want to get at least 1 snapshot, consider excluding files and then gradually including them in subsequent backups. Not that this matters much, but you might wanna ensure certain important data is safe til the whole thing completes…
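For example, a first-pass filters file could look something like this (the folder name is purely a placeholder, and it's worth double-checking the include/exclude pattern rules in the Duplicacy guide):

# .duplicacy/filters - first pass: only back up the most important folder
+important-docs/
+important-docs/*
-*

Then widen the includes (or drop the -*) on later runs.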

Hey Droolio, thank you. Yes, you're right about all that; I will just let it run, it will probably take a few days.

I read some threads where the rclone folks considered implementing options to customise how the cache behaves (like a flag to keep files under a minimum size), but it hasn't been implemented yet, so as you said, those ideas can only be done with scripts at the moment.

About --vfs-refresh, there is a separate service running that runs it every 3 hours.
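For anyone curious, a periodic refresh like that is typically just an rc call along these lines, with the port matching the mount's --rc-addr (vfs/refresh is a standard rc command):

rclone rc --url http://localhost:5573 vfs/refresh recursive=true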

Anyway, even if I found or wrote such scripts myself, this is something that would need to be done beforehand, so that the cache is already populated with the small files before the backup job starts.