Using Google Drive File Stream with Duplicacy

TheBestPessimist · 10 January 2020 05:55

I’m curious why this happened. Could it be something that Duplicacy did which GDFS didn’t like, or is this a real bug in GDFS?

I have had some weird bugs with GDFS and talked to google support folks but I believe they were all fixed after I managed to repro them consistently.

Akirus · 10 January 2020 06:10

No idea, it only happened to that one chunk. Hasn’t happened again in the few days since then for any of my backups. I don’t think it would have been anything on the part of Duplicacy, since it was just writing to the local cache. I had a lot of data stored in the cache queued for upload, probably a couple hundred gigabytes. Evidently, there was nothing wrong with the chunk itself since I was able to restore the backup after I manually replaced the empty chunk.

I did change the GDFS cache directory to a Drivepool of SSDs to expand the amount of data I could upload without filling the cache drive completely. Perhaps that had something to do with it? It only happened once in >1TB of uploads, though. I even rebalanced the pool a few times mid-upload to see if it would throw it off, but didn’t have any issues.

I would have no idea how to reproduce the error since I don’t recall doing anything in particular at the time that it occurred. I can’t even say for sure whether Duplicacy was running at the time, or if GDFS was just working through the upload queue (iirc it reached around 300,000 files at some point).

TheBestPessimist · 10 January 2020 07:00

I don’t think this huge size was an issue: with my last computer, I had more than 500GB and a few million chunks to upload via GDFS in a single revision at one point, and they were all stored in the cache and uploaded slowly.

After some time they were all uploaded successfully ¯\_(ツ)_/¯ afaict. I will however run a restore for all my storages, just to make really sure that shit didn’t hit the fan.

@gchen wdyt, is there a way to make either check or backup to a local drive even smarter than they currently are, so that they find chunks which are 0-sized? (even though I’m asking this, I’m not sure implementing such a check will help in this particular case :-?)

Akirus · 10 January 2020 12:58

I only noticed today that there was a ‘notifications’ tab on the GDFS interface, which contained the following message:

M: being the mounted GDFS drive and P:\DriveFS… being the GDFS cache directory.

At least with this particular issue, it was easy enough to resolve since:
a) There was a clear error message detailing which file wasn’t uploaded
b) It uploaded an empty chunk with the correct filename which made it possible to locate the correct directory to place the chunk from lost & found

I suppose it would be much worse if it happened for thousands of chunks. I did restore a few other snapshots that didn’t use the missing chunk and haven’t run into errors yet, so fingers crossed it’s just that one isolated event.

For now, I’ve added a Windows Defender exclusion to the GDFS drive and cache as recommended by the error message. I noticed the Defender process uses up quite a bit of resources when GDFS is running so maybe it will help in that regard as well.

Droolio · 10 January 2020 16:15

I think this would be a smart idea actually…

I’ve encountered a handful of situations where I’ve seen missing chunks, 0-byte chunks, and even one chunk that had totally the wrong content*.

Then again, in these instances I am using Duplicacy in a fairly heavy-duty way - copying large Vertical Backup storages, with lots of fixed sized chunks, to another sftp storage. I’ve had no issues with my own personal Duplicacy backups, though.

But yep, I’ve witnessed a few instances where I’ve had 0-byte chunks on the destination storage and had to fix it by removing the chunk and removing the revisions in \snapshots and re-copy that particular revision. It should be a very simple additional check on its file size.
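
As a rough illustration of the file-size check discussed here, the following sketch scans a storage that is reachable as a local path and lists chunk files that are 0 bytes long. It is not part of Duplicacy; the function name `find_empty_chunks` is made up for this example, and it assumes the storage keeps its chunk files under a `chunks` subdirectory.

```python
import os
from typing import Iterator


def find_empty_chunks(storage_root: str) -> Iterator[str]:
    """Yield the path of every 0-byte file under <storage_root>/chunks.

    Assumption: the storage is a plain directory tree (local disk or a
    mounted drive such as GDFS) with chunk files below "chunks".
    """
    chunks_dir = os.path.join(storage_root, "chunks")
    for dirpath, _dirnames, filenames in os.walk(chunks_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # A valid chunk is never empty, so size 0 signals a failed write/upload.
            if os.path.getsize(path) == 0:
                yield path
```

Calling `list(find_empty_chunks(r"G:\My Drive\backups\duplicacy"))` would then return the suspicious files, if any. Note that on a GDFS mount this only inspects what the mounted drive reports, not necessarily what has finished uploading.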

I think I suggested a possible improvement before, whereby the chunk file names could be appended with extra metadata, such as the hash of the chunk. (Since this is the method by which Duplicacy has a database-less architecture.) If not a hash, then maybe chunk size in bytes? A check for 0-byte chunks is certainly useful, but an exact match may be more important given that @Akirus’s experience has shown a 0.2MB discrepancy.
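
Until something like that exists, an exact-size comparison after a copy can be approximated from outside Duplicacy. The sketch below is only an illustration under several assumptions: both storages are reachable as local paths, they use the same directory nesting under `chunks`, and a copied chunk file is expected to be byte-for-byte the same size as the original (which may not hold if, for example, the two storages use different encryption settings). The names `chunk_sizes` and `compare_storages` are hypothetical.

```python
import os
from typing import Dict, List, Optional, Tuple


def chunk_sizes(storage_root: str) -> Dict[str, int]:
    """Map each chunk path (relative to <storage_root>/chunks) to its size in bytes."""
    chunks_dir = os.path.join(storage_root, "chunks")
    sizes: Dict[str, int] = {}
    for dirpath, _dirnames, filenames in os.walk(chunks_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sizes[os.path.relpath(path, chunks_dir)] = os.path.getsize(path)
    return sizes


def compare_storages(
    source_root: str, copy_root: str
) -> List[Tuple[str, int, Optional[int]]]:
    """Return (chunk, source_size, copy_size) for every source chunk that is
    missing from the copy (copy_size is None) or has a different size there."""
    source = chunk_sizes(source_root)
    copy = chunk_sizes(copy_root)
    problems = []
    for rel, size in sorted(source.items()):
        other = copy.get(rel)
        if other != size:
            problems.append((rel, size, other))
    return problems
```

This catches missing, 0-byte, and truncated chunks, but not a chunk with the right size and the wrong content; that would need a content hash.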

*Edit: So yea… I actually had a chunk where the binary contents was completely different to the original storage where I copied it from. Yet the header had the ‘duplicacy<NUL>’ string at the beginning, so it wasn’t totally corrupted. Had to re-copy the chunk manually. Very puzzled by that!

bitrot · 31 May 2020 08:43

TheBestPessimist:

How to use Google Drive File Stream with Duplicacy

Since gdfs is a normal system drive, you init the storage as a Local Disk
duplicacy init -e my-repository-gdfs "G:\My Drive\backups\duplicacy"
and then all the commands are run normally.

I get “Failed to list the directory: open G:/: Access is denied.” when trying to do this. Is it because I have Duplicacy Web 1.3.0 running as a Windows service?

TheBestPessimist · 31 May 2020 10:27

You cannot create folders directly in G, you must create your folder (the storage) at least in G:\My Drive\.
To test this, try in explorer to create a folder directly in G: it should give you an error.

bitrot · 31 May 2020 22:40

Sorry, I wasn’t clear. I already have a folder in my Google drive. The problem is I cannot browse to it from Duplicacy Web because I get the error I mentioned when trying to browse to the existing folder.

TheBestPessimist · 1 June 2020 07:51

Since you have Duplicacy running as a Windows service, it could be that it’s running when you’re not logged in (note: I do not know if Duplicacy works w/o user login).
Afair, GDFS only runs when a user is logged in, so it could be that Duplicacy is trying to access that folder/drive when it doesn’t exist.

Are backups working while logged in?

@gchen do you have any ideas here?

bitrot · 3 June 2020 00:09

Sitting at a computer now so I can give more context. I’m running Duplicacy web edition on a couple of Windows computers and a Linux server. I’ve got backups running with B2, Google Drive and local storage backup repositories. Backups are running without problems whether I’m logged in or not.

After reading this post, I became interested in testing whether switching from the “Google Drive” back end to the “Google Drive File Stream” method discussed above would make sense for me. I already have all the pieces in place so I thought trying this would be relatively simple. But I don’t get very far.
I’ve created a folder in my Google Drive where I want to store the backups. On my system this folder is at G:\My Drive\DuplicacyBackups\

But I’m not getting anywhere. On the Duplicacy web interface, when I try to initialize the storage I get the earlier mentioned error.

Click on “Storage” tab
Click on the “Folder” icon to browse the local filesystem to select the local directory to be used as the storage
Click on the folder for the G: drive to select a subfolder
Failed to list the directory: open G:/: Access is denied.

In other words this error happens when logged in and just browsing to the directory via the web interface. I do not have any issues browsing to that folder in Windows Explorer

TheBestPessimist · 3 June 2020 03:40

Thank you for the much more detailed explanation. From what you’re describing, I believe this is a web-ui version limitation, maybe in the way it tries to list directory contents, since G:/ is not accessible for writing.
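
One way to narrow this down, sketched here purely as a diagnostic idea, is to try listing the storage folder and each of its parent folders from the same account the service runs under, and see at which level "Access is denied" appears. The helper name `probe_listing` is made up for this example; it only uses standard-library calls and does not touch Duplicacy itself.

```python
import os
from pathlib import PurePath
from typing import List, Tuple


def probe_listing(path: str) -> List[Tuple[str, str]]:
    """Try to list `path` and each of its ancestors, root first.

    Returns (directory, "ok") or (directory, "error: ...") per level, so a
    drive root that cannot be listed shows up separately from its subfolders.
    """
    target = PurePath(path)
    candidates = list(target.parents)[::-1] + [target]
    results = []
    for candidate in candidates:
        try:
            os.listdir(candidate)
            results.append((str(candidate), "ok"))
        except OSError as exc:
            results.append((str(candidate), f"error: {exc}"))
    return results
```

For example, `probe_listing(r"G:\My Drive\DuplicacyBackups")` would report on `G:\`, `G:\My Drive` and the backup folder in turn. If only the drive root fails, the limitation is in browsing through `G:\`; if every level fails, the drive is probably not visible to that account at all.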

I do have a workaround, if you want to try that (not sure if it really works, but might be worth a try).

Create the same storage folder on C:/ (C:\My Drive\DuplicacyBackups\) and init the storage there.
Copy the contents of that folder into the Google Drive folder.
Change the storage path in Duplicacy’s configuration folder.

I believe by doing these steps, you work around the folder access limitation from above.

bitrot · 3 June 2020 21:38

I’ve tried this as per your instructions. After changing the storage path I’ve created a check job and this fails when it runs.

Running check command from C:\ProgramData/.duplicacy-web/repositories/localhost/all
Options: [-log check -storagename -threads 7 -a -tabular]
2020-06-03 17:33:36.387 INFO STORAGE_SET Storage set to G:/My Drive/DuplicacyBackups
2020-06-03 17:33:36.388 ERROR STORAGE_CREATE Failed to load the file storage at G:/My Drive/DuplicacyBackups: CreateFile G:/My Drive/DuplicacyBackups: Access is denied.
Failed to load the file storage at G:/My Drive/DuplicacyBackups: CreateFile G:/My Drive/DuplicacyBackups: Access is denied.

TheBestPessimist · 4 June 2020 07:40

Can you please try the check/backup using duplicacy CLI (with same repository, and/or a new one)? I want to see if that works. If we get the same access errors, then i honestly have no idea how to help you.

MikeBrant · 27 November 2020 15:02

Hello,

I mounted a GDFS drive on OS X, added the directory through the Web Edition and started an initial backup. It’s still super slow (19MB/s) and the gdfs used cache is tiny. Am I doing something wrong? Thanks!!

TheBestPessimist · 26 December 2020 06:30

@MikeBrant and @bitrot

do you have any updates here, has anything changed?

MikeBrant · 26 December 2020 07:50

Hello, I just had to be very very patient and it worked in the end!

TyTro · 3 January 2021 04:37

This sounds like a very interesting way to use Duplicacy with Google Drive!

But what I am wondering about, doesn’t this make the backup way less secure, as any software on your PC could basically just delete files from the backup?

One of the main things a backup is supposed to protect against for example is Ransomware attacks, so malicious software that encrypts every file it can find on your PC. If your backup is a regular partition that software has write access to, then ransomware would also encrypt your Duplicacy Backup files, and then the Backup would be quite useless…

Am I missing something, or is this “Google Drive File Stream” method just way less useful as a backup? In that case, I don’t think it’s worth the potential speed improvement over directly using the Google Drive web-api.

saspus · 3 January 2021 05:00

I have another concern with this approach: duplicacy completing a backup no longer means that the data has reached the cloud, and when the uploads will actually happen is not deterministic.

In other words, if my Mac burns in flames just after duplicacy completed a backup, I will lose data.

The ransomware concern is valid too; however, Google has a mechanism to revert your data in bulk in these scenarios, so I would not worry much about it. User error is more of a danger, and the duplicacy datastore sticking in the visible drive folder is asking for trouble. That was the reason I asked for scope support for GCD; it was already implemented in a pull request but not yet merged.

The most foolproof approach would be to generate a set of credentials that can only back up but not delete or modify, and have duplicacy use that. This approach with B2 was discussed here recently.

Creeju · 12 November 2022 20:10

The uploads not necessarily being done when Duplicacy finishes is not a concern, at least in my opinion. If Duplicacy waited for the uploads to finish, it would still be backing up during the fire. The only difference is a non-functional revision, which does not really pose a problem. Just restore the one before that. (Of course one should keep this possibility in mind when using the storage endpoint for multiple repositories.)

About moving Duplicacy to application data in Drive: I value the option for less technical people, but it makes data access vastly harder. Backed-up data is still my data and I like to have the option to move it or duplicate it.

wim-olbright · 1 April 2023 00:42

Hi, thanks for the meaningful insights. I just noticed that gdfs looks quite similar to rclone mount with the --vfs-cache-mode flag. Has anyone tried to use it in the way described here? Is there any significant difference in usability?