Google Drive, drive.appdata scope, service account impersonation

What I want: duplicacy to target storage on my g-suite account, but:

  1. without polluting My Drive,
  2. without needing to renew tokens and depending on duplicacy.com being up to do so, and
  3. without depending on the duplicacy.com-owned Google project and its inherited limitations, such as the API rate limit.

It seems 1) and 2) are impossible to accomplish today; correct me if I’m wrong.

For 1) the token needs to be created with scope drive.appdata and not drive. Right now the scope is hard-coded server-side. I’d argue this needs to be a default – nobody needs to access duplicacy datastore directly; it’s an opaque container, messing with it can only bring sorrow. It’s literally application data and shall be treated as such.

For 2) OAuth is out of the window because its tokens expire and need renewing, so I would need to use a service account with domain-wide delegation enabled; and for that duplicacy needs to be able to use the service account to impersonate another domain user (see createDelegated). Rclone can do it (see --drive-impersonate), duplicacy can’t.

Corrections? Other ideas?

Did you try providing a service account json file when the CLI is prompting for a gcd token file? Duplicacy should accept such a file as well. You just need to add the service account email address to the share list of your drive.

If that doesn’t work for you, another option is to host your own authorization page on Google Cloud (or any website that you own). The code to request the gcd token is pretty simple; I can share it if you take this path.

Yes, that works, but the data goes to the drive of the service account, which is limited to 35GB or so.

Hmm. I would then need to specify the path to the shared folder, as seen by the service account, in the gcd://path; I haven’t tried that. However, this will leave the duplicacy datastore visible in the main account.

This sounds interesting. But I don’t think this will solve the issue – I can already create the service account manually in the console; the problem is creating a delegated connection to an actual domain user using that account, and as far as I understand this is something that duplicacy itself would be doing when creating the connection:

see example here:

GoogleCredential credential = GoogleCredential.fromStream(new FileInputStream("MyProject-1234.json"))
    .createScoped(Collections.singleton(SQLAdminScopes.SQLSERVICE_ADMIN))
    .createDelegated("user@example.com");

I may be grossly misunderstanding how it works – I’m very new to web services/google/etc…

It looks like to impersonate a user we just need to populate the Subject field of a jwt.Config object.

The JWTConfigFromJSON function that we use to create a jwt.Config object from a service account json file doesn’t set this field.

So I think a viable option is to read the Subject field again from the file after the call to JWTConfigFromJSON. With this change, all you need to do is manually edit the service account file and add a subject key.
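Roughly, the extra read could look like the sketch below. It uses only the standard library: the service account file is parsed a second time as a generic JSON object and an optional subject key is picked up. The helper name optionalString is hypothetical, and the assignment to config.Subject is only a comment here because it needs the golang.org/x/oauth2 packages.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// optionalString returns the string stored under key in a JSON object.
// The second result is false if the key is absent or is not a string.
// (Hypothetical helper for illustration; not part of Duplicacy.)
func optionalString(data []byte, key string) (string, bool, error) {
	var object map[string]interface{}
	if err := json.Unmarshal(data, &object); err != nil {
		return "", false, err
	}
	value, ok := object[key].(string)
	return value, ok, nil
}

func main() {
	// A trimmed-down service account file with the extra key added by hand.
	data := []byte(`{"type": "service_account", "subject": "user@example.com"}`)

	subject, ok, err := optionalString(data, "subject")
	if err != nil {
		fmt.Println("invalid json:", err)
		return
	}
	if ok {
		// With a real jwt.Config this is where config.Subject = subject would go.
		fmt.Println("impersonating", subject)
	}
}
```

The comma-ok type assertion means a missing key or a non-string value is simply ignored instead of causing a panic.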

Awesome!

So, I’ve added this:

diff --git a/duplicacy_gcdstorage.go b/duplicacy_gcdstorage.go
index 85c4c93..052048b 100644
--- a/duplicacy_gcdstorage.go
+++ b/duplicacy_gcdstorage.go
@@ -349,6 +349,9 @@ func CreateGCDStorage(tokenFile string, driveID string, storagePath string, thre
                if err != nil {
                        return nil, err
                }
+               if subject, ok := object["subject"]; ok {
+                       config.Subject = subject.(string)
+               }
                tokenSource = config.TokenSource(ctx)
        } else {
                gcdConfig := &GCDConfig{}

and added

"subject": "me@saspus.com"

to the service account json file (not forgetting to delegate authority to the service account in the admin console with the right scope) and voila:

% ~/go/bin/duplicacy -d init test gcd://test-visible-folder
Reading the environment variable DUPLICACY_GCD_TOKEN
Enter the path of the Google Drive token file (downloadable from https://duplicacy.com/gcd_start):/Users/me/Downloads/test-visible-duplicacy-70e72858c3bb.json
Reading the environment variable DUPLICACY_GCD_TOKEN
Compression level: 100
Average chunk size: 4194304
Maximum chunk size: 16777216
Minimum chunk size: 1048576
Chunk seed: 6475706c6963616379
Hash key: 6475706c6963616379
ID key: 6475706c6963616379

it worked!! :tada: :fireworks:

(To clarify – this was the test to give service account drive scope and ensure that Duplicacy can create repository in my user account’s drive folder using the service account credentials, and confirm it’s visible. The actual use case would be to use drive.appdata scope – which is much safer)

@gchen, can you please include that change (or the proper implementation of the same idea) in the future release? (it’s too trivial to go through pull request, etc)

Ok, this gets me halfway: The access to drive scope works, but as soon as I change the scope to drive.appdata like so:

In the json:

 "scope": "https://www.googleapis.com/auth/drive.appdata"

In the gcdstorage.go:

	scope := drive.DriveScope

	if newScope, ok := object["scope"]; ok {
		scope = newScope.(string)
	}

	var tokenSource oauth2.TokenSource

	if isServiceAccount {
		config, err := google.JWTConfigFromJSON(description, scope)
		if err != nil {

it all falls apart:

[0] The granted scopes do not give access to all of the requested spaces.;

It seems the problem is that duplicacy is still trying to use the drive space as opposed to appDataFolder, and I don’t immediately see where I can change that.

https://developers.google.com/drive/api/v3/appdata

Supporting this seems to require a rather large change (tracking a different root folder id, etc.; right now duplicacy only supports the user’s root and team folders). I guess I’ll write a separate clean feature request for this and hope it will get addressed in the future.

In the meantime I’m going to use a workaround with Rclone:

  1. Rclone config:
[gdrive]
type = drive
scope = drive.appfolder
service_account_file = /Users/alex/Downloads/duplicacy-295908-be4a895a5808.json
root_folder_id = appDataFolder
  2. Rclone serve (no vfs caching on purpose, to make the bridge as stateless as practical) can be done in pre-* scripts:
% rclone serve sftp -v --drive-impersonate alex@saspus.com --user test --pass test gdrive:/
  3. Duplicacy init:
% ~/go/bin/duplicacy init test01  sftp://test@localhost:2022/duplicacy
  4. Duplicacy happy:
% ~/go/bin/duplicacy check
Storage set to sftp://test@localhost:2022/duplicacy
Listing all chunks
1 snapshots and 1 revisions
Total chunk size is 994 in 4 chunks
All chunks referenced by snapshot test01 at revision 1 exist

Ugly? Yes. Unnecessarily layered? Yes. But works? Also yes. I’ll see how far this will get me.

Not sure why the drive.appdata scope didn’t work. I’ll run a test myself.

The scope https://www.googleapis.com/auth/drive.file works for me. I think this is the right scope to use.

Notes for myself and other people who want to make this work:

  • Log in to Google Cloud Platform as the admin of your domain and go to IAM & Admin -> Service Accounts, click Create Service Account.
  • When creating the service account, you must check Enable G Suite Domain-wide Delegation. This checkbox may appear greyed out, but you just need to click the Edit button on the top to make this checkbox clickable.
  • Select Add Key -> Create new key to download the json file.
  • Edit the downloaded json file to add the scope and subject:
  "subject": "user@domain.com",
  "scope": "https://www.googleapis.com/auth/drive.file"
  • Go to admin.google.com, click the menu at the top left corner, select Security -> API Controls, then click Manage domain-wide delegation, and add the client ID of the service account and the scope https://www.googleapis.com/auth/drive.file or https://www.googleapis.com/auth/drive.

Now you can just provide this modified service account json file when Duplicacy asks for a GCD token file. After that Duplicacy will be able to create the storage in user@domain.com's Google Drive.

@gchen, this

https://www.googleapis.com/auth/drive.file
View and manage Google Drive files and folders that you have opened or created with this app

is not much different than drive scope except instead of giving full access to the entire drive Duplicacy will only have access to the data it created.

This still puts the data in the user’s My Drive folder and it of course works because it uses the same root and layout.

What I’m asking is to make drive.appdata work, so that Duplicacy’s datastore does not appear in Drive. That way users cannot inadvertently or maliciously modify the datastore. Have a look at the rclone source for how they manage appDataFolder.

I assume the necessary changes in the source will be checked in and released shortly?

I finally got it working: Support GCD impersonation via modified service account file by gilbertchen · Pull Request #612 · gilbertchen/duplicacy · GitHub

The appDataFolder not only has its own root id, but also lives in a separate Spaces (appDataFolder instead of drive). Very strange.

Perfect!

lives in a separate Spaces (appDataFolder instead of drive)

This was what I could not wrap my head around – this is key: Spaces(storage.spaces)

storage.service.Files.List().Q(query).Fields("nextPageToken", "files(name, mimeType, id, size)").PageToken(startToken).PageSize(maxCount).Spaces(storage.spaces)
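
To spell out my understanding of the mapping, here is a small standard-library sketch. It is not the code from the pull request; the helper name spaceForScope is made up, and the only facts it relies on are the Drive API values: the spaces parameter accepts drive and appDataFolder, and the folder aliases are root and appDataFolder.

```go
package main

import "fmt"

const (
	driveScope   = "https://www.googleapis.com/auth/drive"
	appDataScope = "https://www.googleapis.com/auth/drive.appdata"
)

// spaceForScope returns the value to pass to Spaces(...) in list calls and
// the folder alias that path lookups should start from.
// (Hypothetical helper for illustration; not the code from the pull request.)
func spaceForScope(scope string) (spaces string, rootID string) {
	if scope == appDataScope {
		// The application data folder has its own space and its own root alias.
		return "appDataFolder", "appDataFolder"
	}
	// Everything else lives in the regular drive space under My Drive.
	return "drive", "root"
}

func main() {
	for _, scope := range []string{driveScope, appDataScope} {
		spaces, rootID := spaceForScope(scope)
		fmt.Printf("scope=%s spaces=%s root=%s\n", scope, spaces, rootID)
	}
}
```

Both values have to change together: listing with the appDataFolder space while starting from root (or the other way around) is the kind of mismatch that produced the "granted scopes do not give access to all of the requested spaces" error earlier.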

It works now – just initialized the repository.

As a side quirk – not sure why it says “This app no longer has access to Drive” – because the project is alive and well.

Thank you @gchen! I’ll use a custom patched version until it gets into the release.

Edit: I’ve described the process of configuring the Google account in detail, with an obscene number of screenshots, here: Duplicacy backup to Google Drive with Service Account | Trinkets, Odds, and Ends (moved to a new CDN, so now I can go overboard with pictures). Maybe it will save someone time.
