New feature: RSA encryption

I think I didn’t understand…

I have a storage with N revisions and thousands of chunks. What will happen exactly if I start backing up with the -key parameter (after I create the keys, of course)? Will only “new” chunks be encrypted in the new way?

Correct, only new backups are protected by the RSA encryption. The word “new” was missing from the previous post so it was confusing…

Ok. So if only a few files are changed in the repository, some new chunks will be created with RSA encryption.

But this latest/new revision will then have old chunks created with the previous form of encryption and new chunks created with RSA encryption. How would restore work? Would the -key parameter be used even with the presence of the old chunks encrypted with the previous form?

Using the -key feature, can we finally have that use case (I think there’s a topic about it) where multiple untrusted parties back up to the same storage, and someone can run check -all without “reading” everybody else’s data, while still making sure that no backup has missing chunks?

If this is the case, then it’s quite a good feature!

It was exactly this use case that I thought of when I mentioned “remotely managed backups”.

Yes, with this feature you can check the backups for someone without reading their data. However, you don’t want multiple people to back up to the same storage using different keys: if someone else has already uploaded a file chunk that you happen to need, that chunk is encrypted with their key and you won’t be able to read it.

I don’t understand something about the feature. Below I describe a few possible cases; please explain, comment on, confirm, or correct my assumptions.

My use-case is the same as above: multiple untrusted parties backup to the same storage.

1. The chunk name remains unchanged (same as if not using the -key option?) and only the file content is changed (encrypted with the public key)

If that’s the case, then it doesn’t make sense for multiple untrusted parties to back up to the same storage (untrusted implies that each party uses a different key), because the result is a _false deduplication_: only the original uploader of a chunk will be able to restore it, and every other party will get an error during decryption.

This possible use-case leads to broken/useless backups for everyone except the uploader of a chunk.

If multiple backups are happening at the same time it’s easy to see the following problematic case:

Over the whole backup we have
party 1 = p1
party 2 = p2
party 1 will need to upload at some point during the backup both chunk 1 (c1) and chunk 2 (c2)
party 2 will need to upload at some point during the backup both chunk 1 (c1) and chunk 2 (c2)

  • p1 uploads c1 (encrypted with k_p1)
  • p2 needs to upload c1 but finds it existing on the storage
  • p2 uploads c2 (encrypted with k_p2)
  • p1 needs to upload c2 but finds it existing on the storage

In this case neither snapshot can be restored, since each party needs one chunk that was encrypted with the other party’s key.
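The interleaving above can be reproduced with a small model. This is only a sketch under stated assumptions, not Duplicacy’s actual code: `SharedStorage`, `chunk_id`, and the use of plain SHA-256 for chunk names are hypothetical, and "encryption" is modelled simply by recording which key a stored chunk was uploaded with.

```python
import hashlib


def chunk_id(content: bytes) -> str:
    # Assumption: the chunk name depends only on the plaintext content,
    # not on the key. Plain SHA-256 is used here purely for illustration.
    return hashlib.sha256(content).hexdigest()


class SharedStorage:
    """Hypothetical model of one storage shared by parties with different keys."""

    def __init__(self):
        # chunk id -> name of the key the stored copy was encrypted with
        self.chunks = {}

    def upload(self, key_name: str, content: bytes) -> bool:
        cid = chunk_id(content)
        if cid in self.chunks:
            return False  # chunk already exists, so the upload is skipped
        self.chunks[cid] = key_name
        return True

    def can_restore(self, key_name: str, contents) -> bool:
        # A party can restore only if every chunk it needs
        # was encrypted with its own key.
        return all(self.chunks.get(chunk_id(c)) == key_name for c in contents)


storage = SharedStorage()
c1, c2 = b"chunk one", b"chunk two"

storage.upload("k_p1", c1)  # p1 uploads c1
storage.upload("k_p2", c1)  # p2 finds c1 on the storage and skips it
storage.upload("k_p2", c2)  # p2 uploads c2
storage.upload("k_p1", c2)  # p1 finds c2 on the storage and skips it

print(storage.can_restore("k_p1", [c1, c2]))  # False
print(storage.can_restore("k_p2", [c1, c2]))  # False
```

Both parties end up depending on one chunk they cannot decrypt, which is the "false deduplication" described above.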

2. The chunk name is changed (as if the contents of the chunk are different) but the unencrypted contents are actually the same

In this case we have all the problems presented above along with an extra: there won’t be any deduplication.


Are there any other use cases that I missed?


Therefore @gchen, I think this feature needs to be advertised with care, making sure that whoever uses it does not use different keys in the same storage (or in a bit-identical copy of that storage).
From what I understand, the sole purpose of this feature is to add an extra layer of security to the backup in case we don’t trust the storage provider (while still trusting all the other parties who back up to the same storage as I do).

I pushed a new commit to that PR that includes a few changes. First, the public key is now stored in the config file. This is mostly to ensure that only one key can be used so you can’t mess up the storage with multiple keys. Also, the copy command should now work with RSA encryption.


You mean, in the storage config file?

(side topic: what about my questions/remarks from post above?)

Yes, in the storage config file, so you’ll need to provide the public key when initializing a new storage.

The chunk name isn’t changed by the RSA encryption. It is still derived from the hash of the chunk content.

Great!

Bumping my doubt:

The header (duplicacy\000 or duplicacy\002) tells if the chunk is encrypted with RSA or not, so you could have mixed encryption in one storage. However, this is not possible now after the change to store the public key in the config file.
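As a rough illustration of the header check described above, here is a minimal sketch. The two byte strings come from the post (`\000` and `\002` are octal escapes for the bytes 0 and 2); which header belongs to which scheme is my assumption, and `classify_chunk` is a hypothetical helper, not a function from Duplicacy.

```python
# Header values quoted in the post, written as Python byte strings.
HEADER_PREVIOUS = b"duplicacy\x00"  # assumed: chunk uses the previous encryption
HEADER_RSA = b"duplicacy\x02"       # assumed: chunk uses RSA encryption


def classify_chunk(data: bytes) -> str:
    """Return which encryption format a raw chunk file appears to use."""
    header = data[:len(HEADER_RSA)]
    if header == HEADER_RSA:
        return "rsa"
    if header == HEADER_PREVIOUS:
        return "previous"
    return "unknown"


print(classify_chunk(b"duplicacy\x02" + b"payload"))  # rsa
print(classify_chunk(b"duplicacy\x00" + b"payload"))  # previous
print(classify_chunk(b"something else"))              # unknown
```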


Thanks! So, in order to use RSA encryption, I must create a new storage and copy (with the Duplicacy copy command) the contents of the old/current storage that uses the previous encryption, right?

Edit: I just noticed the edit in the original post…

Yes that is correct…

Is there any reason this PR hasn’t been merged yet?

So that other people may review it, or comment on it like I did here, maybe.


Right, I wanted to give it more time so others can review or try it out.


Bump… :wink:

Very interesting, but I don’t think I completely understood how this should work!

On my server, I can give UserX access to a folder using SFTP (no other users have access to that folder).
UserX can now create an RSA-encrypted backup with a public key, and can use the private key (and storage password) for both restore and check.

Doesn’t this mean that I (the server owner) cannot check chunks on their behalf without receiving the private key and storage password, and therefore also gaining access to the content?

On the other hand, if I create a public/private key pair (plus a storage password) for each storage (user), let them back up to my server with their public key, and keep the private key to myself, then I could probably check chunks on the server, but only I would have access to the data and the users would not be able to restore… (good for IoT scenarios perhaps!)

Sorry if I’m complicating things, I’m probably missing something obvious here!
(not an expert in the field!)

You always need the storage password to access anything. Without the private key, you can still run the check command to check if all chunks exist. You can’t run the check command with the -files option though, because file contents are encrypted by the public key, while metadata are not.

You can also run the prune command without the private key.
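To illustrate why an existence check does not need the private key, here is a minimal sketch. It assumes only that the checker can see the list of chunk names referenced by a snapshot and the list of chunk names present on the storage; `missing_chunks` and the sample names are hypothetical and not part of Duplicacy.

```python
def missing_chunks(referenced_ids, stored_ids):
    """Return referenced chunk names that are absent from the storage.

    Only names are compared; no chunk content is read or decrypted.
    """
    stored = set(stored_ids)
    return [cid for cid in referenced_ids if cid not in stored]


# Hypothetical chunk names, for illustration only.
snapshot_chunks = ["a1", "b2", "c3"]
storage_listing = ["a1", "c3", "d4"]

print(missing_chunks(snapshot_chunks, storage_listing))  # ['b2']
```

Anything that has to look inside file contents, such as check with the -files option, is a different matter and needs the private key, as noted above.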

This is an interesting use case. You can then provide a restore service for your users from one of your servers. But again, if your only purpose is to check the existence of all chunks then you don’t need this setup.
