New feature: RSA encryption

This feature is available in the master branch and will be included in the next release.

Initialization

To initialize a new encrypted storage with the RSA encryption enabled, run the following command:


$ duplicacy init -e -key public.pem repository_id storage_url

RSA encryption can only be enabled if the storage is encrypted (with the -e option).

The RSA public key, along with other configuration parameters, will be stored in the file named config which is then uploaded to the storage.

You can verify whether RSA encryption is turned on by running the info command as follows:


$ duplicacy -d info storage_url

...

RSA public key: -----BEGIN PUBLIC KEY-----

...

-----END PUBLIC KEY-----
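
If you want to script that check, here is a minimal Python sketch. It only scans text you have already captured from the info command. The function name rsa_enabled is hypothetical, and the sketch assumes the output contains a line starting with "RSA public key:" exactly as shown above.

```python
def rsa_enabled(info_output: str) -> bool:
    """Return True if captured `duplicacy -d info` output mentions an RSA public key.

    Assumption: the output contains a line beginning with "RSA public key:",
    as in the sample output above. Nothing else about the format is assumed.
    """
    return any(
        line.strip().startswith("RSA public key:")
        for line in info_output.splitlines()
    )
```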

Backup and Restore

No extra option is needed when you run the backup command. You’ll see a log message that says RSA encryption is enabled.


$ duplicacy backup

Storage set to ...

RSA encryption is enabled

...

Note that when RSA encryption is enabled, it protects only file contents. File metadata, such as modification times, permissions, and extended attributes, is not protected by RSA encryption (but is still protected by the storage password).

To restore, you’ll need the RSA private key:


$ duplicacy restore -r 1 -key private.pem

Other commands

Other commands that take the RSA private key are list, check, cat, diff, and copy.

For the check command, you’ll only need the RSA private key with the -files option, which is used to verify the integrity of every file.

You can run the check and prune commands without the RSA private key to manage backups encrypted with the RSA public key.

Copy with RSA encryption

If you want to switch to the RSA encryption for an existing storage, you can create a new encrypted storage with the RSA encryption enabled and then copy existing backups to the new storage:


$ duplicacy add -e -key public.pem -copy default new_storage_name repository_id new_storage_url
$ duplicacy copy -from default -to new_storage_name

Conversely, you can copy from an RSA-encrypted storage to a new storage without RSA encryption:


$ duplicacy add -e -copy default new_storage_name repository_id new_storage_url
$ duplicacy copy -key private.pem -from default -to new_storage_name

How it works

RSA encryption is performed at the chunk level. Previously, an encrypted chunk always started with the header duplicacy\000, followed by the nonce and the encrypted chunk content:


-----------------------------------------------
duplicacy\000 | nonce | encrypted chunk content
-----------------------------------------------

Note that the key used to encrypt the chunk content isn’t stored here. Rather, that key is derived from the hash of the chunk content.

Chunks with the RSA encryption enabled will start with a new header duplicacy\002. The key to encrypt the chunk content is no longer derived from the hash of the chunk content. Instead, the key is randomly generated (unique to each chunk), and then encrypted by the RSA public key, and stored after the chunk header:


-------------------------------------------------------------------
duplicacy\002 | RSA encrypted key | nonce | encrypted chunk content
-------------------------------------------------------------------

To decrypt such a chunk, Duplicacy will first recover the key from the RSA encrypted key (which requires the RSA private key), and then use that key to decrypt the chunk content.
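
To make the two layouts concrete, here is a minimal Python sketch that tells them apart by header and splits a chunk into the fields from the diagrams. It is an illustration only, not Duplicacy's actual parser. The names split_chunk and ChunkParts are hypothetical, and the field sizes (a 12-byte nonce, a 256-byte RSA-encrypted key as a 2048-bit key would produce) are assumptions. The real encoding may differ, for example by storing the key length explicitly.

```python
from typing import NamedTuple, Optional

LEGACY_HEADER = b"duplicacy\x00"  # written as duplicacy\000 in the diagram
RSA_HEADER = b"duplicacy\x02"     # written as duplicacy\002 in the diagram


class ChunkParts(NamedTuple):
    rsa: bool                       # True if the chunk uses the RSA header
    encrypted_key: Optional[bytes]  # RSA-encrypted chunk key, or None for legacy chunks
    nonce: bytes
    ciphertext: bytes


def split_chunk(data: bytes, nonce_len: int = 12, rsa_key_len: int = 256) -> ChunkParts:
    """Split an encrypted chunk into the fields shown in the diagrams.

    nonce_len and rsa_key_len are assumed sizes, not values stated in this post.
    """
    if data.startswith(RSA_HEADER):
        pos = len(RSA_HEADER)
        encrypted_key = data[pos:pos + rsa_key_len]
        pos += rsa_key_len
        rsa = True
    elif data.startswith(LEGACY_HEADER):
        pos = len(LEGACY_HEADER)
        encrypted_key = None
        rsa = False
    else:
        raise ValueError("unrecognized chunk header")
    nonce = data[pos:pos + nonce_len]
    ciphertext = data[pos + nonce_len:]
    return ChunkParts(rsa, encrypted_key, nonce, ciphertext)
```

Only chunks for which the rsa field is true need the private key to recover the chunk key; the remaining fields are handled the same way in both layouts.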

RSA encryption only applies to file chunks, not metadata chunks. Therefore, the file names, timestamps, permissions, attributes, etc. are not protected by the RSA public key (but are still protected by the storage password).

Key generation

You can run these commands to generate the private and public key pair:

openssl genrsa -aes256 -out private.pem 2048
openssl rsa -in private.pem -pubout -out public.pem

Great! I’m here thinking of some ways to use this new feature for remotely managed backups … :wink:

I imagine this will re-encrypt the config file, right?

No, it doesn’t touch the config file. The encryption is done at the chunk level. Previously, an encrypted chunk had the following format:

----------------------------------
duplicacy\000 | nonce | ciphertext
----------------------------------

Note that the key to decrypt the ciphertext isn’t stored. That is because the key is derived from the hash of the chunk content.

For RSA encryption, we introduce a different header, generate a random key for each chunk, encrypt the random key with the RSA public key, and place the encrypted key next to the header.

------------------------------------------------------
duplicacy\002 | RSA encrypted key | nonce | ciphertext
------------------------------------------------------
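
The difference between the two key schemes can be sketched in a few lines of Python. This is an illustration under assumptions, not Duplicacy's real key derivation: plain SHA-256 stands in for "derived from the hash of the chunk content", the 32-byte key size is assumed, and the function names are hypothetical. The RSA step that wraps the random key is left out.

```python
import hashlib
import secrets


def content_derived_key(chunk: bytes) -> bytes:
    # Stand-in for the old scheme: the key depends only on the chunk content.
    # SHA-256 is an assumption used for illustration.
    return hashlib.sha256(chunk).digest()


def random_chunk_key() -> bytes:
    # New scheme: a fresh random key per chunk (32 bytes assumed here).
    # In the RSA format this key would then be encrypted with the public key
    # and stored next to the header.
    return secrets.token_bytes(32)


chunk = b"same file content"

# The old scheme is deterministic, so the key never needs to be stored.
derived_a = content_derived_key(chunk)
derived_b = content_derived_key(chunk)

# The new scheme gives unrelated keys even for identical content, so the
# (RSA-encrypted) key has to travel with the chunk.
random_a = random_chunk_key()
random_b = random_chunk_key()

print("derived keys match:", derived_a == derived_b)
print("random keys match:", random_a == random_b)
```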

RSA encryption only applies to file chunks, not metadata chunks. Therefore, the file names, timestamps, permissions, attributes, etc. are not protected by the RSA public key (but are still protected by the storage password).

So all the file chunks will have to be reuploaded?

No, existing chunks are not re-encrypted or reuploaded, which means only new backups are protected by RSA encryption. You can restore old backups without the private key.

I think I didn’t understand…

I have a storage with N revisions and thousands of chunks. What will happen exactly if I start backing up with the -key parameter (after I create the keys, of course)? Will only “new” chunks be encrypted in the new way?

Correct, only new backups are protected by the RSA encryption. The word “new” was missing from the previous post so it was confusing…

Ok. So if only a few files are changed in the repository, some new chunks will be created with RSA encryption.

But this latest/new revision will then have old chunks created with the previous form of encryption and new chunks created with RSA encryption. How would restore work? Would the -key parameter be used even with the presence of the old chunks encrypted with the previous form?

Using the -key feature, can we finally have that use case (I think there’s a topic about it) where multiple untrusted parties back up to the same storage, and someone can run check -all without “reading” everybody else’s data (while still making sure that no backup has missing chunks)?

If this is the case, then it’s quite a good feature!

It was exactly this use case that I thought of when I mentioned “remotely managed backups”.

Yes, with this feature you can check the backups for someone without reading their data. However, you don’t want multiple persons to back up to the same storage using different keys, because otherwise if someone else has already uploaded a file chunk that you happen to need, then you won’t be able to read that chunk.

I don’t understand something about the feature. I will write below multiple use-cases and please explain/comment/approve/disapprove of my beliefs.

My use-case is the same as above: multiple untrusted parties back up to the same storage.

1. The chunk name remains unchanged (same as if not using the -key option?) and only the file content is changed (encrypted with the public key)

If that’s the case, then it doesn’t make any sense to have multiple untrusted parties back up to the same storage (untrusted implies each party uses a different key), because the deduplication would be false: only the original uploader of a chunk will be able to restore it, and the other parties will get an error during decryption.

This possible use-case leads to broken/useless backups for everyone except the uploader of a chunk.

If multiple backups are happening at the same time it’s easy to see the following problematic case:

Over the whole backup we have
party 1 = p1
party 2 = p2
party 1 will need to upload at some point during the backup both chunk 1 (c1) and chunk 2 (c2)
party 2 will need to upload at some point during the backup both chunk 1 (c1) and chunk 2 (c2)

  • p1 uploads c1 (encrypted with k_p1)
  • p2 needs to upload c1 but finds it existing on the storage
  • p2 uploads c2 (encrypted with k_p2)
  • p1 needs to upload c2 but finds it existing on the storage

In this case both snapshots are unrestorable, since each of them needs a chunk that was uploaded with the other party’s key.
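
The interleaving above can be replayed with a small Python simulation. It is a toy model under assumptions, not Duplicacy code: the Storage class and its methods are hypothetical, SHA-256 stands in for the chunk naming hash, and a key label stands in for real encryption. It also assumes a storage that accepts chunks encrypted with different keys.

```python
import hashlib


def chunk_id(content: bytes) -> str:
    # The chunk name depends only on the content, so both parties compute the
    # same name for the same data. SHA-256 is a stand-in for the real hash.
    return hashlib.sha256(content).hexdigest()


class Storage:
    def __init__(self) -> None:
        self.chunks = {}  # chunk id -> label of the key that encrypted it

    def upload(self, content: bytes, key_label: str) -> bool:
        """Store the chunk unless a chunk with the same name already exists."""
        cid = chunk_id(content)
        if cid in self.chunks:
            return False  # deduplicated: the existing copy is kept
        self.chunks[cid] = key_label
        return True

    def unreadable(self, contents, key_label):
        """Chunks this party needs that were encrypted with someone else's key."""
        return [c for c in contents if self.chunks[chunk_id(c)] != key_label]


c1, c2 = b"chunk 1", b"chunk 2"
storage = Storage()

storage.upload(c1, "k_p1")  # p1 uploads c1
storage.upload(c1, "k_p2")  # p2 finds c1 already on the storage and skips it
storage.upload(c2, "k_p2")  # p2 uploads c2
storage.upload(c2, "k_p1")  # p1 finds c2 already on the storage and skips it

p1_unreadable = storage.unreadable([c1, c2], "k_p1")
p2_unreadable = storage.unreadable([c1, c2], "k_p2")
print("p1 cannot decrypt:", p1_unreadable)
print("p2 cannot decrypt:", p2_unreadable)
```

Each party ends up with exactly one chunk it cannot decrypt, which is enough to break a restore of its snapshot.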

2. The chunk name is changed (as if the contents of the chunk are different) but the unencrypted contents are actually the same

In this case we have all the problems presented above along with an extra: there won’t be any deduplication.


Are there any other use-cases that I missed?


Therefore, @gchen, I think this feature needs to be advertised with care, making sure that whoever uses it does not use different keys in the same storage (or in a bit-identical copy of it).
From what I understand, the sole purpose of this feature is to add an extra layer of security in case we don’t trust the storage provider (while still trusting all other parties who back up to the same storage as I do).

I pushed a new commit to that PR that includes a few changes. First, the public key is now stored in the config file. This is mostly to ensure that only one key can be used so you can’t mess up the storage with multiple keys. Also, the copy command should now work with RSA encryption.

You mean, in the storage config file?

(side topic: what about my questions/remarks from post above?)

Yes, in the storage config file, so you’ll need to provide the public key when initializing a new storage.

The chunk name isn’t changed by the RSA encryption. It is still derived from the hash of the chunk content.

Great!

Bumping my doubt:

The header (duplicacy\000 or duplicacy\002) tells if the chunk is encrypted with RSA or not, so you could have mixed encryption in one storage. However, this is not possible now after the change to store the public key in the config file.

Thanks! So, in order to use RSA encryption, I must create a new storage and copy the contents of the old/current storage that uses the previous encryption (with the Duplicacy copy command), right?

Edit: I just noticed the edit in the original post…

Yes that is correct…

Is there any reason this PR hasn’t been merged yet?