Backup Immutability - Object Lock support?

I saw an announcement today that Backblaze implemented Data Immutability, with Veeam supporting it right away… Digging deeper, it looks like B2’s S3 API now supports the Object Lock API:

I don’t think it is possible to use this as-is with Duplicacy - it modifies some existing files after each backup, so it is not possible to protect the entire bucket…
But this is a great feature and it would be very good to have this implemented in the code, if possible…

The fact that it is possible to delete/corrupt an existing backup has always turned me off from the usual backends.
So far I know of only two solutions that can help: iDrive, which can restore deleted files on request, and rsync.net, which offers immutable snapshots.

But it would be great if Duplicacy could create immutable backups directly - it would be then possible to use either immutable attribute in Linux or manually set object lock in B2…

I think this was discussed before, but I can’t find it for some reason.


Excellent point. My understanding was that Duplicacy never modifies any files on the target except the config file; snapshots and chunks are only ever created, deleted (during prune), and renamed (during backup and prune) - never modified.

I’m pretty sure it’s already possible to configure the bucket permissions to only allow those operations and disallow modify which will effectively keep all chunks and snapshots immutable.

I’m not sure how important the config file is or how often it changes. Maybe I’m misremembering this.

If you keep the permission to delete files, you will not have any protection - when ransomware strikes, the people who launch the attack do the work to find all remote backups and delete them. I speak from experience, having recovered a friend’s business from such an attack… The attackers discovered the backup software and deleted all files. Luckily, the backup vendor was able to restore from their own backup.

If an object is locked, delete will fail. In this case Duplicacy would need to account for this during the prune operation and only delete files once the lock attribute expires. This may become very complex, unfortunately.

I played around with locking the destination folder with the immutable attribute on Linux - but it broke the backup:
I wanted to set things up so the client only does backups, and prune is done on the server after temporarily resetting the immutable attribute for the duration of the operation.

Then I discovered rsync.net - the price is not too bad, and the 7 free daily immutable snapshots are a great option.

I also emailed Backblaze about recovering files in case of rogue action - and was told that no such option exists. Looks like they are finally moving in the right direction with object lock support…


For this to work in duplicacy, basically you would have 3 issues to deal with:

#1
Any duplicacy config files that get changed during every backup. This could be fixed by using a new config file copy every time, and deleting the old config files after the lock expires.

For example, instead of using config.data (or whatever), you would use
config.data.00001
then
config.data.00002

It should always copy the highest-numbered file to a new one. The old ones can be deleted as their locks expire.
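The numbered-copy scheme could be sketched roughly like this (an illustration only; `config.data` and the five-digit suffix are hypothetical names, not Duplicacy’s actual storage layout):

```python
import re

# Hypothetical naming scheme: config.data.00001, config.data.00002, ...
_PAT = re.compile(r"^config\.data\.(\d{5})$")

def latest_config(names):
    """Return the highest-numbered config copy, or None if there is none."""
    numbered = sorted(n for n in names if _PAT.match(n))
    return numbered[-1] if numbered else None

def next_config_name(names):
    """Name for the fresh copy written by the next backup."""
    nums = [int(_PAT.match(n).group(1)) for n in names if _PAT.match(n)]
    return "config.data.%05d" % (max(nums, default=0) + 1)

def expired_copies(names, lock_expired):
    """Old copies whose object locks have expired can finally be deleted."""
    latest = latest_config(names)
    return [n for n in names
            if n != latest and _PAT.match(n) and lock_expired(n)]
```

Zero-padding the counter keeps lexicographic and numeric order the same, so the latest copy is simply the last name in sorted order.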

#2
Pruning / deleting files from the backup. Just don’t set any pruning for less than however long the lock period is. (Or don’t worry about this issue at all, and silently ignore lock errors when deleting, knowing that the delete will eventually succeed - every prune will retry these files until it does.)
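The “ignore lock errors and retry on the next prune” idea could look roughly like this (a sketch only; `ObjectLockedError` and the `storage` interface are made-up placeholders, not a real Duplicacy or B2 API):

```python
class ObjectLockedError(Exception):
    """Placeholder for whatever error the backend raises on a locked object."""

def prune_delete(storage, files):
    """Try to delete each file; locked ones are simply left for a later run."""
    deleted, still_locked = [], []
    for name in files:
        try:
            storage.delete(name)
            deleted.append(name)
        except ObjectLockedError:
            still_locked.append(name)  # lock hasn't expired yet; retry next prune
    return deleted, still_locked
```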

#3
Keep old files locked somehow. Since duplicacy reuses existing chunks forever, you don’t know how long to lock them for. So, after every prune, it would need to relock any files that are not locked (but ONLY files that are not locked). This would be tricky: some files’ locks would expire some time before the next prune relocks them, and they would be vulnerable during that window. There would instead need to be some way to indicate that a file’s lock should be allowed to expire; otherwise the lock is continually renewed.
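The relock bookkeeping might be sketched as follows (all names and intervals are assumptions, not anything Duplicacy actually implements): after each prune, extend the lock on every still-referenced chunk whose lock would run out before the next expected prune run.

```python
from datetime import datetime, timedelta

def chunks_to_relock(referenced, lock_expiry, now, prune_interval_days=7):
    """Referenced chunks whose lock ends before the next expected prune run.

    referenced:  set of chunk names still needed by some snapshot
    lock_expiry: dict chunk -> datetime when its current lock ends
                 (missing entries are treated as already unlocked)
    """
    horizon = now + timedelta(days=prune_interval_days)
    return sorted(
        c for c in referenced
        if lock_expiry.get(c, now) < horizon  # unlocked or expiring soon
    )
```

Chunks that prune decides to drop are simply left out of `referenced`, so their locks lapse and a later run can delete them.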

Interestingly, B2 also has “Legal Holds” now in addition to the new locks. I’m not sure how the legal holds work within b2, and they don’t seem to be documented anywhere yet, but this could potentially be used as a different kind of locking mechanism we could use.

Interesting note on Legal Holds… I might send their tech support a question on this.

On the other points - yes, it won’t be simple to implement, but should be possible.
I wonder if the prune process could extend the lock on every run when it knows a chunk is still needed, and mark somewhere the chunks that can be deleted on the next run, once the lock expires.
This would require a regular prune run with a period shorter than the lock…

I just realized that the config file (issue #1) wouldn’t be an issue at all. Just don’t lock that file.

So, really, the only issue is #3: relocking (extending the lock) on all files after each prune and, as you said, noting in another config file which files need to be deleted on the next run.

That could be costly in transactions though (on b2), having to go through every single chunk every single week (or however often) and update the lock on each.

Or, just thinking here: what if, instead of all of that, duplicacy used the b2 “hide” command instead of “delete”? B2 has a feature to automatically delete hidden files after xx days. (Or is hide already being used elsewhere in the duplicacy logic?) Remove delete from your key (and any other unneeded capabilities).

If hide isn’t being used already by the logic, that would be a very simple code change.

Then, the only new feature that would need to be implemented would be a “fix after messing up” in case a virus went in and “deleted” everything (and thus made it all hidden)… we would need some way to know what to unhide, programmatically. Make some immutable file that is created after each run with a list of the chunks that exist. Then, just run that file through an unhide script to “fix” or reverse the virus damage.
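The manifest-based recovery is essentially a set difference (a sketch; the manifest format and the `unhide` callback are assumptions - in B2 the actual reversal would depend on how the hide marker can be removed):

```python
def files_to_unhide(manifest, visible):
    """Chunks listed in the immutable manifest but no longer visible were
    presumably soft-deleted ("hidden") by the attacker."""
    return sorted(set(manifest) - set(visible))

def recover(manifest, visible, unhide):
    """Run the caller-supplied unhide callback for every hidden chunk."""
    for name in files_to_unhide(manifest, visible):
        unhide(name)  # whatever backend call reverses the soft delete
```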


It does seem to me that duplicacy should be using hide instead of delete in the b2 client. This would allow us to use any retention policy we wanted via the bucket lifecycle rules. The app key could be issued without deleteFiles capability as hiding uses writeFiles.

I don’t think storage retention policy will help with backup snapshot (version) retention - they are different things.
And “hiding” files won’t help protect them - they are not actually hidden from the API and can be deleted, even with a “write-only” API key - the operations b2_list_file_versions and b2_delete_file_version are both allowed.
The only way you can protect yourself is to use storage which includes read-only snapshots.

I’m assuming your goal aligns with mine which is to keep an attacker that has compromised the machine from also deleting the remote backup using the API key present on the machine.

In b2 the possible ways to remove a file from a bucket are b2_hide_file and b2_delete_file_version. “Hiding” is equivalent to a soft delete. The only other option is to delete by version which is what duplicacy is doing. In b2, “hiding” functions very similarly to the “delete” in the AWS S3 api. In other words, it is supposed to be used for typical file deletion operations. Indeed if you look at the code for another backup tool, duplicity, it is using hide rather than delete. The lifecycle policy will decide when the file is permanently deleted (possibly immediately).

B2 hiding has a different privilege requirement than deleting. When you create the key, you can give it the writeFiles capability, which allows b2_hide_file (which, again, might result in the file being immediately deleted depending on the bucket lifecycle policy), but not give it the deleteFiles capability needed to use b2_delete_file_version.

https://www.backblaze.com/b2/docs/b2_create_key.html
https://www.backblaze.com/b2/docs/b2_hide_file.html

Now it could be argued that this isn’t an ideal solution as you can’t easily restore to a single point in time with b2 - you would have to write a script that would go through and delete any versions newer than X point in time… and I would agree. But still, this is far more attractive than losing your entire dataset.

My goal is to protect backup from being targeted by someone trying to wipe it out or corrupt.

Don’t mix up Duplicacy versions and B2 file versions - these have nothing in common :slight_smile:

By “hiding” a file on B2 you are not protecting it - it remains visible to the b2_list_file_versions call - so an adversary who wants to delete your backup just pulls all files using that call and then uses b2_delete_file_version to delete them all, which is an unrecoverable operation.
All because the “write-only” API key has both write and delete capabilities.

“Hiding” a file is more like a “cosmetic” operation in B2 - it just removes the file from the “normal” view.
This is why Backblaze had to come up with “object lock support” referenced in my original post.

Object lock can help, but it is not really compatible with Duplicacy…

I am not mixing them and this is not accurate. It is possible to create a key that can b2_hide_file but not b2_delete_file_version. Please see the documentation I linked above.

Hiding is not a cosmetic operation if bucket lifecycle policy is set to do anything other than keep all versions perpetually as the file will eventually be permanently deleted. Hiding is a poorly named soft delete. b2_list_file_names will not return these files which is already what duplicacy is using to get a file listing.

No, you cannot do that, unfortunately. When you create a key, you only have 3 options:

  • Read and Write: most capabilities
  • Read Only: no write/change capabilities
  • Write Only: deleteFiles, listBuckets, writeFiles

b2_hide_file needs writeFiles and b2_delete_file_version needs deleteFiles - both of which are given to “Write Only” API key.

I discussed this with B2 support some time before they announced object lock and they confirmed that there is no way to secure bucket from such attack.

I see the confusion. These are the only options exposed by the b2 web UI. However, you can create a key with specific privileges using their API:

capabilities
required
A list of strings, each one naming a capability the new key should have. Possibilities are: listKeys, writeKeys, deleteKeys, listBuckets, writeBuckets, deleteBuckets, listFiles, readFiles, shareFiles, writeFiles, and deleteFiles.


Ok, then this is something new - as I said, my conversation with their support ended with them confirming the inability to protect files when write permission was given.
Have you tried creating an API key with only the writeFiles capability? If that is possible, then it should be possible to use Duplicacy and protect the backup from being targeted.

You have to use B2 CLI tool:

b2 authorize-account
b2 create-key --bucket [bucket-id] [backup-key-name] listBuckets,listFiles,readFiles,writeFiles

Or just call an API.
The B2 CLI tool does not like restricted API keys, so you can’t upload with it:

 ConsoleTool cannot work with a bucket-restricted key and no listBuckets capability
 ERROR: application key has no listBuckets capability, which is required for the b2 command-line tool

I already tested this with a small C# snippet and it seems to work - maybe useful for some other project I have.

I think with this it should be possible to get Duplicacy to create sort-of write-only, no-delete backups, although I am not sure whether Duplicacy relies on listFiles…
And you’ll still need to be able to delete files somehow - possibly by running cleanup manually with a different API key.

I didn’t understand your point, the above commands with the CLI tool are calling B2 API.

I use two keys, one for backup and one for prune.

The “backup” key has the permissions listBuckets,listFiles,readFiles,writeFiles

And the “prune” key the permissions listBuckets,listFiles,readFiles,writeFiles,deleteFiles

The particularity in my case is that I only execute prune manually (and rarely). The “prune key” is encrypted with GPG and I provide the password at run time.

I think the only vulnerability in my case is if ransomware were able to install a keylogger and capture the password I type for the prune key when I use it.

I do exactly the same thing and it works well. The only other thing is to ensure the bucket’s lifecycle settings keep prior versions (at least for a while), as writeFiles allows files to be overwritten (if you only keep the latest version). That said, I think this is a requirement for fossilisation anyway.

Should an attacker choose to mess up all the chunks, at least I’d have a copy - albeit some DIY scripting would be needed to restore a backup from previous versions of chunks.


This may work if you can run prune manually on a regular basis - not really a good option in every case. I still think read-only snapshots on the storage side are a better solution.
I also prefer a setup where Duplicacy creates a local backup (on a NAS or removable drive) and then a separate job (on a central NAS) pushes the data to cloud storage using rclone.
Looks like rclone also now supports “soft delete” - I need to figure out the minimal set of key capabilities for rclone to properly mirror a Duplicacy backup to B2, including prune, and let the lifecycle rules take care of the “soft deleted” files.