Currently, many of my devices use the backup command to do offsite backups directly, instead of copying from local storage. Is there a way to make already-existing offsite backups copy-compatible without removing the existing backup revisions?
I’m not 100% sure, but I’d say no, there isn’t a way to make them compatible after the fact.
I remember that you need to make the storage copy-compatible when you add it. Afterwards, the keys used to hash your chunks (for security and deduplication) will be different between the two storages (local and offsite), so you can’t just copy chunks from one to the other.
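To see why copying raw chunks across storages fails, here is a toy Python sketch (not Duplicacy’s actual code; the key names are made up for illustration): if each storage names chunks by hashing their content with a storage-specific hash key, the same chunk gets a different ID in each storage.

```python
import hashlib
import hmac

# Toy illustration: each storage derives chunk IDs from the chunk content
# using its own storage-specific hash key (here via HMAC-SHA256).
chunk = b"identical chunk content"

id_local = hmac.new(b"hash-key-of-local-storage", chunk, hashlib.sha256).hexdigest()
id_offsite = hmac.new(b"hash-key-of-offsite-storage", chunk, hashlib.sha256).hexdigest()

# The two storages file the very same content under different names,
# so simply copying chunk files from one to the other can't work.
print(id_local == id_offsite)  # False
```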
Okay, in other words my best bet is to create a new bucket (e.g. S3) or folder (e.g. SFTP) (if I want to keep the old backups until they are no longer relevant) and use the add command to register that remote bucket/folder as a new copy-compatible storage?
@gchen can you please comment on this?
If both storages are unencrypted, then they should be copy-compatible by default. If not, then there isn’t a way to make them copy-compatible.
I don’t quite understand how this is possible. Could you please point me to the explanation? (here on the forum, or on github)
Did you mean this part? “If the storage is not encrypted, then the config file will be generated using the default parameters, so all unencrypted storages will be copy compatible by default.”
Ooooh, so you mean I could connect to one of your storages and I could see all your snapshots??
In other words all unencrypted storages are the same?
Hope I don’t confuse things here, but I found out that not all unencrypted storages are necessarily copy-compatible…
Indeed, this discussion gave me the idea of seeing whether we could use an unencrypted storage as an intermediary, i.e. copy encrypted_store1 -> unencrypted_tmpstore1, copy encrypted_store2 -> unencrypted_tmpstore2, then copy unencrypted_tmpstore2 into unencrypted_tmpstore1. It didn’t work.
But not because of encryption! The chunk-seed and hash-key parameters are different (and obviously the chunks themselves, split on different boundaries).
So I ended up with copy-incompatible unencrypted storages, because I had copied them from an existing encrypted storage.
A freshly-made unencrypted storage, however, will have the same chunk-seed, hash-key, and id-key hashes - all three, it appears, using “6475706c6963616379” as the magic number.
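That magic number is just a hex encoding of an ASCII string, which a one-liner in plain Python (nothing Duplicacy-specific) confirms:

```python
# The default "magic number" observed in fresh unencrypted storages is the
# ASCII string "duplicacy" written out in hex.
magic = "6475706c6963616379"
print(bytes.fromhex(magic).decode("ascii"))  # duplicacy
```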
Technically, it would be feasible (and quite useful, IMO) to add functionality to copy backups from any storage. Encryption would be the easy part (it already re-encrypts when copying between copy-compatible storages, unless -bit-identical is used?). You’d have to repack chunks and rewrite snapshot files, but you could theoretically do it on the fly, in memory. Furthermore, you could copy between storages with different chunk sizes. It’s just a lot of work to make that happen.
Maybe we should remove the copy-compatibility constraint from the copy command. Technically, it is completely fine to inject into an existing storage some chunks generated with different size and chunk-seed parameters. As long as you can decrypt those chunks and the snapshot files that reference them, these foreign backups remain self-contained and readable. The only issue is that you would lose the benefit of deduplication: new backups, even if all files remain the same, will create a completely different set of chunks. However, this shouldn’t be a big problem, because you usually don’t back up directly to a storage that is primarily used as the destination of a copy command (and even if you do, you won’t break anything).
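The deduplication loss can be sketched in a few lines of Python (a hypothetical model, not Duplicacy’s real data structures): treat a storage as a map from chunk ID to chunk data, inject a foreign chunk under its source-side ID, then back up the same content directly so the ID is computed with the destination’s own hash key.

```python
import hashlib
import hmac

# Hypothetical model: a storage is a dict mapping chunk ID -> chunk data.
# Chunk IDs are derived from content with a storage-specific hash key.
def chunk_id(key: bytes, data: bytes) -> str:
    return hmac.new(key, data, hashlib.sha256).hexdigest()

storage = {}
data = b"a file that exists in both backups"

# Step 1: inject a foreign chunk, still named with the *source* storage's key.
storage[chunk_id(b"source-hash-key", data)] = data

# Step 2: back up the same data directly; the ID uses the *destination* key,
# never matches the foreign chunk, and the content gets stored a second time.
new_id = chunk_id(b"destination-hash-key", data)
if new_id not in storage:
    storage[new_id] = data

print(len(storage))  # 2: the same content is stored twice, dedup is lost
```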
When/if you do this, you should add a BIG warning that deduplication will be missing, plus everything else you noted above.
If you don’t, people will keep pestering you with questions about why the deduplication isn’t working properly, etc.
Following with interest. It would be very useful to be able to copy after the fact, as I’m in that situation now with two storages that are over 10 TB, with millions of fragments, and some backups for machines that are no longer in my hands (i.e., dead).
I like @Droolio’s idea, which would be to have Duplicacy decode the source and re-encode it with the destination parameters on the fly (in memory). Algorithmically this is more work, but it would be a great solution for those of us who have multiple/offsite historical backups and want to be able to manage them.
Yes, and perhaps Duplicacy could prioritize the snapshots that reference such foreign chunks when pruning (i.e. delete those first). That way deduplication would gradually come back as the old snapshots disappear.