I have spent considerable time trying to validate the bit-identical copy function using the Duplicacy CLI. My tests show this works only for unencrypted copy. For encrypted copy, the chunks are not copied identically: they have the same file names but different contents. If I am doing something wrong, can someone please offer the sequence of commands to do this right?
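For reference, the kind of sequence I tested looks roughly like this (the snapshot ID, storage names and paths below are placeholders, not my exact setup):

```sh
# create an encrypted source storage, then a second encrypted storage
# that is copy-compatible and bit-identical with it
duplicacy init -e mydocs /mnt/storageA
duplicacy add -e -copy default -bit-identical secondary mydocs /mnt/storageB

# back up to the source storage, then copy the backup across
duplicacy backup
duplicacy copy -from default -to secondary

# compare the chunk files byte for byte
diff -r /mnt/storageA/chunks /mnt/storageB/chunks
```

The file names under chunks/ match, but diff reports differing contents. Thank you.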
OK. So apparently the -bit-identical switch only creates files with identical name and size, not identical data within.
Ref: https://github.com/gilbertchen/duplicacy/wiki/add#-bit-identical
“-bit-identical
The -bit-identical option is used along with the -copy option and will copy the IDKey, ChunkKey and FileKey to the new storage from the old one. In this case the names of the chunks generated by Duplicacy during backup will be identical in the source and new storage.”
So only the names of the chunks are programmed to be identical, not the data within.
"This has the effect that you can rsync or rclone the chunks folder for example from local (source) to Google Drive (new storage), and then only do backups on Google Drive, and the existing chunks will be identical (same name, same size) as if the backup was run locally.
So the chunks are only guaranteed to have the same name and size, not identical contents.
I wish this were stated more plainly, as I spent quite a bit of time debugging this. I thought bit-identical copy implied that it eliminated the need for other copy programs such as rclone and rsync. But apparently they are still needed. So why the need for -bit-identical when the use of rclone and rsync will guarantee that any data you copy is bit-identical?
Thank you.
While the binary data in the chunk may not be identical, it doesn’t matter as the unpacked data within is, and the chunks are still interchangeable.
There seems to be some confusion here…
-bit-identical doesn't 'eliminate a need' for these third-party tools - it merely facilitates being able to use them alongside Duplicacy at the same time. If you're only using Duplicacy copy to replicate a storage, you don't need it. If you're only using Rclone/rsync, you don't need it. If you're using both, you do.
The extra flag generates chunks that can be swapped with either method at any time - they don't have to be binary identical to achieve that goal. Without the switch, Duplicacy is blind when two copy-compatible storages are mingled using Rclone/rsync because, as you say, the filenames are different, and the encryption keys are somewhat different too.
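To illustrate, with a bit-identical, copy-compatible second storage you can mix and match the two methods; a minimal sketch, assuming a local source storage, a Duplicacy storage named offsite, and an rclone remote gdrive: that both point at the same Google Drive folder (all placeholders):

```sh
# replicate snapshots with Duplicacy's own copy command...
duplicacy copy -from default -to offsite

# ...or, on another day, replicate the raw storage files with rclone instead
rclone sync /mnt/local-storage gdrive:duplicacy-storage
```

Because the chunk names come out the same either way, a later copy or backup recognises chunks that arrived via rclone, and vice versa.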
Thank you very much for your time and help, Droolio.
So here is my problem. I presumed encrypted file contents would also be bit-identical, not just the file names. This would allow for easy integrity monitoring with a simple file-signature comparison, and recovery from data rot, since I would have a bit-identical backup.
Since I presumed wrongly, what methods are used for integrity monitoring and recovery from data rot?
Thank you.
Integrity monitoring and rot healing are the job of the filesystem. So, use zfs or btrfs and run a periodic scrub of the array that holds the Duplicacy data.
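For example (pool and mount-point names are placeholders):

```sh
# ZFS: scrub the pool that holds the storage, then review the result
zpool scrub tank
zpool status tank

# Btrfs equivalent
btrfs scrub start /mnt/duplicacy-data
btrfs scrub status /mnt/duplicacy-data
```

Schedule one of these from cron or a systemd timer.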
You can run check -chunks in Duplicacy after each backup to ensure the recently uploaded chunks are consistent, but after that it's the job of the filesystem to maintain data integrity.
That said, you can still do it. Delete the verified_chunks file and run check -chunks; it will then check all chunks.
If corrupted ones are found, replace them with the identically named chunks from the other storage.
This, however, should only be used as a stop-gap measure until you migrate to zfs or btrfs on your storage server, as it's an inefficient and labor-intensive process to do it this way.
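A rough sketch of that process (the cache path below is an assumption based on the default storage name and local cache location; yours may differ, and the chunk paths are placeholders):

```sh
# forget previous verification results so check re-reads everything
rm .duplicacy/cache/default/verified_chunks

# re-verify every chunk in the storage
duplicacy check -chunks

# if a chunk is reported corrupted, overwrite it with the identically
# named chunk from the other, bit-identical storage
cp /mnt/storageB/chunks/ab/cdef... /mnt/storageA/chunks/ab/cdef...
```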
Thank you very much for your time and detailed help, saspus.
You’ve given me much to consider and test so I just wanted to promptly acknowledge and thank you before going off and doing my testing.
Thank you!
This is a little off-topic, but what's the correct procedure if I happen to copy my finished storage from Google Drive to local with rclone (or any third-party program) and then want to initialize the local storage with the web GUI? Do I need to manually go into the config file as well and add bit-identical? To achieve what Droolio mentioned here:
“-bit-identical doesn't 'eliminate a need' for these third-party tools - it merely facilitates being able to use them alongside Duplicacy at the same time. If you're only using Duplicacy copy to replicate a storage, you don't need it. If you're only using Rclone/rsync, you don't need it. If you're using both, you do.”
That is, if I wanted the option of using both of the aforementioned methods.
Remove the old storage. Add the new one with the same name. (The bit-identicalness affects only the config file on the storage, which you already copied as part of copying the whole dataset.)
In this case I also want to keep the old storage; does it matter that I give the new storage a new name? They will just continue to be identical as long as I back up to both of them from the same source data, right?
The name is only used in the context of the web UI, for schedules to refer to that storage. It's not used for anything else.
Yes.
I have a similar setup to what you're proposing: 2 encrypted, bit-identical storages, and I've used the method above to replace corrupted chunks the handful of times it's happened (I run check -chunks on a weekly schedule).
The only way to keep the storages in sync, AFAIK, is to back up to one and copy (from that backup) to the other.
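In practice that's a short routine per run; a minimal sketch, assuming storages named primary and secondary (placeholders):

```sh
duplicacy backup -storage primary            # back up to the first storage
duplicacy copy -from primary -to secondary   # replicate that backup to the second
duplicacy check -chunks -storage secondary   # verify the newly copied chunks
```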