How are hash collisions handled?




To my understanding, Duplicacy relies solely on the size of the chunk and its SHA-256 hash to determine whether two chunks are the same. I know that I am splitting hairs here, but in theory it is possible for two different chunks to have the same size and hash, right? I am thinking of HUGE backups meant to be stored for tens of years. Eventually there might be a hash collision.

Does Duplicacy's algorithm somehow detect a collision and resort to a byte-by-byte comparison when the size and hash match? If not, are there plans to switch to a longer hash? The computational load might not be any higher.
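To put a rough number on "eventually", here is a minimal sketch of the standard birthday-bound estimate. It assumes the hash behaves like a uniformly random 256-bit value, and the function name `collision_probability` and the chunk counts are made up for illustration; nothing here comes from Duplicacy itself.

```python
import math


def collision_probability(num_chunks: int, hash_bits: int = 256) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).

    Assumes hash outputs are uniformly distributed, which is an
    idealization of SHA-256, not a proven property.
    """
    exponent = -num_chunks * (num_chunks - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


# Hypothetical repository sizes, purely for illustration.
for n in (10**9, 10**12, 10**15):
    print(f"{n:.0e} chunks -> p ~= {collision_probability(n):.3e}")
```

Under these assumptions, even 10^12 distinct chunks give a probability on the order of 1e-54. For comparison, the same formula with a 32-bit hash reaches about 50% at roughly 77,000 chunks, which is why hash length matters so much.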



Hash collisions will be detected during restore or check -files. This is because we keep file hashes in the snapshot, so an incorrect chunk will very likely produce a different file hash. There doesn't seem to be an efficient way to detect hash collisions during the backup command when uploading chunks, other than using a longer hash.
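The idea can be illustrated with a toy model. This is only a sketch, not Duplicacy's actual code: the names `chunk_id`, `backup`, `restore`, and `find_colliding_chunk` are hypothetical, the chunk store is a plain dict, and the chunk ID is deliberately truncated to one byte of SHA-256 so that a collision can be produced on demand.

```python
import hashlib


def chunk_id(data: bytes, id_bytes: int = 1) -> str:
    # Truncated on purpose so collisions are easy to find in this demo.
    return hashlib.sha256(data).hexdigest()[: 2 * id_bytes]


def backup(chunks, store):
    """Store chunks by ID, skipping IDs that already exist (deduplication)."""
    ids = []
    for chunk in chunks:
        cid = chunk_id(chunk)
        # An existing chunk wins; a collision goes unnoticed at this point.
        store.setdefault(cid, chunk)
        ids.append(cid)
    # The snapshot also records a full-length hash of the whole file.
    file_hash = hashlib.sha256(b"".join(chunks)).hexdigest()
    return {"chunks": ids, "file_hash": file_hash}


def restore(snapshot, store):
    """Reassemble the file and verify it against the recorded file hash."""
    data = b"".join(store[cid] for cid in snapshot["chunks"])
    if hashlib.sha256(data).hexdigest() != snapshot["file_hash"]:
        raise ValueError("file hash mismatch: corrupted or colliding chunk")
    return data


def find_colliding_chunk(target: bytes) -> bytes:
    """Brute-force a different chunk with the same truncated ID."""
    wanted = chunk_id(target)
    i = 0
    while True:
        candidate = b"other-%d" % i
        if candidate != target and chunk_id(candidate) == wanted:
            return candidate
        i += 1


store = {}
first = backup([b"hello"], store)
clash = find_colliding_chunk(b"hello")
second = backup([clash], store)  # silently deduplicated against b"hello"

print(restore(first, store))  # the first file restores correctly
try:
    restore(second, store)
except ValueError as err:
    print("detected on restore:", err)
```

In this model the backup step cannot tell the two chunks apart, but the restore step catches the problem because the reassembled file no longer matches its recorded hash. Note that detection only tells you the data is wrong; the colliding chunk's original content was never uploaded.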

Currently there is no plan to use a longer hash, although it is not hard to implement.