Should we use the -hash argument every time we back up, or rely on timestamps? What are the benefits of each, please?
Maybe some best practices on how one should back up would be good too. There are a lot of options, which is great, but the advantages of the different options aren’t clear.
With the -hash argument, Duplicacy needs to read the entire content of every file, which can be very slow for a large directory. That is why it is not enabled by default.
Without the -hash argument, Duplicacy only backs up files whose timestamps differ from the last backup. This can be a problem if a file’s content has been modified but its modification time somehow remains unchanged. Another, less obvious problem without -hash is that it may leave ‘holes’ in chunks for small files, because of the pack-and-split approach: if a file is smaller than the average chunk size, it is likely to be saved in a chunk together with other files. When only one file in a chunk has been modified and repacked into a new chunk, the old chunk is still needed for the other files. This tends to waste storage space and increase restore time.
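The ‘holes’ effect can be sketched with a toy model. The fixed chunk size and the `pack` helper below are hypothetical simplifications; Duplicacy’s real chunks are variable-size and content-defined:

```python
CHUNK_SIZE = 4  # toy assumption: pretend each chunk holds exactly 4 small files

def pack(files):
    """Pack an ordered list of file names into fixed-size chunks (toy stand-in
    for pack-and-split)."""
    return [tuple(files[i:i + CHUNK_SIZE]) for i in range(0, len(files), CHUNK_SIZE)]

files = [f"f{i}" for i in range(8)]
snapshot1 = pack(files)              # 2 chunks: (f0..f3) and (f4..f7)

# Without -hash, only the modified file is repacked into a new chunk,
# but the old chunk is still referenced because f0, f2 and f3 live in it.
snapshot2 = snapshot1 + [("f1_v2",)]

stored = sum(len(chunk) for chunk in snapshot2)
print(len(snapshot2), stored - len(files))   # 3 chunks, 1 stale copy of f1
```

The stale copy of f1 inside the first chunk is the ‘hole’: storage the latest snapshot still pays for but can no longer use.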
Interesting, so -hash will cause the holes to be cleaned up? Does that mean that if I change many small files and run a backup, passing -hash will cause me to upload many more bytes than without the flag, since chunks will change?
I assume -hash doesn’t significantly affect downloaded bytes (the ones that get billed at an expensive rate)?
Right, -hash will upload more bytes, but backups created with -hash are more compact, resulting in fewer bytes downloaded on restore.
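That trade-off can be illustrated with the same kind of toy model (again, the fixed chunk size and `pack` helper are assumptions for illustration, not Duplicacy’s actual chunking, and real -hash backups still deduplicate unchanged chunks):

```python
CHUNK_SIZE = 4  # toy fixed-size chunks; real chunks are content-defined

def pack(files):
    """Pack file names into fixed-size chunks (toy stand-in for pack-and-split)."""
    return [tuple(files[i:i + CHUNK_SIZE]) for i in range(0, len(files), CHUNK_SIZE)]

base = [f"f{i}" for i in range(8)]

# Two small files change between backups.
updated = ["f1_v2" if f == "f1" else "f5_v2" if f == "f5" else f for f in base]

# Without -hash: the snapshot keeps both old chunks and adds one chunk for
# the two repacked files, so a full restore reads 3 chunks.
without_hash = pack(base) + pack(["f1_v2", "f5_v2"])

# With -hash: the files are repacked into a fresh, compact chunk set, so a
# full restore reads only 2 chunks (more uploaded now, less downloaded later).
with_hash = pack(updated)

print(len(without_hash), len(with_hash))   # 3 2
```

In the toy model the -hash snapshot references one chunk fewer, which is why a restore from it downloads fewer bytes even though the backup itself uploaded more.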
Is there a preference when you’re dealing with larger files (in the 2-12GB range) that change rarely, with a total volume of 10-20TB?
Can we avoid this waste of storage space without having to hash all files?