Can someone help me understand how indexing works?

I have tried looking through the manual, user guides, searching the forum, etc. and have been unable to find any info about indexing.

tl;dr: Do edited files (or files deleted and replaced with the same name) need to be re-indexed in a particular way?

I have recently done some compressing of videos (same file names before and after). I expected this to cause a very large delta on the next backup, but after indexing, the amount of data in the backup seems relatively small – so small I'm worried that the index is just based on file names, thus ignoring all the compressed videos. But I also thought the whole point of dedup was to analyze files on a deeper level than just the file name. Or is that only done on the initial detection of each file? What happens if a file is edited?

Anyway, any info would be much appreciated.

Not in a particular way. Duplicacy will detect that a file was modified via its timestamp since the last backup, and re-chunk that file in its entirety.

Any content that is unchanged will produce the same chunk hashes, so those chunks are skipped because they already exist in the storage. Any differences will be uploaded as new chunks.
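
To make that concrete, here's a minimal Go sketch of the general idea – not Duplicacy's actual code (it uses fixed-size chunks and a fake in-memory "storage" purely for illustration; real tools use variable, content-defined chunk boundaries):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// uploaded simulates the chunks already present in the storage,
// keyed by their content hash.
var uploaded = map[string][]byte{}

// backupFile splits data into chunks, hashes each one, and "uploads"
// only the chunks whose hash isn't already in the storage.
func backupFile(data []byte, chunkSize int) (newChunks, skipped int) {
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunk := data[start:end]
		hash := sha256.Sum256(chunk)
		key := hex.EncodeToString(hash[:])
		if _, exists := uploaded[key]; exists {
			skipped++ // identical content -> same hash -> nothing to upload
			continue
		}
		uploaded[key] = chunk // new or changed content -> new chunk
		newChunks++
	}
	return
}

func main() {
	original := []byte("original video bytes ............")
	reencoded := []byte("completely different bytes .......")

	n, s := backupFile(original, 8)
	fmt.Printf("first backup:    %d new, %d skipped\n", n, s)

	// Re-encoding rewrites essentially every byte, so almost no chunk hashes match.
	n, s = backupFile(reencoded, 8)
	fmt.Printf("after re-encode: %d new, %d skipped\n", n, s)
}
```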

If you're compressing videos, the whole video stream gets re-encoded, so you should see pretty close to a 100% delta.

So you may want to check that your re-encodes were actually saved in the correct repository, because that doesn't sound right. How are you compressing them? What was the compression ratio?

You mention 'indexing' but not a backup - what is the result of a `check -tabular` after a backup?
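
For example, from the repository directory (assuming the CLI is set up and the storage is already configured):

```
duplicacy backup -stats
duplicacy check -tabular
```

The tabular output breaks down chunks and sizes per revision, so you can see how much new data the last backup actually added.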