Diff from incremental showing many files

fixed

#1

Hi, I have a folder hierarchy of about 260k files. I backed it up and then ran an incremental backup; the output of the second run showed that only 3 files had changed between revision 1 and revision 2.
However, when I run a diff like

duplicacy diff -r 1 -r 2

it shows me around 8,000 files in the output, and they all look like they haven’t changed: for each file, the two lines show the same size and date. I ran diff on a couple of the specific files out of the 8k, and the output was empty, i.e. there are no differences, as expected. Why does the diff between the revisions list them, then? I must be missing something about how this is supposed to work.

Also, a side question: how on earth does duplicacy manage to do an incremental on 260k files in 23 seconds, whereas borg takes 7 minutes and restic takes 11 minutes, when they’re all supposed to just go by metadata and not read actual file contents? Is anybody familiar with this?
Thanks a bunch!


#2


#3

Just to make things clear: it is a bug that the diff command between two revisions lists files that aren’t different in any way.


#5

It looks like if a file’s size is 0, Duplicacy may think it has changed when one revision has a hash for it while the other doesn’t.

Here is an example. file1 and file3 are 0-size files, and file2 is the only file that actually changed, but Duplicacy lists all 3 as changed:

> duplicacy diff -r 1 -r 2
         0 2019-06-05 22:02:49 0e5751c026e543b2e8ab2eb06099daa1d1e5df47778f7787faab45cdf12fe3a8 file1
*        0 2019-06-05 22:02:49                                                                  file1
        15 2019-06-05 22:02:57 e003ec4d3d86003bad67a4eaf7b320874d3bdef7c66d30b2f1aa2d3f69cee6f1 file2
*       21 2019-06-05 22:03:19 1f4038681da7c6a2eae3d28b3c33ef121c4add7b0418c2fb4e057471d6e5fcd9 file2
         0 2019-06-05 22:03:00 0e5751c026e543b2e8ab2eb06099daa1d1e5df47778f7787faab45cdf12fe3a8 file3
*        0 2019-06-05 22:03:00                                                                  file3

@new_to_dupli is this the case for you?


#6

@gchen - thank you for looking into this. Indeed, all my 8k+ files that show as changed are 0 size, with one version showing a hash and the other not, just as you described. The only exceptions are the 3 files that actually did change: they are not empty, and both versions have hashes for each of them.

Is this an easy one to fix?
Cheers!


#7

It has been fixed by the commit “Don't compare hashes of empty files in the diff command” (gilbertchen/duplicacy@9d4ac34 on GitHub).
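
As a rough illustration of the idea only (this is not the code from the commit, and Duplicacy itself is written in Go), here is a minimal Python sketch. The `Entry` type and the `naive_changed` and `changed` helpers are hypothetical names made up for this example, and the assumption that size and date are compared alongside the hash is mine. The entries mirror the file1/file2/file3 output shown earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for one file entry in a revision."""
    path: str
    size: int
    mtime: str
    hash: str  # empty string when the revision stores no hash for the file


def naive_changed(a: Entry, b: Entry) -> bool:
    # Reports a change whenever the stored hashes differ, so an empty file
    # with a hash in one revision and none in the other is flagged.
    return a.size != b.size or a.mtime != b.mtime or a.hash != b.hash


def changed(a: Entry, b: Entry) -> bool:
    # Same comparison, but hashes are ignored when the file is empty.
    if a.size != b.size or a.mtime != b.mtime:
        return True
    if a.size == 0:
        return False
    return a.hash != b.hash


# Hash values copied from the diff output in this thread.
EMPTY = "0e5751c026e543b2e8ab2eb06099daa1d1e5df47778f7787faab45cdf12fe3a8"
OLD2 = "e003ec4d3d86003bad67a4eaf7b320874d3bdef7c66d30b2f1aa2d3f69cee6f1"
NEW2 = "1f4038681da7c6a2eae3d28b3c33ef121c4add7b0418c2fb4e057471d6e5fcd9"

rev1 = [
    Entry("file1", 0, "2019-06-05 22:02:49", EMPTY),
    Entry("file2", 15, "2019-06-05 22:02:57", OLD2),
    Entry("file3", 0, "2019-06-05 22:03:00", EMPTY),
]
rev2 = [
    Entry("file1", 0, "2019-06-05 22:02:49", ""),
    Entry("file2", 21, "2019-06-05 22:03:19", NEW2),
    Entry("file3", 0, "2019-06-05 22:03:00", ""),
]

naive = [a.path for a, b in zip(rev1, rev2) if naive_changed(a, b)]
fixed = [a.path for a, b in zip(rev1, rev2) if changed(a, b)]

print(naive)  # ['file1', 'file2', 'file3']
print(fixed)  # ['file2']
```

With the empty-file check in place, only file2 is reported, which matches what the backup itself counted as changed.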


#8

@gchen
Thank you!
Will it make it into a release soon?
Looking at the code change, I’m a bit curious why an empty unmodified file would not have a hash in the incremental revision while a non-empty unmodified file would, given that all files, whether empty or non-empty, store a hash in the initial revision. I guess that’s just an implementation detail.


#9

Yes, the fix will be included in the next release.