Diff Command Output?

I’m just looking at the output of a diff command (specifically duplicacy diff -r 10 -r 11) and I’m unsure what the output is showing me; hopefully someone can help clear this up.

I can see (in the following order):

symbol - “*”, “+”, “-” or a blank space. Perhaps:

  • “+” is for a file in -r 11 but not in -r 10.
  • “-” is for a file in -r 10 but not -r 11.
  • I notice the “*” and the empty space are present when a file is listed twice, so perhaps these are used when a file exists in both revisions but has changed? Maybe the “*” indicates the file as it is in revision 11?

number - e.g. “239075328”. Presumably this is a byte count; is it the original size of the file?

date - The last modified date of the file?

time - Last modified time?

hash - The hash of the file?

file path

Ultimately (and this is kind of a second question) I’m trying to work out if there is a way to see, in a particular revision, which files contributed the most to increasing the storage used. I can see from the backup log that I’m adding, say, 10 GB of data in a backup, but it would be great to know which files were to blame.

Your interpretation is correct. To find out which files take up most of the new space you’ll just need to look at lines starting with + or - and maybe find a way to sum up their sizes.
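
If it helps, here’s a minimal sketch of that summing, assuming the +/- marker and the byte count are the first two whitespace-separated columns of each diff line (adjust the field numbers if your output is laid out differently):

```bash
# Sum the byte counts of files that appear only in the newer revision.
# Assumes column 1 is the "+"/"-"/"*" marker and column 2 is the size.
duplicacy diff -r 10 -r 11 | awk '$1 == "+" { total += $2 } END { printf "%d bytes in added files\n", total }'
```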

@gchen - Thanks for confirming.

So it looks like I could add up the sizes of all the files marked with a “+” or “*”, but if the byte count listed is the original file size then it doesn’t take into account encryption, compression or deduplication, which could massively affect which files are “to blame” for the additional data storage.

I suppose what I need to know is, of the new/modified files, how much chunk data was uploaded. Perhaps this calculation isn’t currently possible.

To get this information, you will actually have to evaluate the files that appear in the diff command output.

But if I understand what you want, you can get that information with the check command and the -tabular option, or even from the information at the end of each backup log.
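
For reference, the tabular check is just run from the repository directory (see the check command’s own help for the exact options; this is the basic form):

```bash
# Print a per-revision table of chunk and byte statistics for this repository.
duplicacy check -tabular
```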

Thanks @towerbr but unfortunately I don’t think either of these will actually get me what I’m looking for.

For example, if I see 70 GB of new data was uploaded in a backup I’d ideally like to see a list of files along with a value representing a count of the uploaded bytes associated with each file. That way I could easily identify which files are “to blame”.

I think my best bet might be to just look at the original file sizes listed in the backup report. As above, this won’t account for encryption, compression or deduplication but will probably steer me in the right direction.

You can check/grep the INFO UPLOAD_FILE entries in the log, but these will only give you the original size, as you said.

2020-10-17 19:12:22.126 INFO UPLOAD_FILE Uploaded xxxxxx (60111382)

But it seems to be a suitable way to find “who to blame”.
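
If it helps, here’s a rough one-liner to rank them, assuming every upload line ends with the byte count in parentheses like the example above (backup.log is just a placeholder for wherever your backup output is saved):

```bash
# Pull "<bytes> <path>" out of each UPLOAD_FILE line and list the largest uploads first.
grep 'INFO UPLOAD_FILE Uploaded' backup.log \
  | sed -E 's/.*Uploaded (.+) \(([0-9]+)\)$/\2 \1/' \
  | sort -rn \
  | head -n 20
```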

Just added this to the Guide.

Thanks!
