Understanding the -tabular output of the check command

towerbr · 12 August 2018 22:36

I’m evaluating some outputs from the check command with the -tabular option, and I’m not sure I’m interpreting them correctly.

To keep things easy to read, I made a small separate backup with only a few files:

    snap | rev |                          | files |    bytes | chunks |    bytes | uniq |    bytes | new |    bytes |
[edited] |   1 | @ 2018-08-11 11:31 -hash |    11 |  62,261K |     60 |  62,470K |    3 |       5K |  60 |  62,470K |
[edited] |   2 | @ 2018-08-11 11:32       |   187 | 163,762K |    145 | 162,584K |    3 |      32K |  88 | 100,119K |
[edited] |   3 | @ 2018-08-11 11:33       |   429 | 509,117K |    440 | 506,638K |  298 | 344,086K | 298 | 344,086K |
[edited] | all |                          |       |          |    446 | 506,676K |  446 | 506,676K |     |          |

So in the first revision we have:

11 files
split into 60 chunks
of which 3 are unique (?)
with - obviously - 60 new chunks

In the second revision:

187 files
split into 145 chunks
of which 3 are unique (?)
with 88 new chunks (145-88=57, then 3 chunks from the previous backup (60-57) were used)

In the third revision:

429 files
split into 440 chunks
of which 298 are unique (?)
with 298 new chunks (440-298=142, then 3 chunks from the previous backup (145-142) were used)

My main question is whether my understanding of “uniq” is correct.

I’d also like to know whether I have interpreted the other points correctly, including the last line, which shows 446 total chunks and 446 “uniq”.

gchen · 14 August 2018 02:33

“Uniq” is the number of chunks that are owned by this revision but not by any other revisions.

“New” is the number of chunks that first appear in this revision.

So your understanding of “uniq” is correct.

But this part isn’t right:

145-88=57, then 3 chunks from the previous backup (60-57) were used
440-298=142, then 3 chunks from the previous backup (145-142) were used

145-88 = 57 means 57 chunks were shared with the first revision when the second revision was uploaded.

And 440-298=142 means 142 chunks were shared with either the first revision or the second revision. It is possible to figure out how many came from each one with some simple calculation.
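
As a rough illustration of the “uniq” and “new” definitions, here is a minimal Python sketch. It assumes each revision can be modelled as a set of chunk IDs; the function name `tabular_stats` and the toy data are made up for illustration, and this is not Duplicacy's actual code.

```python
def tabular_stats(revisions):
    """revisions: list of (rev_number, set_of_chunk_ids), oldest first.

    Returns (per-revision stats, total number of distinct chunks).
    """
    stats = {}
    seen = set()  # chunks referenced by any earlier revision
    for i, (rev, chunks) in enumerate(revisions):
        # Chunks referenced by every revision except this one.
        others = set().union(
            *(c for j, (_, c) in enumerate(revisions) if j != i)
        )
        stats[rev] = {
            "chunks": len(chunks),
            "uniq": len(chunks - others),  # owned by this revision only
            "new": len(chunks - seen),     # first appear in this revision
        }
        seen |= chunks
    return stats, len(seen)


# Hypothetical toy data: letters stand in for chunk hashes.
revisions = [
    (1, {"a", "b", "c"}),
    (2, {"b", "c", "d", "e"}),
    (3, {"c", "e", "f"}),
]

stats, total = tabular_stats(revisions)
for rev, row in stats.items():
    shared = row["chunks"] - row["new"]  # chunks shared with earlier revisions
    print(rev, row, "shared with earlier:", shared)
print("all", total)
```

Under these assumptions, "chunks minus new" is the number of chunks a revision shares with earlier ones, and the "new" values add up to the total chunk count. The table in the first post fits that pattern: 60 + 88 + 298 = 446, the chunk count in the "all" row.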

towerbr · 9 December 2018 22:26

A quick question: what does the -hash “label” on the first revision mean?

| rev |                          |  files |
|   1 | @ 2018-11-13 14:48 -hash |  69988 |
|   2 | @ 2018-11-13 17:44       |  70096 |
|   3 | @ 2018-11-14 11:23       | 111295 |

gchen · 10 December 2018 02:31

It means the -hash option was used to create the first backup. You may not have specified this option when running the command, but it was turned on automatically because this was the initial backup (with no previous backup to compare against, every file had to be fully scanned).

Flibble · 11 December 2018 21:46

Thanks for this topic, I was wondering about that as well.

Droolio · 5 February 2019 17:11

Sorry to bump this old thread but I just wanted to clear up some confusion I have about the stats.

I understand uniq and (I think) new but I’m wondering how these numbers would change if, say in the example above, rev 2 got pruned.

I gather the uniq columns would change when new backups are created and when older revisions get pruned. That makes sense, and it is also a good way to measure de-duplication. But it does not necessarily measure growth. The new columns would be good for that, but am I right that these “new” numbers won’t change when revisions are pruned?

Edit:

I may just have answered my own question by comparing two check logs from the web edition.

It seems both the uniq and new columns are calculated dynamically at the time of the check, so both can change after revisions are pruned. That is, uniq shows what is currently unique to a revision among all remaining un-pruned revisions.

The new column shows what is new since the last un-pruned revision, which is not necessarily the amount of new data that was written at the time (it is the total of that plus whatever was new in any pruned revisions in between).
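
To make that observation concrete, here is a small Python sketch of what happens to both columns when a middle revision is pruned. It assumes, as the check logs suggest, that both values are recomputed from whatever revisions remain. The helper `uniq_and_new` and the toy chunk sets are hypothetical, and this is not Duplicacy's actual code.

```python
def uniq_and_new(revisions):
    """revisions: list of (rev_number, set_of_chunk_ids), oldest first.

    Returns {rev_number: (uniq, new)} computed only from the given revisions.
    """
    result = {}
    seen = set()  # chunks referenced by earlier remaining revisions
    for i, (rev, chunks) in enumerate(revisions):
        others = set().union(
            *(c for j, (_, c) in enumerate(revisions) if j != i)
        )
        result[rev] = (len(chunks - others), len(chunks - seen))
        seen |= chunks
    return result


# Hypothetical toy data: chunk "e" was first uploaded by revision 2
# and is still referenced by revision 3.
revisions = [
    (1, {"a", "b", "c"}),
    (2, {"b", "c", "d", "e"}),
    (3, {"c", "e", "f"}),
]

before = uniq_and_new(revisions)

# Simulate pruning revision 2 by dropping it from the list.
remaining = [(rev, chunks) for rev, chunks in revisions if rev != 2]
after = uniq_and_new(remaining)

print("before prune:", before)
print("after prune: ", after)
```

In this toy case revision 3 goes from 1 new chunk to 2 after the prune, because chunk "e" now counts as first appearing in revision 3. Its uniq value also rises from 1 to 2, since no other remaining revision references "e".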

Christoph · 6 February 2019 11:08

I have not followed this topic in detail, so I’m not making any judgements, but I just want to use this “reactivation” of an older discussion to mention that there is no need to push as many related things as possible into a single topic. In other words: it is okay - indeed: it is preferable - to start a new topic for a new question. In a new topic you can still quote posts from related topics (or simply mention the topic as dealing with a similar question).

Of course, it is sometimes hard to say whether one is raising a new question or merely a different aspect of a previously discussed question, but my point is: if in doubt, start a new topic. That way, the forum will be easier to navigate (and find relevant information on) in the future (long discussions can be laborious to read through).