Problems with differential reporting

Question

Problems with differential reporting

ziad-fernride opened this issue 3 months ago · 6 comments

Hi, thanks to @henry2cox 's help in: #314, I managed to get something a lot better running ... But after a couple of days of data collection, I am seeing these issues:

Example 1: incorrect diff
Base commit coverage report:

Differential coverage report:

Example 2: incorrect ECB, UIC values
I am seeing a lot of these values:

And then when I step into the file, there are actually no UIC nor ECB.

Ok, now my workflow is:

bazel coverage //...
filter the .dat coverage report using:

        lcov
        -o "coverage_report_$(current_commit).dat"
        -a "$(pwd)/coverage_report.dat"
        --omit-lines '^\s*}\s*$'
        --omit-lines '^\s*{\s*$'
        --omit-lines '^\s*};\s*$'
        --filter function,trivial,line,branch
        --branch-coverage

upload report to be accessed by later PR CI runs.
obtain diff.txt

        git diff --unified --no-prefix "$(base_commit)" "$(current_commit)" > diff.txt

generate a differential report:

        genhtml "coverage_report_$(current_commit).dat" \
        --substitute "s@$(pwd)/@@" \
        --baseline-file "coverage_report_$(base_commit).dat" \
        --ignore-errors empty,unmapped,range,source,unsupported,unused,mismatch,path \
        --synthesize-missing \
        --show-proportion \
        --filter function \
        --quiet \
        --diff-file diff.txt \
        --output-dir "$(pwd)/diff_pages/coverage_report" \

Any idea why the differential report is still unreliable ? Notice that I am not using a --version-script ... would that fix this ? what does it do on top of the diff anyways ?

Thanks a lot 🙏

Answer 1 · 2024-08-14T13:00:08.000Z

With respect to the major question here ("why does the differential categorization look wrong here?"), I very strongly suspect that the reason is that the 'diff' file is wrong and/or the source code versions do not match the corresponding coverage DB.

please take a look (and/or upload) the diff entry for the file shown in the incorrectly categorized sample.
What does it say about code changes on line 101 of the current file/line 99 of the baseline?
that line is categorized "UNC" because the differential report thinks that it has been modified. Since its hit count is zero: UNC.
The same thing is true of the UIC/GIC/ECB/EUB categories...the 'diff' data says that those lines are unchanged, but the coverage DB says that there was code (or was no code) on the corresponding lines.

From a UI perspective: the number in the 'file summary' header table in the directory view should be a hyperlink - to take you to the first coverpoint in the corresponding category in the corresponding file.
The hyperlink does not work if you use --frames (I think, a bug in both chrome and firefox...but I could not make it work).

With respect to the --version-script question:

using --version-script at data capture causes version information to be written into the coverage DB.
then, at filter or annotation time, the tool can check that the version it sees on the filesystem (or retrieves from revision control) matches.
This is useful because source-based filters are going to modify coverpoints based on what they see in the code - and if the line number correspondence is wrong, then all bets are off.
Similarly, the source code view can look strange (even without filtering) - say, because coverpoints appear in clearly incorrect places (...in the middle of comments, in places where there are clearly no conditionals, etc.)

Minor point: your first exclusion regexp could be '^\s*};?\s*$' - or you could put optional space before the semicolon, if you think somebody might write code that way.
You could also combine the regexps into one that deals with lone open/close brace.

Answer 2 · 2024-08-29T21:54:19.000Z

Sorry for my late reply 😊

Regarding the more important problem of the failing criteria, I realized the change actually moves the (.) into an (arrow) operators in the lines caught as NUC (since the developer technically removed a line and inserted a new one) .. so all good there 👍 that was my bad actually!

Regarding the other less important false positive ECB and UIC, they do show up often even with no code diff changes. I can show some real life diffs, although I think there is little value in that since they are empty from a code's perspective...

One interesting thing about the number hyperlink is that it is indeed there ... but when clicking on it, it takes me to lines that are not marked in the case of false positives. For example, a file would show ECB of 1, but none of the lines are marked ECB.

I will try to find a more concrete example tomorrow ! Cheers, Henry !

Answer 3 · 2024-08-30T10:50:00.000Z

Glad the changed code was resolved.

With respect to the ECB and UIC code:

"Excluded Covered Baseline" means that the tool thinks that the source line is unchanged - but it is code in the baseline report (the corresponding line has a non-zero hit count) whereas it is not code in the current coverage report (the corresponding line has no count at all).
"Uncovered Included Code" is somewhat opposite: the source is unchanged and the line was "not code" in the baseline report, but "is code" and has a zero hit count in the current report.

It should be possible to check the data associated with one or more of these lines to see what is happening.
Note that the corresponding lines may have different line numbers between baseline and current due to edits. The 'diff' data is used to track this.

It is always possible that there is a bug such that your coverage data is miscategorized - so it is worth to track down the issue.
The version you are using is a bit old (TOT is more than a year and many, many changes newer) - but I don't recall fixing anything in this area, so I don't think that moving to latest will change anything.

I have seen "miscategorization" caused by version mismatches and data inconsistency - which was the genesis of the version callback and associated error checking. I recently added some additional consistency checks to try to catch more such situations (not yet pushed/not externally available).

To debug this case, it may be helpful to tell the tool to look at just one file (one of the ones where you see apparently wrong categorization):

add --include path/to/my/source/file.cpp to your genhtml command line.
(You do not need to modify any of the input data.)
does the reduced report show the same categorization?

If not: this is definitely a bug. I probably need more data to debug (the subset described below will not be enough).

If so:

hopefully, you can share the testcase. I need the 'baseline', and current coverage data for that file (just the part between SF:path/to/my/source/file.cpp) and the 'diff' data (just the section for that file - you can fake the actual content. Doesn't need to be your source).
Run a quick test with your munged data to check that it still exhibits the same issue on the same line number.
If you can't share: you can turn up verbosity (genhtml -v -v ...) to get the tool to tell you more about what it it is doing.

Answer 4 · 2024-09-06T15:08:08.000Z

Hey again, for feedback, I am now using version 2.1 ... will wait and see what issues regarding diff coverage show up :)

But I have some interesting stuff to report (will open separate issues for those)

Answer 5 · 2024-10-15T10:28:46.000Z

I think that this issue is probably addressed by the consistency checking added in the SHA mentioned above.
If nothing else: the checks should make debugging/diagnosing any problems a bit easier.

If there are no updates, then I will close this issue in a couple of days (or you can close it).
Thanks.

Answer 6 · 2024-10-15T22:42:54.000Z

Thanks, Henry ! Agreed, will close the issue