- Do not open the objectives section (introduction) by talking about kiskadee
- Set index words
- Improve kiskadee screenshots
- Future work: use user feedback to enhance the ranking model
- Extend the terminology section (see tcc)
- Add an overview of kiskadee bug reports in the introduction
- It would be interesting to show the number of distinct warnings in the histogram of warning severities for each tool (see Table IV)
- Show a comparison of cases where the tools cover the same flaw versus cases where they do not.
- There are 174 warning lines where all the tools report a flaw but the label is false according to Juliet. At the same time, there is no case where the same situation holds and the label is true. This could be discussed in the paper (perhaps with an example?)
- Maybe other tools with a broader range of findings could be more reliable for the validation.
- How does the true negative rate influence the classifier?
- In the attached file there are some interesting cases that should be investigated in more detail, e.g., lines where all tools give warnings while Juliet labels them as false.