0xdea/semgrep-rules

Ruleset rumor reduction & Undefined behavior rule

Opened this issue ยท 2 comments

First of all, @0xdea, you have done great work there with the coverage, also if the support offered by semgrep is still limited.

I found that some C/CPP functions are flagged as vulnerable code in multiple rules, I will go through the one I faced and propose a fix to reduce the rumor produced by the ruleset.

Take the following call vsnprintf(NULL, 0, fmt, string);, this call has a deterministic behavior. Since an amount of 0 bytes is copied to the destination buffer, the first argument of the vsnprintf can be a NULL pointer. The length of the potentially copied data (in the destination buffer if it was big enough) is returned and can be used for other purses (e.g., allocating an appropriate buffer).
I think that the rule format-string-bugs.yaml should not match this case since it's eventually caught by the rule unsafe-ret-snprintf-vsnprintf.yaml.

Suggested change to the "format-string-bugs" rule (I can make the PR if I receive positive feedback):

- pattern-not: vsnprintf($ARG1, 0, ...)
- pattern-not: snprintf($ARG1, 0, ...)

I don't know if eventually could be interesting to have a rule for undefined behavior e.g., vsnprintf(NULL, 100, fmt, string), since this will result in writing some data to a NULL pointer, I didn't investigate it but someone can delight me regarding this ๐Ÿ˜„. More generally speaking, catch some undefined behavior could be interesting from a security perspective IMHO.

References:

  • ISO/IEC 9899:201x
    • chapter 7.21.6.12 - The vsnprintf function
    • chapter 7.21.6.5 - The snprintf function
0xdea commented

Hi @zi0Black,

Thank you for your kind feedback.

Indeed, the issue you report is real: there's some overlap between different rules in my ruleset, the prime example being the interesting-api-calls rule that sort of overlaps with most other rules. This is a wanted feature, because in my bug hunting workflow I'd rather get some false positives than miss a single true positive. That said, in my ruleset you have some rules that are aimed to direct the auditor's focus on "code smells" and some other rules that try to detect a specific bug or bug class. I believe this is the source of the confusion: I'll leave this issue open and I'll think about how I could solve it (perhaps by classifying the rules in different categories?).

Regarding your vsnprintf() example, I'm not 100% sure (we should check the specification and library sources when available), but I believe that the format string gets parsed even when nothing is written to the target buffer, therefore there might be a format string bug even in vsnprintf(NULL, 0, fmt, string). For this reason, I'd rather not filter out this pattern from the format-string-bugs rule. On the other hand, in this case the match by unsafe-ret-snprintf-vsnprintf would be a likely false positive, because hopefully the programmer was aware of the behavior of vsnprintf() when calculating its return value.

Your second example vsnprintf(NULL, 100, fmt, string) is definitely undefined behavior: the program would crash because of the NULL pointer dereference. I agree that some undefined behaviors can have security implications, and I'll think about implementing specific rules to catch undefined behaviors. However, it's a complex task and probably not worth the effort at least until Semgrep's support for C/C++ becomes more mature. In addition, most linters and compilers already implement many such checks.

I hope my reply makes sense! Thanks again for taking the time to test my ruleset and report this issue.

PS. I always read your posts on Shielder's blog with pleasure, so keep up the great work!

Hi @0xdea,
I understand and share your point of view regarding getting a false positive being better than missing a vulnerability. But probably, when you have hundreds and hundreds of results, if you don't have to double-check, it would be great.

Regarding the vsnprintf and subsequentially the snprintf, you are right (my bad) is possible to exploit the format strings (in the write way). In the following screenshot the PoC, I used this call snprintf(INIZIALIZED_POINTER,0,FMT,FIXED_POINTER_STRING) to test it.
image

For the undefined behavior rules, it's clear to me your point.