semgrep/semgrep

Lack of support for switch statement in C/C++

Closed this issue · 7 comments

0xdea commented

I'm trying to create a Semgrep rule for C/C++ to match the pattern described in:

CWE-478: Missing Default Case in Switch Statement
https://cwe.mitre.org/data/definitions/478

For instance, the following pattern generates an error:

 - pattern: |
    switch ($COND) {
    case ...
    }

I tried different variations, and it looks like switch statements in C/C++ are not currently supported by Semgrep. Am I missing something?
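
For context, here is a minimal sketch of the kind of C code such a rule is meant to flag, next to a variant with a default case. The function names and status codes are hypothetical and chosen only for illustration; they do not come from the CWE entry or from any Semgrep test file.

```c
#include <stddef.h>

/* Hypothetical example: no default case, so any code other than 0 or 1
   skips the switch entirely and the function returns NULL. */
const char *status_name_unchecked(int code)
{
    const char *name = NULL;
    switch (code) {
    case 0:
        name = "ok";
        break;
    case 1:
        name = "error";
        break;
    }
    return name;
}

/* Same logic with an explicit default case handling unexpected values. */
const char *status_name_checked(int code)
{
    switch (code) {
    case 0:
        return "ok";
    case 1:
        return "error";
    default:
        return "unknown";
    }
}
```

A rule for CWE-478 would ideally report the switch in the first function and stay silent on the second.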

Definitely a bug: https://semgrep.dev/s/4kR8

Note that C/C++ support is still experimental. But thanks for reporting!

aryx commented

@nmote did your work on switch in JS help with this too?

nmote commented

Probably not. I introduced the concept of a standalone switch case as a pattern (e.g. just case 4: ...). Fully-formed switch statements already worked as patterns in JS/TS. It looks like the issue here isn't that we can't parse switch statements as patterns, but that we don't allow the semgrep ellipsis in them. The pattern switch ($X) { } parses and matches the switch statement in that example.

There are a few different interesting things about this:

  • This pattern makes use of the fact that Semgrep considers an empty body to match a body with any number of elements.
  • Semgrep cannot parse ... in the condition of the switch statement. This seems to me to simply be a bug, and should be addressed.
  • Semgrep cannot parse ... in the body of a switch statement. I looked into addressing this for JS, but unfortunately realized that there is an ambiguity here. If you write switch ($X) { case 5: ... }, should that ... match only some number of statements associated with case 5, or should it match a case 5 with no associated statements, followed by some other cases? In the end, the matching machinery appears to have been designed to avoid the necessity of using the Semgrep ellipsis in these cases anyway. Instead of switch (...) { ... } you can just write switch (...) { }, and instead of switch (...) { ... case 5: ... ... } you can just write switch (...) { case 5: ... } and Semgrep will match even if there are other cases as well (https://semgrep.dev/s/g8DJ). Unfortunately, this means it's not clear whether there's a way to write a pattern to match a switch with an empty body, or to write a pattern with a switch that contains exactly one case, or a list of specific cases.
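
To make the ambiguity in the last point concrete, here is a small sketch of two C targets that correspond to the two possible readings of switch ($X) { case 5: ... }. The function names and values are hypothetical; the code only illustrates the two shapes and makes no claim about which one Semgrep actually matches.

```c
/* Reading 1: the ellipsis stands for the statements that belong to case 5. */
int reading_one(int x)
{
    int hits = 0;
    switch (x) {
    case 5:
        hits += 1;
        hits += 10;
        break;
    }
    return hits;
}

/* Reading 2: case 5 has no statements of its own, and the ellipsis
   stands for the cases that follow it. */
int reading_two(int x)
{
    int hits = 0;
    switch (x) {
    case 5:
    case 6:
        hits += 1;
        break;
    case 7:
        hits += 100;
        break;
    }
    return hits;
}
```

Both shapes are valid C, which is why a single ellipsis after case 5: cannot be given one obvious meaning without a more systematic design.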

Basically, I think we should make sure that ... can be parsed in switch conditions in C/C++, but other than that, I think any changes here would need to be more systematic, with their consequences thought through.

In the meantime, I think this rule should be a suitable workaround: https://semgrep.dev/s/J8Yw

0xdea commented

It looks like it. I'm closing the issue. Thanks!

aryx commented

fixing bugs without even knowing :)