ocaml-community/sedlex

match default (_) doesn't capture anything

pmetzger opened this issue · 3 comments

I'm not sure whether this is intentional (in which case it should be documented), but if you match _, the lexeme you get is empty (that is, ""). This means you can't, for example, use the default case to catch bad characters and report what they were; you need to match "any" for that.

This seems like a slightly odd choice to me, but again, it probably should be either documented (and explained) or changed.

I think this is consistent with the semantics of formal languages. The largest language includes the empty word "". One would expect a wildcard (i.e., _) to match on anything, including the empty word.

The thing that may be a bit subtle is that _ matches lazily, rather than greedily. That is, _ will match the smallest possible word in the "full language", which is always "".

I'm not sure I love the behavior, but I think it should at least be documented.

(What I don't love: because _ matches the empty word, you need to do something unusual to match unexpected characters, and "_" itself isn't very useful.)
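
For anyone landing here, a minimal sketch of the workaround mentioned in the original report: match "any" explicitly, so that the offending character ends up in the lexeme, and keep _ only as the mandatory final catch-all. The token type, the letter regexp, and the next_token function are hypothetical names made up for this illustration; it assumes the code is built with the sedlex library and its ppx enabled.

```ocaml
(* Sketch only: assumes sedlex and its ppx, e.g. in dune:
   (libraries sedlex) (preprocess (pps sedlex.ppx)) *)

(* Hypothetical regexp for this example: ASCII letters. *)
let letter = [%sedlex.regexp? 'a' .. 'z' | 'A' .. 'Z']

(* Hypothetical token type for this example. *)
type token =
  | Word of string
  | Bad of string * int  (* unexpected code point (UTF-8) and its offset *)
  | Eof

let next_token buf =
  match%sedlex buf with
  | Plus letter -> Word (Sedlexing.Utf8.lexeme buf)
  | eof -> Eof
  (* "any" consumes exactly one code point, so the lexeme is non-empty
     and can be reported. *)
  | any -> Bad (Sedlexing.Utf8.lexeme buf, Sedlexing.lexeme_start buf)
  (* sedlex requires a final _ branch; with eof and any both handled
     above, it is not expected to be reached. *)
  | _ -> assert false
```

Because Plus letter comes before any, a single letter still lexes as a Word; only characters that no earlier rule accepts fall through to the Bad case.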