NOT_IS token never generated
yperess opened this issue · 3 comments
I believe that the NOT_IS token is never generated due to EXCL_WS and EXCL_NO_WS being declared above it.
Here is a simple test written in Kotlin:
val lexer = KotlinLexer(CharStreams.fromString("!is"))
val tokenStream = CommonTokenStream(lexer).apply { fill() }
val tokens = tokenStream.tokens
assertThat(tokens).hasSize(2) // This fails
assertThat(tokens[0].type).isEqualTo(KotlinLexer.NOT_IS)
assertThat(tokens[1].type).isEqualTo(KotlinLexer.EOF)
What I actually get is three tokens: !, is, and <EOF>.
Good catch! I'll recheck against our test base.
OK, let's clarify this.
If you look at the definition of NOT_IS, you will find that it actually requires a space or newline after it:
NOT_IS: '!is' (Hidden | NL);
So your example is invalid: there is no space or newline after the token, so the lexer falls back to the other valid sequence, ! followed by is.
The idea of requiring a hidden symbol after !is is to avoid generating NOT_IS for sequences like val x = !isTrue. If we added EOF as another hidden-symbol option here, your example would work as intended, but there is no real benefit to the grammar: ending a Kotlin file with !is is never correct.
Ordering of tokens does not matter in this particular example: it only matters when two tokens match the same input of the same length. Otherwise lexing is greedy, as it should be, and produces the longest tokens possible, which in this case will always be !is.
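Both points above, the trailing hidden-symbol requirement and longest-match lexing with declaration order only breaking ties, can be modeled with a toy lexer. This is a minimal Python sketch; the rule set and token names are simplified assumptions for illustration, not the real Kotlin grammar:

```python
import re

# Toy model of the relevant rules, in declaration order (names are assumptions).
# NOT_IS requires a trailing space or newline, mirroring '!is' (Hidden | NL).
RULES = [
    ("NOT_IS", re.compile(r"!is[ \n]")),
    ("EXCL",   re.compile(r"!")),
    ("IDENT",  re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("EQ",     re.compile(r"=")),
    ("WS",     re.compile(r"[ \n]+")),
]

def lex(text):
    """Longest match wins; ties are broken by rule declaration order."""
    pos, tokens = 0, []
    while pos < len(text):
        # max() keeps the earliest (highest-priority) rule on equal lengths.
        name, length = max(
            ((name, m.end() - pos) for name, rx in RULES
             if (m := rx.match(text, pos))),
            key=lambda t: t[1],
        )
        tokens.append((name, text[pos:pos + length]))
        pos += length
    return [t for t in tokens if t[0] != "WS"]  # drop whitespace tokens

# '!is' followed by a space: NOT_IS beats EXCL because its match is longer.
print(lex("x !is Int"))
# No space after '!is', so NOT_IS cannot match: EXCL then identifier 'isTrue'.
print(lex("x = !isTrue"))
```

Reordering EXCL above NOT_IS in this model changes nothing, since the two rules never match the same length of input; only equal-length matches are decided by order.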