CharStreams.fromFileName duplicates the file input source
AlexCouch opened this issue · 3 comments
I first noticed this while working on toylang, the language example for kotlinx-llvm. The grammar works perfectly in the ANTLR IntelliJ plugin's preview panel. In code, however, CharStreams.fromFileName duplicates the input.
Example from toylang:
Input:
let test = "Hello";
fn testFunc(){
println(test);
}
CharStreams.fromFileName:
let test = "Hello";
fn testFunc(){
println(test);
}
let test = "Hello";
fn testFunc(){
println(test);
}
It also inserts a run of seemingly random whitespace between the two copies. I wasn't sure whether this was user error, so I reproduced the bug in a separate minimal test.
Example from test:
Input:
2 + 2
10 - 3
8 * 2
24 / 6
CharStreams.fromFileName:
2 + 2
10 - 3
8 * 2
24 / 6 2 + 2
10 - 3
8 * 2
24 / 6
My current Gradle file is in the repo. I tried to reproduce the bug with the Java API, without success. I'm not sure whether this is a Kotlin problem, so I'll try to isolate it outside of antlr-kotlin. So far I have not been able to find anything significant in the antlr-kotlin source code.
Found the possible problem: https://github.com/Strumenta/antlr-kotlin/blob/master/antlr-kotlin-runtime/src/jvmMain/kotlin/org/antlr/v4/kotlinruntime/CharStreams.kt#L115-L122
I'm not sure exactly what is happening here. From stepping through the code, it looks like the input bytes are appended to the byte buffer even when no bytes were read. endOfInput is set according to whether bytesRead is -1, but regardless of that, the utf8BytesIn array is still appended to the byte buffer, which causes the duplication.
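To illustrate the kind of loop described above, here is a minimal sketch. It is not the antlr-kotlin code: it uses a plain InputStream instead of a channel, and the function names and the 16-byte chunk size are made up for the example. It only shows how appending the buffer without checking bytesRead can write stale bytes a second time, and how guarding on bytesRead avoids that.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.InputStream

// Hypothetical buggy loop: the chunk is appended even when read() reports end of input.
fun readAllBuggy(input: InputStream, chunkSize: Int = 16): ByteArray {
    val out = ByteArrayOutputStream()
    val chunk = ByteArray(chunkSize)
    var endOfInput = false
    while (!endOfInput) {
        val bytesRead = input.read(chunk)
        endOfInput = bytesRead == -1
        // Bug: writes the whole chunk, including stale bytes, regardless of bytesRead.
        out.write(chunk)
    }
    return out.toByteArray()
}

// Guarded loop: stop at end of input and append only the bytes actually read.
fun readAllFixed(input: InputStream, chunkSize: Int = 16): ByteArray {
    val out = ByteArrayOutputStream()
    val chunk = ByteArray(chunkSize)
    while (true) {
        val bytesRead = input.read(chunk)
        if (bytesRead == -1) break
        out.write(chunk, 0, bytesRead)
    }
    return out.toByteArray()
}

fun main() {
    val data = "2 + 2\n".toByteArray(Charsets.UTF_8)
    val buggy = readAllBuggy(ByteArrayInputStream(data))
    val fixed = readAllFixed(ByteArrayInputStream(data))
    println("input bytes: ${data.size}, buggy: ${buggy.size}, fixed: ${fixed.size}")
}
```

In this sketch the buggy version returns the input twice, each copy padded out to the chunk size, which resembles the duplicated text with filler between the copies. Whether the real fromChannel loop fails in exactly this way is an assumption.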
I had the same issue. I thought it was an antlr-kotlin issue, but it was not. When I moved away from CharStreams.fromFilePath, the issue disappeared. I did not investigate further.
I'm stepping through it, and the problem appears to be in CharStreams.fromFileName. When I use CharStreams.fromString, it works just fine. This is because CharStreams.fromFileName uses fromPath, which uses fromChannel. fromString does not use fromChannel; it uses StringCharStream, which has its own implementation, entirely separate from the problematic functions.
I have a temporary workaround:
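import java.io.File
import org.antlr.v4.kotlinruntime.CharStreams

// Read the file into a String and pass it to fromString, bypassing fromFileName.
val testFile = File("test.toy")
val src = testFile.inputStream().use { String(it.readBytes(), Charsets.UTF_8) }
val lexer = ToylangLexer(CharStreams.fromString(src))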
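If the workaround is needed in several places, it could be wrapped in a small helper. The sketch below is only a suggestion: readSourceText is a hypothetical name, not part of antlr-kotlin, and the temporary file exists only to show that the text comes back unchanged. The lexer call is left as a comment because it depends on the generated ToylangLexer.

```kotlin
import java.io.File

// Hypothetical helper (not part of antlr-kotlin): read a source file as UTF-8 text.
fun readSourceText(fileName: String): String =
    File(fileName).readText(Charsets.UTF_8)

fun main() {
    // Write a small sample file, then read it back through the helper.
    val tmp = File.createTempFile("sample", ".toy")
    tmp.deleteOnExit()
    val input = "let test = \"Hello\";\n"
    tmp.writeText(input, Charsets.UTF_8)

    val src = readSourceText(tmp.path)
    check(src == input) { "source text was altered while reading" }

    // With the generated lexer on the classpath, the text would then be passed on:
    // val lexer = ToylangLexer(CharStreams.fromString(src))
}
```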