Incorrect column info for unexpected token exception
Closed this issue · 9 comments
Grammar file: https://github.com/opencybersecurityalliance/kestrel-lang/blob/release/src/kestrel/syntax/kestrel.lark
Generated parser: kestrelParser.js.zip
When parsing the statement "var=get", the parser throws an unexpected token exception with
e.line = 1
e.column = 5
However, the column should be 7.
The column info is similarly incorrect for the following test strings:
"var=get file": e.column is 7, but should be 12.
"var=get file from": e.column is 14, but should be 17.
"var=get file from abc": e.column is 19, but should be 21.
I tried the first example you gave, and I got:
token: Token {
  type: '$END',
  start_pos: 8,
  value: '',
  line: 1,
  column: 9,
  end_line: 1,
  end_column: 13,
  end_pos: 12
},
This is the same answer you get from the Python version.
The reported position is not the end of the file (you can find that easily on your own), but the last valid position the parser was able to reach.
We can argue whether that's the right thing to return or not, but it seems like everything is working in order.
(I don't know why you got 7. Make sure you're using the latest commit.)
I also tried installing lark-js again from the repo using the command pip3 install -e git+https://github.com/lark-parser/Lark.js.git#egg=lark-js, and the result is the same.
Can you post a reproducing script? (a js file that, when run, reproduces the error. Plus the grammar file ofc)
Sure. The grammar file and the generated parser JS file are attached in the Description field of this issue.
My parsing code looks like this:
const kestrel_parser = require('./parser/kestrelParser');
const {get_parser, UnexpectedCharacters, UnexpectedToken} = kestrel_parser;
const parser = get_parser({keep_all_tokens: true});

function App() {
  let treeData = null;
  let errorMsg = '';

  function handle_errors(e) {
    console.debug(e.line, e.column);
    if (e instanceof UnexpectedCharacters) {
      if (errorMsg.length === 0) errorMsg = `Unexpected characters "${e.char}" at position ${e.column}`;
    } else if (e instanceof UnexpectedToken) {
      // record only the first encountered error
      if (errorMsg.length === 0) errorMsg = `Unexpected token "${e.token.value}" at ${e.token.type} position ${e.column}, expected ${[...e.expected].join(',')}`;
    } else if (e instanceof SyntaxError) {
      console.debug(e);
    } else {
      console.debug("unknown error:", e.constructor.name);
    }
    // return true to keep parsing
    return true;
  }

  try {
    treeData = parser.parse("var=get", null, handle_errors).children[0];
  } catch (e) {
    console.debug("uncaught error:", e);
  }
}
I don't see the problem?
For "var=get file" it's 9.
For "var=get" it's 5.
Everything seems in order.
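The numbers in this thread can be sanity-checked without the generated parser. The sketch below is only an illustration: it uses a naive regex tokenizer (an assumption, not the Kestrel grammar or the Lark.js lexer) and a hypothetical helper, lastTokenColumns, to print the 1-based start and end columns of the final token in each test string.

```javascript
// Illustrative only: a naive tokenizer, NOT the Kestrel grammar or the Lark.js lexer.
// Tokens are assumed to be "=" or any run of characters that are neither
// whitespace nor "=".
function lastTokenColumns(input) {
  const matches = [...input.matchAll(/[^\s=]+|=/g)];
  if (matches.length === 0) return null; // no tokens at all
  const last = matches[matches.length - 1];
  return {
    startColumn: last.index + 1,             // 1-based column of the token's first character
    endColumn: last.index + last[0].length,  // 1-based column of the token's last character
  };
}

const samples = [
  "var=get",
  "var=get file",
  "var=get file from",
  "var=get file from abc",
];

for (const s of samples) {
  const cols = lastTokenColumns(s);
  console.log(JSON.stringify(s), "start:", cols.startColumn, "end:", cols.endColumn);
}
```

For these four inputs the start columns come out as 5, 9, 14, and 19, and the end columns as 7, 12, 17, and 21. The end columns match what the original report expected, which suggests the reported column is the start of the last token rather than its end.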
Okay, so the column means the token "start" position? Hm.. then what I need should be end_pos. Thanks.
Yes, it's the start of the last valid position, which in this case is the start of the token that caused the error.
(to the best of my memory)
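If an error handler needs both ends of the offending token, one option is to read them from the token rather than from e.column alone. This is a minimal sketch, assuming the error carries a token with the fields shown in the token dump earlier in this thread (line, column, end_column); describeErrorPosition is a hypothetical helper, not part of Lark.js, and the mock error stands in for a real exception. Whether end_column is inclusive or exclusive is not established in this thread, so check it against your own parser output.

```javascript
// Hypothetical helper, not part of Lark.js. Field names (line, column,
// end_column) follow the token dump quoted in this thread.
function describeErrorPosition(e) {
  const tok = e.token;
  if (tok && Number.isInteger(tok.column)) {
    return {
      line: tok.line,
      startColumn: tok.column,
      // Fall back to the start column if the token has no end_column.
      endColumn: Number.isInteger(tok.end_column) ? tok.end_column : tok.column,
    };
  }
  // Errors without a token (e.g. unexpected characters) only have line/column.
  return { line: e.line, startColumn: e.column, endColumn: e.column };
}

// Mock error: token values copied from the dump in this thread; the
// top-level line/column are assumed to mirror the token's.
const mockError = {
  line: 1,
  column: 9,
  token: {
    type: '$END',
    start_pos: 8,
    value: '',
    line: 1,
    column: 9,
    end_line: 1,
    end_column: 13,
    end_pos: 12,
  },
};

console.log(describeErrorPosition(mockError));
console.log(describeErrorPosition({ line: 1, column: 4 }));
```

Inside a handler such as handle_errors, the returned startColumn and endColumn could then be used to build the message or to highlight the whole token instead of a single position.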