Haskell: errors as record types
vkpgwt opened this issue · 2 comments
Hello!
BNFC errors in the Haskell backend are plain String
values like this: syntax error at line 3, column 1 before `token'
. There are some inconveniences here:
- a String is not flexible and customizable enough, as compared with an ADT record type. Currently there is no possibility to get the line and column numbers other than parsing the error message, which is risky in case of future changes in the error message format.
- more information might be provided for better error messages in the ADT record fields: the list of expected token names and maybe the name of the rejected token rule. BNFC Haskell backend is implemented via Happy, which can provide this information.
The new error type might be like:
data BNFCError = BNFCError
{ bnfcLineNo :: Int
, bnfcColumnNo :: Int
, bnfcBadInput :: String
, bnfcExpectedTokenNames :: [String]
, bnfcAcceptedTokenName :: String
}
In fact, it might be more complex, since BNFC can also generate lexer errors, and we might want to support the Text
type.
Since this is a breaking change, a new command line option might be introduced, like --error-type=(string|record)
, defaulting to the String
errors.
Would you agree with this change? I would appreciate if you give some advice. I'm using BNFC in production and we need good error messages. It will be perfect if I create a pull request that will be accepted finally, otherwise I'm going to patch BNFC locally.
Hello Anton @vkpgwt, thanks for your suggestion and your interest in BNFC!
I think it is a very good proposal.
I suppose this is for the Haskell backend only? (The other backends also do not have structured error reporting.) You might be aware that there is a common testsuite for all backends, to ensure that the generated parsers and printers behave uniformly. (However, this does not stop individual backends to have special features.)
To keep code duplication minimal, I suggest to refactor the Haskell backend to always produce a structured error in the parser, an only convert to string errors in the top-level parser functions (unless --errors=structured
is given).
I suggest --errors=string|text|structured
as option name to make it less Haskell-specific (and open the avenue to have this in the other backends as well).
--errors=string|text|structured
I'm confused. Having string|text
selector in multiple places (--text-token
, maybe there are others) would give spurious combinations like Text tokens / String errors, String tokens / Text errors.
And if the token name type is driven by the existing option, this option would look like --errors=plain|structured
which is basically a --structured-errors
flag.