Marwes/combine

Question about span information for parser error reporting

Closed this issue · 3 comments

I am parsing a programming language. I would like to report parsing errors in a human-friendly manner.

For example, when parsing the string literal "foo\u{fffffffffff}" I want to generate an error report that looks like:

This is not a valid code point:

1|   "foo\u{fffffffffff}"
         ^^^^^^^^^^^^^^^
The valid code points are between 0 and 10FFFF inclusive.
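Once a start and end column are known, rendering the report itself is simple. The following is a minimal sketch using only the standard library; `render_report` and its parameters are hypothetical names for illustration and are not part of combine. It assumes a single-line source and 0-based, end-exclusive character columns.

```rust
/// Render a one-line error report with a caret underline.
///
/// Hypothetical helper, not part of combine. `line_no` is 1-based;
/// `start_col..end_col` is a 0-based, end-exclusive range of character
/// columns inside `line`.
fn render_report(
    title: &str,
    line_no: usize,
    line: &str,
    start_col: usize,
    end_col: usize,
    hint: &str,
) -> String {
    // The gutter is the "1|   " prefix printed before the source line.
    let gutter = format!("{}|   ", line_no);
    // Pad the underline so it starts below `start_col` of the source line.
    let pad = " ".repeat(gutter.chars().count() + start_col);
    // Always draw at least one caret, even for an empty span.
    let carets = "^".repeat(end_col.saturating_sub(start_col).max(1));
    format!(
        "{}\n\n{}{}\n{}{}\n{}\n",
        title, gutter, line, pad, carets, hint
    )
}
```

Calling it with the string literal above, a start column of 4 (the backslash) and an end column of 19 (just past the closing brace) underlines the whole escape sequence.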

I have been playing around with the API and have gotten stuck on adding span information to parsers. I am trying to figure out how to return the start position of the sub-parser that failed. I see that the end position is available in the Errors struct. Is there a built-in way to work with span information that I am missing?

Below are my experiments to get the start position to be included in the Errors struct.

use combine::{choice, Parser, EasyParser, parser, between, token, many, satisfy,
 not_followed_by, any, attempt, one_of, many1, position, value, optional, look_ahead};
use combine::error::{ParseError, Info};
use combine::stream::{Stream};
use combine::parser::char::digit;

use combine::stream::position::SourcePosition;
pub(crate) fn number<Input>(start: SourcePosition) -> impl Parser<Input, Output = f64>
where
    Input: Stream<Token = char, Position = SourcePosition>,
    // Necessary due to rust-lang/rust#24159
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    let chompZero = token('0')
        .then(move |_zero| {
            choice((
                token('x'), //.with(chompHexInt),
                token('.'), // .then(|_| chompFraction()),
                //value(_zero),
                look_ahead(digit()).and(position()).flat_map(move |(_, _end)| {
                    //todo fail with Number No Leading Zero),
                    let mut error = <Input::Error as ParseError<char, Input::Range, Input::Position>>::empty(start);
                    // panic!("start: {:?} end: {:?}", start, end);
                    error.add_message(MyCustomError{});
                    Err(error)
                }),
            ))
        }).map(|_| 0.0);
    
    chompZero
}

struct MyCustomError {}

impl<'s, Token, Range> combine::error::ErrorInfo<'s, Token, Range> for MyCustomError {
    type Format = &'static str;

    fn into_info(&'s self) -> Info<Token, Range, Self::Format> {
        eprintln!("called into info");
        Info::Static("hello world")
    }
}

fn main() {
    use combine::stream::position;
    let mut parser = position().then(number);
    let result = parser.easy_parse(position::Stream::new("01 "));
    // I would like to somehow see the start position show up in the output here so I can write a
    // function that takes the Errors struct and renders the pretty output with an underline for the
    // bottom most failing parser.
    println!("{:?}", result);
}

I can see that into_info on my custom error type gets called by combine's internal logic, but my custom static message does not appear in the list of errors.

Err(Errors { position: SourcePosition { line: 1, column: 2 },
errors: [
    Unexpected(Token('1')),
    Expected(Token('x')),
    Expected(Token('.')),
    Expected(Static("digit"))
] })

I know that I can thread my own mutable context object through the parser from the top down to store my error information, e.g. fn number<Input>(start: SourcePosition, context: MyContext) -> impl Parser<Input, Output = f64>. I would like to avoid the extra context approach because dealing with ownership of the context gets ugly when there are multiple choices.

I think something like #305 would work for this. It is not merged yet, but perhaps you can test the branch?

The reason your message does not show up is that the position for that error (start) is before the position of the errors emitted by the other parsers in the choice. Since easy::Errors only stores one position, it only keeps the errors that occurred after successfully parsing the most input.
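The behaviour described above can be illustrated with a small standalone model. This is a simplified sketch, not combine's actual code: the `Errors` struct and its `merge` method below are stand-ins written for illustration, with the position reduced to a (line, column) tuple.

```rust
use std::cmp::Ordering;

/// Simplified stand-in for an error set that tracks a single position.
/// Illustration only; this is not the real easy::Errors type.
#[derive(Debug, Clone, PartialEq)]
struct Errors {
    position: (u32, u32), // (line, column)
    messages: Vec<String>,
}

impl Errors {
    fn new(position: (u32, u32), message: &str) -> Self {
        Errors {
            position,
            messages: vec![message.to_string()],
        }
    }

    /// Keep whichever side got further into the input. Messages are only
    /// combined when both sides failed at exactly the same position.
    fn merge(mut self, mut other: Errors) -> Errors {
        match self.position.cmp(&other.position) {
            // `other` failed later in the input, so `self` is dropped.
            Ordering::Less => other,
            // `self` failed later in the input, so `other` is dropped.
            Ordering::Greater => self,
            // Same position: keep the messages from both sides.
            Ordering::Equal => {
                self.messages.append(&mut other.messages);
                self
            }
        }
    }
}
```

In this model, an error created at the start position (line 1, column 1) that is merged with errors from line 1, column 2 is discarded entirely, which matches the output shown in the question.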

The span implementation looks like it works for my purposes. I tried nesting two spanned parsers and it chose the narrower span for the error reporting.

// I just had to change the bounds on my parser function from
// Input: Stream<Position = SourcePosition> to Position = Span<SourcePosition>
pub(crate) fn number<Input>(/*start: SourcePosition*/) -> impl Parser<Input, Output = f64>
where
    Input: Stream<Token = char, Position = Span<SourcePosition>>,
    // Necessary due to rust-lang/rust#24159
    Input::Error: ParseError<Input::Token, Input::Range, Input::Position>,
{
    token('0')
        .then(move |_zero| {
            choice((
                token('x'), //.with(chompHexInt),
                token('.'), // .then(|_| chompFraction()),
                not_followed_by(digit()).map(|_| '0'),
            ))
        }).map(|_| 0.0)
}

#[test]
fn it_works() {
    //let mut parser = position().then(number);
    let mut parser = spanned(sep_by::<Vec<_>, _, _, _>(spanned(number()), token(',')));

    let result = parser.parse(
        span::Stream::<_, easy::Errors<_, _, span::Span<_>>>::from(
            easy::Stream::from(position::Stream::new("0,0,01 "))
        )
    );
    panic!("{:?}", result);
}

Thank you so much for implementing this. You have saved me a lot of work. It would have taken me at least several days to figure out how the library internals worked well enough to make the spanned combinator. And my version would probably not have worked so cleanly with all the different types of streams.