Using spans instead of positions

Question

Using spans instead of positions

davidpdrsn opened this issue 5 years ago · 2 comments

I'm the maintainer of https://crates.io/crates/juniper-from-schema which uses graphql-parser to parse schemas at compile time and generate juniper code.

Currently the error messages I generate look something like

error: proc macro panicked
 --> src/main.rs:3:1
  |
3 | juniper_from_schema::graphql_schema_from_file!("schema.graphql");
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = help: message:

          error: You have to define a custom scalar called `Url` to use this type
           --> schema:6:3
            |
          6 |      url: Url!
            |      ^


          aborting due to previous error

However I would like to go the extra mile and use something like https://crates.io/crates/codespan-reporting. That would also allow me to remove all my custom error formatting code.

However codespan-reporting requires spans (specifically Range<usize>) rather than single positions. This is unfortunate because graphql-parser only provides Pos which have a line and column number, not a range of where a syntax node appears in the source text.

Changing this would obviously be a breaking change but what are your thoughts about replacing Pos with somekind of Span? If you're up for it I wouldn't mind making a draft PR for you to review. I'm guessing the changes required would be something like this:

pub struct Span {
    start: usize,
    end: usize,
}

pub fn input_value<'a, X>(input: &mut TokenStream<'a>)
    -> ParseResult<InputValue<'a, X>, TokenStream<'a>>
    where X: Text<'a>,
{
    (
        position(), // <- capture start position
        optional(parser(string)),
        name::<'a, X>(),
        punct(":").with(parser(parse_type)),
        optional(punct("=").with(parser(default_value))),
        parser(directives),
        position(), // <- capture end position
    )
    .map(|(start, description, name, value_type, default_value, directives, end)|
    {
        InputValue {
            position: Span { start, end }, description, name, value_type, default_value, directives,
        }
    })
    .parse_stream(input)
}

While somehow also changing TokenStream's position type to use byte index rather than line and column numbers. We could of course still provide helper methods to get line and column numbers.

Answer 1 · 2020-10-14T12:40:00.000Z

The idea was that position is good trade-off of memory vs specificity of the position. I.e. this is not enough for doing in-place modification of the text but should be good enough to point to the error.

Now codespan-reporting requires not just span, it also requires offsets, not line/column numbers. So in the places where we have 16 bytes of position it would be 48 bytes span (I don't think we can go without line/column, because offset only makes sense when original source is there, but it's not always the case for errors). This is not just the memory usage it's also more copying, including reallocating vectors, etc.

This kind of large spans are fine for compiling programming languages or for config files. But I'm not sure this is good for graphql parser which is expected to be at the hot code path.

Any other opinions @graphql-rust/all-maintainers ?

Answer 2 · 2020-10-14T13:07:58.000Z

I see your point but I'm not sure parsing queries will be a significant part of the time spent handling a request. But I still agree that we should be careful with regards to performance so if no one needs spans I'll make do without.