norskeld/sigma

Result model: span vs pos

mindplay-dk opened this issue ยท 5 comments

Hey,

I was looking over these types:

/** Represents failed execution. */
export type Failure = {
readonly isOk: false
readonly span: Span
readonly pos: number
readonly expected: string
}
/** Represents successful execution. */
export type Success<T> = {
readonly isOk: true
readonly span: Span
readonly pos: number
readonly value: T
}

I see that spans were added on later.

Maybe I'm missing something, but I was wondering:

  1. In a Success, isn't pos always going to be identical to span[0]?
  2. In a Failure, isn't span[0] always going to be identical to span[1] as well as to pos?

A failure always happens at one specific position, does it not? When would the span mean anything?

And a success always has both a start and an end - even if these are identical, that would signify a zero-length match, so both values are always meaningful, right?

Any particular reason you wouldn't just deprecate or remove pos?

Or just have plain start and end properties for Success, and pos for Failure? Is there any practical advantage to having those tuples? maybe for source maps or something? Does it matter if those properties have the same names/types?

Just wondering.

This library looks amazing btw. ๐Ÿ˜„

I mean...

/** A parser that succeeds if the given parser fails and vice versa. */
function not<T>(parser: Parser<T>): Parser<T> {
  return {
    parse(input: string, pos: number): Result<T> {
      const result = parser.parse(input, pos);

      if (result.isOk) {
        return {
          isOk: false,
          span: [pos, pos],   // ๐Ÿ‘ˆ
          pos: pos,           // ๐Ÿ‘ˆ
          expected: `not ${result.value}`
        };
      } else {
        return {
          isOk: true,
          span: [pos, pos],   // ๐Ÿ‘ˆ
          pos: pos,           // ๐Ÿ‘ˆ
          value: null as any
        };
      }
    }
  };
}

My code is like "pos! pos! pos! pos!! get it??" ๐Ÿ˜†

This library looks amazing btw. ๐Ÿ˜„

Hi! Thanks for the kind words.

  1. In a Success, isn't pos always going to be identical to span[0]?
  2. In a Failure, isn't span[0] always going to be identical to span[1] as well as to pos?

Yep, pos and spans values may overlap, which seems redundant, but that doesn't always hold true and actually depends on whether a parser combinator is consuming or non-consuming, plus specifics of some parser combinators may come into play.

Anyway pos and span serve different purposes: the former is used to show where some parser combinator stopped execution, while the latter carries information about the range of input data that parser combinator processed (or failed to process). They may or may not overlap.

For instance, here is the sepBy combinator's implementation (lines 32-33):

export function sepBy<T, S>(parser: Parser<T>, sep: Parser<S>): Parser<Array<T>> {
return {
parse(input, pos) {
// Run the parser once to get the first value.
const resultP = parser.parse(input, pos)
// If the parser succeeds, run the parser and separator parser many times.
if (resultP.isOk) {
const resultS = many(sequence(sep, parser)).parse(input, resultP.pos)
const values = [resultP.value]
// If the parsers succeed, concatenate the values sans the separator.
for (const [, value] of resultS.value) {
values.push(value)
}
return {
isOk: true,
span: [pos, resultS.pos],
pos: resultS.pos,
value: values
}
}
return {
isOk: true,
span: [pos, pos],
pos,
value: []
}
}
}
}

A failure always happens at one specific position, does it not? When would the span mean anything?

It happens at one specific position (pos), but span in this case should carry information about how much current parser combinator parsed before it failed, which is useful for producing meaningful errors.

P.S. I realized that sepBy (and likely some other combinators) will produce incorrect spans when failing... Damn. Thanks for the questions!

I mean...

/** A parser that succeeds if the given parser fails and vice versa. */
function not<T>(parser: Parser<T>): Parser<T> {
  return {
    parse(input: string, pos: number): Result<T> {
      const result = parser.parse(input, pos);

      if (result.isOk) {
        return {
          isOk: false,
          span: [pos, pos],   // ๐Ÿ‘ˆ
          pos: pos,           // ๐Ÿ‘ˆ
          expected: `not ${result.value}`
        };
      } else {
        return {
          isOk: true,
          span: [pos, pos],   // ๐Ÿ‘ˆ
          pos: pos,           // ๐Ÿ‘ˆ
          value: null as any
        };
      }
    }
  };
}

My code is like "pos! pos! pos! pos!! get it??" ๐Ÿ˜†

I think you actually want something like:

export function not<T>(parser: Parser<T>): Parser<T> {
  return {
    parse(input, pos) {
      const result = parser.parse(input, pos)

      switch (result.isOk) {
        case true: {
          return {
            isOk: false,
            span: result.span,
            pos: result.pos,
            expected: `not: ${parser.name}`
          }
        }

        case false: {
          return {
            isOk: true,
            span: result.span,
            pos: result.pos,
            value: ???
          }
        }
      }
    }
  } as Parser<T>
}

The not combinator is hella tricky though, I didn't manage to implement it correctly.

Okay, so if I understand correctly, span is more like "meta data" - it's informational, for error reporting purposes and such. It's not typically something a combinator would use for decision making - as opposed to pos, which other combinators use to decide where to start/continue from. Did I get it? ๐Ÿ˜

Seems like maybe these properties would be worthy of a good old doc-block? โ˜บ

Thanks for the clarifications! I will start using the library one of these evenings, it looks really incredible. (It reminds me a bit of Lukas Renggli's PetitParser, if you've ever seen that? Just great work.)

Okay, so if I understand correctly, span is more like "meta data" - it's informational, for error reporting purposes and such. It's not typically something a combinator would use for decision making - as opposed to pos, which other combinators use to decide where to start/continue from. Did I get it? ๐Ÿ˜

Yep, correct.

Seems like maybe these properties would be worthy of a good old doc-block? โ˜บ

Yeah, definitely. Hopefully I'll be able to find some time after my day job or on weekends.

PetitParser

Never heard of it, but looks cool. I see it employs some advanced techniques with a wider set of features. And this little library of mine is rather simple. I tried writing something more sophisticated without limitations that recursive descent imposes, namely GLL + packrat parser, but it turned to be atrociously slow. ๐Ÿ˜’

Simplicity is a huge selling point for me, and one reason I chose your library over 20+ others. ๐Ÿ˜Ž