purescript-contrib/purescript-parsing

Change Position to Int

jamesdbrock opened this issue · 3 comments

Delete Position { column :: Int, line :: Int } and replace it with Int representing the position index from the beginning of the input. For String, the position index would be in units of CodePoints.

Delete the updatePosString and updatePosSingle functions.

-- | Updates a `Position` by adding the columns and lines in `String`.
updatePosString :: Position -> String -> String -> Position
updatePosString pos before after = case uncons before of
  Nothing -> pos
  Just { head, tail } -> do
    let
      newPos
        | String.null tail = updatePosSingle pos head after
        | otherwise = updatePosSingle pos head tail
    updatePosString newPos tail after

-- | Updates a `Position` by adding the columns and lines in a
-- | single `CodePoint`.
updatePosSingle :: Position -> CodePoint -> String -> Position
updatePosSingle (Position { line, column }) cp after = case fromEnum cp of
  10 -> Position { line: line + 1, column: 1 } -- "\n"
  13 ->
    case codePointAt 0 after of
      Just nextCp | fromEnum nextCp == 10 -> Position { line, column } -- "\r\n" lookahead
      _ -> Position { line: line + 1, column: 1 } -- "\r"
  9 -> Position { line, column: column + 8 - ((column - 1) `mod` 8) } -- "\t" Who says that one tab is 8 columns?
  _ -> Position { line, column: column + 1 }

The tab case in updatePosSingle, and therefore updatePosString, assumes that tab stops fall every 8 columns, and there is no way for the library user to change that behavior. So I think updatePosString has always been fundamentally broken.

We want to provide a way to track the line and column during the parse so that

  1. We can write indentation-sensitive parsers.
  2. We can report the line and column in a ParseError.
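One way to keep both goals with an Int position would be to compute the line and column on demand from the code point index and the original input. The following is only a sketch: `LineColumn` and `lineColumnAt` are hypothetical names, not part of the library, and the tab width is taken as a parameter instead of being fixed at 8. For brevity it treats only "\n" as a line break, so "\r\n" still advances one line but a lone "\r" does not.

```purescript
module PositionSketch where

import Prelude

import Data.Enum (fromEnum)
import Data.Foldable (foldl)
import Data.String.CodePoints (CodePoint, take, toCodePointArray)

-- | Hypothetical record for illustration; not a library type.
type LineColumn = { line :: Int, column :: Int }

-- | Recover a 1-based line and column from a code point index into the
-- | original input. The first argument is the tab width chosen by the caller.
lineColumnAt :: Int -> Int -> String -> LineColumn
lineColumnAt tabWidth index input =
  foldl step { line: 1, column: 1 } (toCodePointArray (take index input))
  where
  step :: LineColumn -> CodePoint -> LineColumn
  step { line, column } cp = case fromEnum cp of
    -- "\n" starts a new line
    10 -> { line: line + 1, column: 1 }
    -- "\t" advances to the next tab stop for the given width
    9 -> { line, column: column + tabWidth - ((column - 1) `mod` tabWidth) }
    -- anything else (including "\r" in this sketch) takes one column
    _ -> { line, column: column + 1 }
```

With this shape, `lineColumnAt 8 3 "a\n\tb"` walks over "a\n\t" and yields `{ line: 2, column: 9 }`. The cost is a scan of the consumed input, which would only be paid when an error is actually reported.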

The Text.Parsing.Indent module is used by some packages so we should try to keep it.

Text.Parsing.Indent is based on

https://hackage.haskell.org/package/indents-0.3.3/docs/Text-Parsec-Indent.html

but the author of the indents library seems to have changed their mind and later versions are quite different

https://hackage.haskell.org/package/indents-0.5.0.1/docs/Text-Parsec-Indent-Explicit.html

Reporting the line and column in a ParseError depends on what the indentation algorithm is, which should be defined by an indentation-sensitive parser.

I like the idea of an indentation-sensitive parser which is expressed as a transformer of ParserT.

https://hackage.haskell.org/package/indents-0.5.0.1/docs/Text-Parsec-Indent.html

So we have something like an IndentT transformer which contains the line and column state.

In the event of a parsing failure, the ParseError must also convey the line and column state of the IndentT.

How can we get the indentation level out of IndentParser state and include it in a ParseError?

I really don’t want to parameterize ParseError but maybe that would be the best way.

withPos could include a region which adds the indentation information to the error message string. For that, region would have to be changed to pass the current ParseState to the function:

region :: forall m s a. Monad m => (ParseState -> ParseError -> ParseError) -> ParserT s m a -> ParserT s m a
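To make the idea concrete, here is a sketch of a state-aware region on a toy parser. `ToyState`, `ToyError`, and `ToyParser` are hypothetical stand-ins, not the library's `ParseState`, `ParseError`, or `ParserT`, and the real implementation would differ. The sketch passes the state as it was when the region was entered; whether the callback should instead see the state at the point of failure is an open design choice.

```purescript
module RegionSketch where

import Prelude

import Data.Either (Either(..))
import Data.Tuple (Tuple)

-- | Toy stand-ins for illustration only.
type ToyState = { input :: String, index :: Int }

data ToyError = ToyError String Int

newtype ToyParser a = ToyParser (ToyState -> Either ToyError (Tuple a ToyState))

-- | Like `region`, but the error-rewriting function also receives the
-- | parser state captured when the region was entered.
region :: forall a. (ToyState -> ToyError -> ToyError) -> ToyParser a -> ToyParser a
region context (ToyParser p) = ToyParser $ \st -> case p st of
  Left err -> Left (context st err)
  Right ok -> Right ok
```

An indentation-aware withPos could then supply a `context` function that appends the indentation level it finds in the state to the error message.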

For starters I think we should change the definition of ParseError from ParseError String Position to ParseError String ParseState.
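A rough sketch of what that change could look like follows. The shape of `ParseState` here (remaining input, an Int position, and a consumed flag) is an assumption made for illustration, and the accessor names are hypothetical. Note that in this sketch `ParseError` picks up the input type parameter `s` because `ParseState` carries the input.

```purescript
module ParseErrorSketch where

-- | Assumed shape of the parser state: remaining input, position as an
-- | `Int` index, and a flag for whether input has been consumed.
data ParseState s = ParseState s Int Boolean

-- | Proposed error: the message plus the whole state at the failure.
data ParseError s = ParseError String (ParseState s)

-- | Hypothetical accessor for the message.
parseErrorMessage :: forall s. ParseError s -> String
parseErrorMessage (ParseError msg _) = msg

-- | Hypothetical accessor for the position index.
parseErrorPosition :: forall s. ParseError s -> Int
parseErrorPosition (ParseError _ (ParseState _ pos _)) = pos
```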