Change Position to Int

Delete Position { column :: Int, line :: Int } and replace it with an Int representing the position index from the beginning of the input. For String, the index would be counted in CodePoints.

Delete the updatePosString and updatePosSingle functions.

purescript-parsing/src/Text/Parsing/Parser/String.purs

Lines 179 to 200 in dbd9aae

    
    -- | Updates a `Position` by adding the columns and lines in `String`.
    updatePosString :: Position -> String -> String -> Position
    updatePosString pos before after = case uncons before of
      Nothing -> pos
      Just { head, tail } -> do
        let
          newPos
            | String.null tail = updatePosSingle pos head after
            | otherwise = updatePosSingle pos head tail
        updatePosString newPos tail after

    -- | Updates a `Position` by adding the columns and lines in a
    -- | single `CodePoint`.
    updatePosSingle :: Position -> CodePoint -> String -> Position
    updatePosSingle (Position { line, column }) cp after = case fromEnum cp of
      10 -> Position { line: line + 1, column: 1 } -- "\n"
      13 ->
        case codePointAt 0 after of
          Just nextCp | fromEnum nextCp == 10 -> Position { line, column } -- "\r\n" lookahead
          _ -> Position { line: line + 1, column: 1 } -- "\r"
      9 -> Position { line, column: column + 8 - ((column - 1) `mod` 8) } -- "\t" Who says that one tab is 8 columns?
      _ -> Position { line, column: column + 1 }

updatePosSingle, which updatePosString calls for every code point, assumes that tab stops fall every 8 columns, and there is no way for the library user to change that behavior. So I think updatePosString has always been fundamentally broken.
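As a rough sketch of the alternative: if the position is just a code point index, the line and column can be recomputed from the original input only when they are needed, with the tab width chosen by the caller. The names lineColumnAt, advance, and LineColumn, and the tabWidth parameter, are hypothetical and not part of the library. The newline and tab handling copies what updatePosSingle does today.

```purescript
module PositionSketch where

import Prelude

import Data.Enum (fromEnum)
import Data.Maybe (Maybe(..))
import Data.String.CodePoints (CodePoint, codePointAt, uncons)

-- Hypothetical result type: 1-based line and column.
type LineColumn = { line :: Int, column :: Int }

-- | Walk the first `index` code points of `input` and report the line and
-- | column reached, using the caller's tab width instead of a fixed 8.
lineColumnAt :: Int -> Int -> String -> LineColumn
lineColumnAt tabWidth index input = go index input { line: 1, column: 1 }
  where
  go :: Int -> String -> LineColumn -> LineColumn
  go n rest pos
    | n <= 0 = pos
    | otherwise = case uncons rest of
        Nothing -> pos -- index is past the end of the input
        Just { head, tail } -> go (n - 1) tail (advance tabWidth head tail pos)

-- | Advance by one code point; `after` is the input following `cp`,
-- | needed for the "\r\n" lookahead.
advance :: Int -> CodePoint -> String -> LineColumn -> LineColumn
advance tabWidth cp after pos = case fromEnum cp of
  10 -> { line: pos.line + 1, column: 1 } -- "\n"
  13 -> case codePointAt 0 after of
    Just next | fromEnum next == 10 -> pos -- "\r\n": the "\n" will advance the line
    _ -> { line: pos.line + 1, column: 1 } -- lone "\r"
  9 -> pos { column = pos.column + tabWidth - ((pos.column - 1) `mod` tabWidth) } -- "\t"
  _ -> pos { column = pos.column + 1 }
```

With this, lineColumnAt 8 3 "a\tb\nc" gives { line: 1, column: 10 }, and a tab width of 4 gives column 6 instead. The cost is a linear walk over the consumed input, which is probably acceptable when it is only paid while building an error report.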

We want to provide a way to track the line and column during the parse so that we can write indentation-sensitive parsers and so that we can report the line and column in a ParseError.

The Text.Parsing.Indent module is used by some packages so we should try to keep it.

Text.Parsing.Indent is based on version 0.3.3 of the Haskell indents library:

https://hackage.haskell.org/package/indents-0.3.3/docs/Text-Parsec-Indent.html

but the author of the indents library seems to have changed their mind, and later versions are quite different:

https://hackage.haskell.org/package/indents-0.5.0.1/docs/Text-Parsec-Indent-Explicit.html

Reporting the line and column in a ParseError depends on what the indentation algorithm is, which should be defined by an indentation-sensitive parser.

I like the idea of an indentation-sensitive parser which is expressed as a transformer of ParserT.

https://hackage.haskell.org/package/indents-0.5.0.1/docs/Text-Parsec-Indent.html

So we have something like an IndentT transformer which contains the line and column state.

In the event of a parsing failure, the ParseError must also convey the line and column state of the IndentT.

How can we get the indentation level out of IndentParser state and include it in a ParseError?

I really don’t want to parameterize ParseError but maybe that would be the best way.

withPos could include a region which adds the indentation information to the error message string. For that, region would have to be changed to pass the current ParseState to the function:

    region :: forall m s a. Monad m => (ParseState -> ParseError -> ParseError) -> ParserT s m a -> ParserT s m a
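Here is a toy model of that change, just to show the shape of it. ParseState, ParseError, and Parser below are simplified stand-ins built on ExceptT and State from purescript-transformers, not the library's real definitions, and withIndex is a hypothetical helper.

```purescript
module RegionSketch where

import Prelude

import Control.Monad.Except (ExceptT, catchError, throwError)
import Control.Monad.State (State, get)

-- Simplified stand-ins for the library types (assumptions for this sketch).
type ParseState = { input :: String, index :: Int }

data ParseError = ParseError String Int

type Parser = ExceptT ParseError (State ParseState)

-- | Like `region`, but the error-rewriting function also receives the
-- | parser state as it was when the inner parser failed.
region :: forall a. (ParseState -> ParseError -> ParseError) -> Parser a -> Parser a
region context p = catchError p \err -> do
  state <- get
  throwError (context state err)

-- | Hypothetical use: append the index held in the state to the error message.
withIndex :: forall a. Parser a -> Parser a
withIndex = region \state (ParseError msg pos) ->
  ParseError (msg <> " (state index " <> show state.index <> ")") pos
```

An indentation-aware withPos could use the same hook to append the current indentation level instead of the index, without changing the shape of ParseError itself.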

For starters I think we should change the definition of ParseError from ParseError String Position to ParseError String ParseState.
