m4rw3r/chomp

Attoparsec parsers

m4rw3r opened this issue · 6 comments

Attoparsec has a lot of good parsers and combinators; it would be a good idea to implement most, if not all, of them.

Data.Attoparsec.ByteString

Individual bytes

  • word8 -> token

  • anyWord8 -> any

  • notWord8 -> not_token

  • satisfy

  • satisfyWith -> satisfy_with

  • skip

    satisfy optimizes into skip.

Lookahead

Byte classes

  • inClass
  • notInClass

Efficient string handling

  • string

  • skipWhile -> skip_while

    takeWhile optimizes into skipWhile for simple Input types like slices.

  • take

  • scan

  • runScanner -> run_scanner

  • takeWhile -> take_while

  • takeWhile1 -> take_while1

  • takeTill -> take_till

Consume all remaining input

  • takeByteString -> take_remainder

Combinators

  • try

    Redundant since Chomp backtracks automatically on combinators requiring backtracking.

  • <?>

    Redundant since map_err exists.

  • choice

  • count

  • option

  • many

  • many1

  • manyTill -> many_till

  • sepBy -> sep_by

  • sepBy1 -> sep_by1

  • skipMany -> skip_many

  • skipMany1 -> skip_many1

  • eitherP -> either

  • match -> matched_by

State observation

  • endOfInput -> eof

Data.Attoparsec.ByteString.Char8

Special character parsers

Fast predicates

  • isDigit -> ascii::is_digit
  • isAlpha_iso8859_15
  • isAlpha_ascii -> ascii::is_alpha
  • isSpace -> ascii::is_whitespace
  • isHorizontalSpace -> ascii::is_horizontal_space
  • isEndOfLine -> ascii::is_end_of_line

Efficient string handling

Numeric parsers

Data.Attoparsec.Combinator

Data.Attoparsec.Text

I have somewhat of a need for choice. Since variadic functions aren't possible in Rust (yet), I'm wondering what the performance implications are of passing a slice of parsers. OTOH, macros could be used to sugar nested or functions.

@dashed What kind of need? The reason I have not yet implemented choice is that I am unsure whether it should accept a list of function pointers or a list of closures. A list of function pointers has only one level of indirection from the original slice, compared to two for a list of closures. Function pointers do not need to be boxed, but closures do, since they are dynamically sized.
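
The trade-off described above can be sketched without Chomp at all. The following is a minimal illustration, not Chomp's API: `ParseResult`, `FnParser`, `BoxedParser`, `token`, `boxed_token`, `choice_fn` and `choice_boxed` are all hypothetical names, and a "parser" here is simply a function from a byte slice to an optional (value, remaining input) pair.

```rust
// Hypothetical stand-in for a parser result: the matched byte plus the remaining input.
type ParseResult<'a> = Option<(u8, &'a [u8])>;

// Function pointers are sized, so a slice can hold them directly.
type FnParser = for<'a> fn(&'a [u8]) -> ParseResult<'a>;

// Each closure has its own unnameable type, so a mixed list needs boxed trait objects.
type BoxedParser = Box<dyn for<'a> Fn(&'a [u8]) -> ParseResult<'a>>;

// Matches a single expected byte at the start of the input.
fn token<'a>(expected: u8, input: &'a [u8]) -> ParseResult<'a> {
    match input.split_first() {
        Some((&b, rest)) if b == expected => Some((b, rest)),
        _ => None,
    }
}

// Plain functions that can be stored as function pointers.
fn token_a(input: &[u8]) -> ParseResult<'_> {
    token(b'a', input)
}

fn token_b(input: &[u8]) -> ParseResult<'_> {
    token(b'b', input)
}

// Wraps a closure that captures `expected`; storing it in a list requires the box.
fn boxed_token(expected: u8) -> BoxedParser {
    Box::new(move |input| token(expected, input))
}

// Tries each function pointer in order and returns the first success.
fn choice_fn<'a>(parsers: &[FnParser], input: &'a [u8]) -> ParseResult<'a> {
    parsers.iter().find_map(|p| p(input))
}

// Same strategy, but every call goes through a boxed trait object.
fn choice_boxed<'a>(parsers: &[BoxedParser], input: &'a [u8]) -> ParseResult<'a> {
    parsers.iter().find_map(|p| p(input))
}
```

With these definitions, `choice_fn(&[token_a, token_b], b"bc")` and `choice_boxed(&[boxed_token(b'a'), boxed_token(b'b')], b"bc")` try the same alternatives; only the second needs a heap allocation per parser.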

As for using or, there is already sugar for this in the form of the <|> operator in the parse! macro. This is most likely the best solution if you have a static list of branches for the parser.
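
For a static list of branches, a macro can expand a flat list of alternatives into nested or calls. The sketch below only illustrates that expansion and is not how parse! or <|> is implemented: the `choice!` macro, the three-argument `or`, `token` and `ParseResult` are hypothetical stand-ins defined here, not Chomp's own items.

```rust
// Hypothetical stand-in for a parser result: the matched byte plus the remaining input.
type ParseResult<'a> = Option<(u8, &'a [u8])>;

// Matches a single expected byte at the start of the input.
fn token<'a>(expected: u8, input: &'a [u8]) -> ParseResult<'a> {
    match input.split_first() {
        Some((&b, rest)) if b == expected => Some((b, rest)),
        _ => None,
    }
}

// Runs `first`; if it fails, runs `second` on the same, unconsumed input.
fn or<'a, F, G>(input: &'a [u8], first: F, second: G) -> ParseResult<'a>
where
    F: FnOnce(&'a [u8]) -> ParseResult<'a>,
    G: FnOnce(&'a [u8]) -> ParseResult<'a>,
{
    first(input).or_else(|| second(input))
}

// Expands `choice!(i; a, b, c)` into `or(i, a, |rest| or(rest, b, |rest| c(rest)))`.
macro_rules! choice {
    // Base case: a single alternative is just called on the input.
    ($input:expr; $last:expr) => {
        ($last)($input)
    };
    // Recursive case: try the head, fall back to the remaining alternatives.
    ($input:expr; $head:expr, $($tail:expr),+) => {
        or($input, $head, |rest| choice!(rest; $($tail),+))
    };
}

// Accepts 'a', 'b' or 'c' as the first byte.
fn abc(input: &[u8]) -> ParseResult<'_> {
    choice!(input; |i| token(b'a', i), |i| token(b'b', i), |i| token(b'c', i))
}
```

Because the expansion happens at compile time, every branch keeps its concrete closure type and nothing has to be boxed, which is the property that makes this approach attractive for a fixed set of alternatives.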

I recently discovered the <|> operator, which seems to make things a bit nicer.

@m4rw3r Do you know if there's a better way to do skip_many_till? Essentially a many_till that doesn't return anything.

@dashed Doing it properly would require some additional methods on the internal trait for the bounded combinators. But there is an easy way: write a sink type implementing FromIterator which just discards all the data.
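
A minimal sketch of such a sink, using only the standard library. The name `Sink` is hypothetical; the only assumption about the combinator is that it builds its result through FromIterator, as the comment above suggests.

```rust
use std::iter::FromIterator;

// Hypothetical sink type: collecting into it keeps none of the items.
#[derive(Debug, PartialEq)]
struct Sink;

impl<T> FromIterator<T> for Sink {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        // Drive the iterator to completion, dropping each item as it is produced.
        for _ in iter {}
        Sink
    }
}
```

Any combinator that is generic over its FromIterator result type could then be asked for a `Sink` instead of a `Vec`, so the inner parser still runs for every item but nothing is allocated or stored.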

@m4rw3r Thanks for the suggestion! I'll try to investigate this approach.