purescript-contrib/purescript-parsing

CodePoints uncons? Deprecate drop?

Closed this issue · 8 comments

I just noticed that instance stringLikeString uses CodeUnits for uncons.

instance stringLikeString :: StringLike String where
  uncons = SCU.uncons
  drop = S.drop

Doesn't that mean that anyChar will be wrong for astral characters?

anyChar = do
  input <- gets \(ParseState input _ _) -> input
  case uncons input of

Also, the drop member of StringLike is now unused?
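To make the anyChar concern concrete, here is a minimal sketch comparing the two uncons functions from purescript-strings on a single astral character. The module name and the binding names are illustrative assumptions, and the code-unit counts noted in the comments assume the JavaScript backend, where a String is a sequence of UTF-16 code units.

```purescript
module UnconsComparison where

import Data.Maybe (Maybe)
import Data.String.CodePoints (CodePoint)
import Data.String.CodePoints as SCP
import Data.String.CodeUnits as SCU

-- U+1D400 MATHEMATICAL BOLD CAPITAL A lies outside the Basic Multilingual
-- Plane, so UTF-16 encodes it as a surrogate pair.
astral :: String
astral = "𝐀"

-- Code-unit view: on the JavaScript backend the head is a Char holding only
-- the high surrogate, and the tail still holds the low surrogate.
headByCodeUnit :: Maybe { head :: Char, tail :: String }
headByCodeUnit = SCU.uncons astral

-- Code-point view: the head is the whole character and the tail is empty.
headByCodePoint :: Maybe { head :: CodePoint, tail :: String }
headByCodePoint = SCP.uncons astral

-- 2 on the JavaScript backend (two UTF-16 code units).
codeUnitLength :: Int
codeUnitLength = SCU.length astral

-- 1 (a single code point).
codePointLength :: Int
codePointLength = SCP.length astral
```

A parser built on the code-unit uncons would therefore return half of the character from anyChar and leave the other half in the input.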

I just hit the anyChar issue you found above, but with the purerl backend.

I'm looking through the code for this package, and essentially the whole thing is oriented around UTF-16 code units...

The problem is that https://pursuit.purescript.org/packages/purescript-string-parsers/6.0.1/docs/Text.Parsing.StringParser.CodePoints#v:anyChar puts its result into a Char, so it isn't parsing code points after all. We'll be writing our own anyCodePoint and anyGrapheme later today.
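For illustration only, a code-point-aware primitive could look roughly like the sketch below. MiniParser is a hypothetical stand-in type invented for this example. It is not the parser type of purescript-parsing or purescript-string-parsers, and this is not the anyCodePoint the commenter went on to write. The only library calls used are uncons and the CodePoint type from Data.String.CodePoints.

```purescript
module AnyCodePointSketch where

import Data.Either (Either(..))
import Data.Maybe (Maybe(..))
import Data.String.CodePoints (CodePoint)
import Data.String.CodePoints as SCP

-- Hypothetical minimal parser type (an assumption for this sketch):
-- either fail with a message, or return a result plus the unconsumed input.
type MiniParser a = String -> Either String { result :: a, rest :: String }

-- Consume exactly one Unicode code point, however many code units encode it.
-- Returning CodePoint rather than Char avoids truncating astral characters.
anyCodePoint :: MiniParser CodePoint
anyCodePoint input = case SCP.uncons input of
  Nothing -> Left "Unexpected end of input"
  Just { head, tail } -> Right { result: head, rest: tail }
```

The key design point is the result type: a Char can hold only one UTF-16 code unit on the JavaScript backend, so any parser that returns Char cannot represent an astral character in full.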

@drathier I'd like to see your anyGrapheme parser. Link please? Do you think that this library should include anyGrapheme?

@drathier Would PR #119 solve your problem? That PR deletes the StringLike class. Would you like to rely on that class for purerl? Or have you implemented the Data.String API for purerl?