Notes on performance

Question

Notes on performance

jamesdbrock opened this issue 4 years ago · 9 comments

This is what the benchmarks currently look like:

Text.Parsing.StringParser.CodeUnits

StringParser.runParser parse23Units
mean   = 10.10 ms
stddev = 1.13 ms
min    = 9.46 ms
max    = 24.07 ms

Text.Parsing.Parser.String

runParser parse23
mean   = 44.20 ms
stddev = 6.38 ms
min    = 42.25 ms
max    = 113.16 ms

Data.String.Regex

Regex.match pattern23
mean   = 728.23 μs
stddev = 339.32 μs
min    = 613.72 μs
max    = 2.97 ms

I would like to reduce that 4× slowness between Parser.String and StringParser.CodeUnits .

The difference could be due to:

CodePoint rather than Char. Everything goes through the anyCodePoint parser since #119 , but I benchmarked it at the time and it didn't make any difference.
String tail state. Every time the parser advances by one character, we uncons the input string and save the tail as the new state. I tried changing that to only keeping a codeunit index into the input string on this branch and it didn't make any difference. https://github.com/jamesdbrock/purescript-parsing/tree/cursor
Parsing.Parser.String input position tracking with Pos { line :: Int, column :: Int}. I tried changing that to Pos Int on this branch and it didn't make any difference. https://github.com/jamesdbrock/purescript-parsing/tree/cursor
Monad transformers. When I look at the benchmark profiling, it looks like most of the time is spent in Control.Monad.State.Trans.bind and Text.Parsing.Parser.Combinators.tryRethrow. So this might be the entire problem, but improving this won't be easy.

natefaubion commented 4 years ago

Link #154

Answer 1 · 2022-02-01T09:23:15.000Z

Thread about Tail Call Elimination
https://discord.com/channels/864614189094928394/936908261200912394/936947552874545202

Answer 2 · 2022-02-03T09:31:13.000Z

Would refactoring the ParserT to use continuation-passing-style internally instead of the ExceptT StateT transformer stack improve the speed?

Answer 3 · 2022-02-03T23:24:36.000Z

https://discord.com/channels/864614189094928394/938814730439655535

Hi @natefaubion , why is https://github.com/natefaubion/purescript-language-cst-parser/blob/main/src/PureScript/CST/Parser/Monad.purs written this way? Would it make sense to use your techniques for purescript-parsing?

Because I needed a faster, stack-safe parser. If you are writing a transformer, the techniques do not make sense (or rather, are unnecessary). For transformers, stack safety is determined by the base Monad in your transformer stack. CPS is OK for a transformer, or at least, you aren't losing anything by using CPS in a transformer, since you are paying an abstraction tax regardless.

In general, though CPS does not necessarily mean "faster", because JS runtimes don't exploit CPS. CPS is great in functional runtimes that have tail call elimination, where dynamic tail calls turn into jumps. CPS encodings of direct-style code usually means trading allocation of data for allocation of closures, which can be faster as it's often easier for an optimizing compiler to remove the closure allocations.

Answer 4 · 2022-03-14T18:43:08.000Z

Note that I recently revamped language-cst internals to be a bit faster and simpler by switching to CPS with uncurried functions and a trampoline eliminator. For ParserT, it might look like:

newtype ParserT s m a = ParserT
  ( forall r
      . Fn5
          (ParserState s) -- State argument
          (m (Unit -> r) -> r) -- Trampoline/lift
          (Fn2 (ParserState s) ParseError r) -- Fail
          (Fn2 (ParserState s) a r) -- Succeed
          r
  )

I think the would also give you the flexibility to run your parser in any MonadRec without having to write your parser in terms of MonadRec. I'd expect that this would help close the performance gap significantly (and provide stack safety by default).

Answer 5 · 2022-03-14T22:33:47.000Z

Thanks @natefaubion that is extremely helpful.

Answer 6 · 2022-03-20T00:32:17.000Z

Some other notes on string internals in particular:

string is potentially slow on failure because of the show call to the input string. Failure cases are hit very often, and if you are using string a lot this can be significant overhead. Deferred errors might be better in general.
satisfy combinators are implemented in terms of any combinators with an esoteric tryRethrow combinator. If you implement any in terms of satisfy (const true) then you can remove a lot of overhead for the satisfy calls.

Answer 7 · 2022-03-20T00:38:17.000Z

choice should use foldr (really <|> should be right associative).

purescript/purescript-control#79

Answer 8 · 2022-03-22T09:56:14.000Z

choice should use foldr (really <|> should be right associative).

This was done by @natefaubion in #154