Notes on performance
jamesdbrock opened this issue · 9 comments
This is what the benchmarks currently look like:
Text.Parsing.StringParser.CodeUnits
StringParser.runParser parse23Units
mean = 10.10 ms
stddev = 1.13 ms
min = 9.46 ms
max = 24.07 ms
Text.Parsing.Parser.String
runParser parse23
mean = 44.20 ms
stddev = 6.38 ms
min = 42.25 ms
max = 113.16 ms
Data.String.Regex
Regex.match pattern23
mean = 728.23 μs
stddev = 339.32 μs
min = 613.72 μs
max = 2.97 ms
I would like to reduce that 4× slowness between Parser.String and StringParser.CodeUnits .
The difference could be due to:
CodePointrather thanChar. Everything goes through theanyCodePointparser since #119 , but I benchmarked it at the time and it didn't make any difference.- String tail state. Every time the parser advances by one character, we
unconsthe input string and save the tail as the new state. I tried changing that to only keeping a codeunit index into the input string on this branch and it didn't make any difference. https://github.com/jamesdbrock/purescript-parsing/tree/cursor - Parsing.Parser.String input position tracking with
Pos { line :: Int, column :: Int}. I tried changing that toPos Inton this branch and it didn't make any difference. https://github.com/jamesdbrock/purescript-parsing/tree/cursor - Monad transformers. When I look at the benchmark profiling, it looks like most of the time is spent in
Control.Monad.State.Trans.bindandText.Parsing.Parser.Combinators.tryRethrow. So this might be the entire problem, but improving this won't be easy.
Thread about Tail Call Elimination
https://discord.com/channels/864614189094928394/936908261200912394/936947552874545202
Would refactoring the ParserT to use continuation-passing-style internally instead of the ExceptT StateT transformer stack improve the speed?
https://discord.com/channels/864614189094928394/938814730439655535
Hi @natefaubion , why is https://github.com/natefaubion/purescript-language-cst-parser/blob/main/src/PureScript/CST/Parser/Monad.purs written this way? Would it make sense to use your techniques for purescript-parsing?
Because I needed a faster, stack-safe parser. If you are writing a transformer, the techniques do not make sense (or rather, are unnecessary). For transformers, stack safety is determined by the base Monad in your transformer stack. CPS is OK for a transformer, or at least, you aren't losing anything by using CPS in a transformer, since you are paying an abstraction tax regardless.
In general, though CPS does not necessarily mean "faster", because JS runtimes don't exploit CPS. CPS is great in functional runtimes that have tail call elimination, where dynamic tail calls turn into jumps. CPS encodings of direct-style code usually means trading allocation of data for allocation of closures, which can be faster as it's often easier for an optimizing compiler to remove the closure allocations.
Note that I recently revamped language-cst internals to be a bit faster and simpler by switching to CPS with uncurried functions and a trampoline eliminator. For ParserT, it might look like:
newtype ParserT s m a = ParserT
( forall r
. Fn5
(ParserState s) -- State argument
(m (Unit -> r) -> r) -- Trampoline/lift
(Fn2 (ParserState s) ParseError r) -- Fail
(Fn2 (ParserState s) a r) -- Succeed
r
)I think the would also give you the flexibility to run your parser in any MonadRec without having to write your parser in terms of MonadRec. I'd expect that this would help close the performance gap significantly (and provide stack safety by default).
Thanks @natefaubion that is extremely helpful.
Link #154
Some other notes on string internals in particular:
stringis potentially slow on failure because of theshowcall to the input string. Failure cases are hit very often, and if you are usingstringa lot this can be significant overhead. Deferred errors might be better in general.satisfycombinators are implemented in terms ofanycombinators with an esoterictryRethrowcombinator. If you implementanyin terms ofsatisfy (const true)then you can remove a lot of overhead for the satisfy calls.
choice should use foldr (really <|> should be right associative).
choiceshould usefoldr(really<|>should be right associative).
This was done by @natefaubion in #154