chainl space leak
noughtmare opened this issue · 2 comments
noughtmare commented
The definition of chainl is:
chainl :: (b -> a -> b) -> ParserT st e b -> ParserT st e a -> ParserT st e b
chainl f start elem = start >>= go where
go b = do {!a <- elem; go $! f b a} <|> pure b
But that use of <|>
means we push a new frame to the stack for every iteration and that frame can keep references to a large amount of heap data.
I ran into this issue when implementing a parser for the "1 billion row challenge" (see this discourse post). I had an implementation like this:
parseMeasurements :: Parser () (M.HashMap BS.ByteString T)
parseMeasurements = chainl (\ !m (S k v) -> M.insertWith f k (T v v v 1) m) (pure M.empty) parseLine where
f (T a b c d) (T a' b' c' d') = T (min a a') (max b b') (c+c') (d+d')
Which caused a major space leak slowing down my program from 1 second to over 12 seconds (with only 9.2% productivity) on an input of 10 million rows.
I've fixed it by rewriting my implementation to use a manual recursion:
parseMeasurements :: Parser () (M.HashMap BS.ByteString T)
parseMeasurements = go M.empty where
go !m = withOption parseLine (\(S k v) -> go (M.insertWith f k (T v v v 1) m)) (pure m)
f (T a b c d) (T a' b' c' d') = T (min a a') (max b b') (c+c') (d+d')
So I'd suggest changing the definition of chainl
similarly, e.g.:
chainl :: (b -> a -> b) -> ParserT st e b -> ParserT st e a -> ParserT st e b
chainl f start elem = start >>= go where
go b = withOption elem (\ !a -> go $! f b a) (pure b)
AndrasKovacs commented
Thanks! You're totally right, I'll fix shortly.
AndrasKovacs commented
Fixed by 851a39c.