snoyberg/xml

Space leak when executing multiple `Text.XML.Stream.Parse.parseBytes` conduits

mbid opened this issue

I've encountered a weird space leak using Text.XML.Stream.Parse. I believe a reasonably minimal example is this:

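import Conduit (runConduit, runResourceT, sinkNull, sourceFile, (.|))
import Text.XML.Stream.Parse (def)
import qualified Text.XML.Stream.Parse

doTwice :: Applicative f => f () -> f ()
doTwice x = x *> x

leakSpace :: IO ()
leakSpace =
  runResourceT $ runConduit $
  doTwice (sourceFile "large-file.xml" .| Text.XML.Stream.Parse.parseBytes def) .|
  sinkNull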

If large-file.xml is large enough, this crashes with an out-of-memory error, even before the second pass over the file starts.
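For readers less familiar with conduit: doTwice works here because ConduitT is an Applicative in which *> runs the first conduit to completion and then the second. The following small pure sketch (not part of the original report; twiceList is a hypothetical name) uses yieldMany, sinkList and runConduitPure from the Conduit module to show that the downstream sink receives the whole stream twice, back to back:

```haskell
import Conduit (runConduitPure, sinkList, yieldMany, (.|))

-- Same helper as in the report.
doTwice :: Applicative f => f () -> f ()
doTwice x = x *> x

-- For a ConduitT, (*>) is sequential composition: the upstream runs to
-- completion once, then runs again from the start, and the downstream
-- consumer sees both outputs one after the other.
twiceList :: [Int]
twiceList = runConduitPure (doTwice (yieldMany [1, 2, 3]) .| sinkList)

main :: IO ()
main = print twiceList -- prints [1,2,3,1,2,3]
```

In the report, the conduit being repeated is sourceFile "large-file.xml" .| parseBytes def, so the file is read and parsed twice in sequence.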

If the line

  doTwice (sourceFile "large-file.xml" .| Text.XML.Stream.Parse.parseBytes def) .|

is replaced with either

  sourceFile "large-file.xml" .| Text.XML.Stream.Parse.parseBytes def .|

(i.e. only parsing the file once) or

  doTwice (sourceFile "large-file.xml.gz" .| ungzip) .|

(i.e. not parsing at all, just connecting sourceFile to something else), everything works as expected. This makes me think that the cause of the issue is somewhere in parseBytes.
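The three variants above could be compared side by side with a single executable that picks a pipeline by name, so that each one can be run separately with GHC's runtime statistics. This is a sketch assuming the conduit, conduit-extra (for Data.Conduit.Zlib.ungzip), resourcet and xml-conduit packages; the variant names, the pipeline function and the leak-test executable name are hypothetical:

```haskell
import Conduit (ConduitT, runConduitRes, sinkNull, sourceFile, (.|))
import Control.Monad.Trans.Resource (ResourceT)
import Data.Conduit.Zlib (ungzip)
import Data.Void (Void)
import System.Environment (getArgs)
import Text.XML.Stream.Parse (def, parseBytes)

-- Same helper as in the report: run an action twice in sequence.
doTwice :: Applicative f => f () -> f ()
doTwice x = x *> x

-- The three pipelines compared in the report, selected by a
-- hypothetical variant name. Unknown names yield Nothing.
pipeline :: String -> Maybe (ConduitT () Void (ResourceT IO) ())
pipeline "parse-twice" = Just $
  doTwice (sourceFile "large-file.xml" .| parseBytes def) .| sinkNull
pipeline "parse-once" = Just $
  sourceFile "large-file.xml" .| parseBytes def .| sinkNull
pipeline "ungzip-twice" = Just $
  doTwice (sourceFile "large-file.xml.gz" .| ungzip) .| sinkNull
pipeline _ = Nothing

main :: IO ()
main = do
  args <- getArgs
  case args of
    [name] | Just p <- pipeline name -> runConduitRes p
    _ -> putStrLn "usage: leak-test (parse-twice|parse-once|ungzip-twice)"
```

For example, after compiling with -rtsopts, running leak-test parse-twice +RTS -s and leak-test parse-once +RTS -s should make any difference in maximum residency visible, if the behaviour described above reproduces.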

I first encountered this issue when trying to combine multiple conduits with for_, along the lines of

for_ files $ \file -> sourceFile file .| Text.XML.Stream.Parse.parseBytes def .| ...

and I triggered the issue even when files was a singleton list. I have not been able to reproduce it with constant singleton lists, though, perhaps because of optimization. If files was a dynamic Maybe instead, the issue did not occur.
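A small, self-contained sketch of the for_ pattern, assuming in-memory documents in place of files (parseEach and countDocuments are hypothetical names, and inputs this small will not show the leak). It only illustrates how the parsers are composed: each parseBytes run inside for_ starts a fresh document, so the combined stream should contain one EventBeginDocument per input. It assumes the xml-types package for Data.XML.Types and the exceptions package for MonadThrow:

```haskell
import Conduit (ConduitT, runConduit, sinkList, yield, (.|))
import Control.Monad.Catch (MonadThrow)
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BS8
import Data.Foldable (for_)
import Data.XML.Types (Event (EventBeginDocument))
import Text.XML.Stream.Parse (def, parseBytes)

-- One parseBytes conduit per input, chained with for_ in the ConduitT
-- monad, mirroring the `for_ files $ \file -> ...` shape from the report.
-- In-memory documents stand in for files (a simplification).
parseEach :: MonadThrow m => [ByteString] -> ConduitT () Event m ()
parseEach docs = for_ docs $ \doc -> yield doc .| parseBytes def

-- Count how many documents appear in the combined event stream.
-- Maybe is the base monad here, so a parse error becomes Nothing.
countDocuments :: [ByteString] -> Maybe Int
countDocuments docs =
  length . filter (== EventBeginDocument)
    <$> runConduit (parseEach docs .| sinkList)

main :: IO ()
main = print (countDocuments (map BS8.pack ["<a/>", "<b>text</b>"]))
```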

I'm using Stackage's lts-12.5, i.e. xml-conduit-1.8.0.