philopon/pugixml-hs

Can segfault

ndmitchell opened this issue · 4 comments

The library can segfault. pugixml requires the document to be kept alive while any of its nodes are in use, but the Haskell binding doesn't ensure that.

Example code:

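{-# LANGUAGE OverloadedStrings #-}
import Control.Monad (forever)
import qualified Control.Exception as E
import qualified Data.ByteString as BS
import Data.Maybe (fromJust)
import Text.XML.Pugi  -- assumed to export parse, child and def (else Data.Default.Class)

segfault :: IO ()
segfault = forever $ do
    bs <- BS.readFile "example.xml"
    let Right root = parse def bs
    let unattend = fromJust $ child "unattend" root  -- last use of root
    print $ show unattend  -- the document may already have been freed here
    E.evaluate bs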

Are there any workarounds to this?

I'd prefer a more permanent fix but couldn't determine at first glance what might need to be done. Are there any hints on what such a fix would look like?

Being a bit dense above - easy to work around by keeping the doc node in scope until I've finished with it! Annoying but it works for now...

I'll also have a read through of the pugixml docs to see where / how I can set up the FFI finalizers to keep the doc around until after all the nodes have been used. Is anyone aware of any other FFI bindings that do a similar thing?
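For reference, a minimal sketch of the general pattern using only `base`, not the pugixml-hs API: each node carries the document's `ForeignPtr`, and every read goes through `withForeignPtr`, so the document cannot be finalized while a node is being read. `Doc`, `Node`, `newDoc`, `nodeAt` and `readNode` are hypothetical names, and a plain byte buffer stands in for the C-side document. A real binding would presumably attach the C library's own free function with `newForeignPtr` instead.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.ForeignPtr.Unsafe (unsafeForeignPtrToPtr)
import Foreign.Ptr (Ptr, plusPtr)
import Foreign.Storable (peek, poke)

-- Hypothetical stand-in for a C-side document: a raw, pinned byte buffer.
newtype Doc = Doc (ForeignPtr Word8)

-- A node is an interior pointer plus a reference to the owning document.
data Node = Node Doc (Ptr Word8)

-- Allocate a small "document" and fill it with the given bytes.
newDoc :: [Word8] -> IO Doc
newDoc bytes = do
    fp <- mallocForeignPtrBytes (max 1 (length bytes))
    withForeignPtr fp $ \p ->
        mapM_ (\(i, b) -> poke (p `plusPtr` i) b) (zip [0 ..] bytes)
    pure (Doc fp)

-- Take a node pointing at byte offset i (no bounds check in this sketch).
-- The raw pointer is only safe because the node also holds the document.
nodeAt :: Doc -> Int -> Node
nodeAt doc@(Doc fp) i = Node doc (unsafeForeignPtrToPtr fp `plusPtr` i)

-- Read through the node; withForeignPtr keeps the buffer alive
-- until the read has completed.
readNode :: Node -> IO Word8
readNode (Node (Doc fp) p) = withForeignPtr fp $ \_ -> peek p
```

With this shape the caller no longer has to remember to keep the document around, because holding any node is enough.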

Pugixml deliberately doesn't require the nodes to be kept alive - they are single pointers that point inside the original doc. The real solution is not to keep alive the memory that Pugixml-hs allocated, but to skip the memory allocation in the first place. As for a library that follows that pattern, Hexml is the perfect example. See here, where each node points at the central document.
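A rough illustration of the "each node points at the central document" idea, as a pure-Haskell sketch rather than Hexml's actual representation: a node allocates nothing of its own and is just the shared document plus an offset and a length. `Node`, `mkNode` and `nodeText` are hypothetical names. Because every node references the document, the garbage collector keeps the document alive for as long as any node is reachable.

```haskell
import qualified Data.ByteString.Char8 as BS

-- A node owns no memory of its own: it holds the shared document
-- plus an offset and a length into it.
data Node = Node
    { nodeDoc :: BS.ByteString
    , nodeOff :: Int
    , nodeLen :: Int
    }

-- Hypothetical helper: a node covering `len` bytes starting at `off`.
mkNode :: BS.ByteString -> Int -> Int -> Node
mkNode = Node

-- Text is sliced out of the shared document on demand; take and drop
-- on a strict ByteString share the underlying buffer rather than copy it.
nodeText :: Node -> BS.ByteString
nodeText (Node doc off len) = BS.take len (BS.drop off doc)
```

For example, `nodeText (mkNode (BS.pack "<a>hi</a>") 3 2)` yields the bytes of `hi` without copying the document.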

I was working on a patch to do the above, but then discovered even with those changes the underlying Pugixml was too slow, so went off and developed Hexml.