haskell/zlib

Decompressing concatenated file silently loses information

snoyberg opened this issue · 1 comment

I discovered this while researching a user bug report at snoyberg/conduit#254. A short reproduction:

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as L
import Codec.Compression.GZip

main :: IO ()
main = do
    let lbs = L.concat
            [ compress "hello\n"
            , compress "world\n"
            ]
    print $ decompress lbs

This produces the output "hello\n", whereas the expected behavior would be either to produce "hello\nworld\n" or to throw an exception (though the latter would be suboptimal).

In the conduit issue, I argued against changing the behavior of conduit-extra's ungzip function, to avoid breaking backwards compatibility. However, those arguments do not apply to the zlib package, because the two functions differ:

  • The zlib decompress function currently loses the data silently, with no way of retrieving it.
  • The conduit-extra ungzip function leaves the unconsumed data available on the stream, which is useful in real-world cases.

Thanks for the report. This behaviour was changed in zlib-0.6.1.0 in April 2015, and there's good test coverage for the feature. It is controlled by the decompressAllMembers field of DecompressParams, which defaults to True. It's also possible to handle concatenated members explicitly using the incremental DecompressStream API.
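For anyone landing here later, a minimal sketch of the decompressAllMembers switch described above, assuming zlib >= 0.6.1.0. The helper names twoMembers, allMembers and firstMemberOnly are made up for illustration; decompressWith, defaultDecompressParams and the decompressAllMembers field are the parts taken from Codec.Compression.GZip.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as L
import Codec.Compression.GZip
    ( DecompressParams (..)
    , compress
    , decompressWith
    , defaultDecompressParams
    )

-- Two gzip members back to back, as in the original report.
-- (Hypothetical helper name, for illustration only.)
twoMembers :: L.ByteString
twoMembers = L.concat
    [ compress "hello\n"
    , compress "world\n"
    ]

-- Default parameters: decompressAllMembers is True, so decoding
-- continues into each following member.
allMembers :: L.ByteString -> L.ByteString
allMembers = decompressWith defaultDecompressParams

-- Opt out: stop at the end of the first member. Anything after it
-- is dropped by this lazy ByteString interface.
firstMemberOnly :: L.ByteString -> L.ByteString
firstMemberOnly =
    decompressWith defaultDecompressParams { decompressAllMembers = False }

main :: IO ()
main = do
    print (allMembers twoMembers)
    print (firstMemberOnly twoMembers)
```

With a recent enough zlib, the first line should show both payloads and the second only the first one. If the trailing bytes matter, setting the field to False here is not enough, since this interface discards them; the incremental DecompressStream API mentioned above is the route that exposes the unconsumed input.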