Decompressing concatenated file silently loses information
snoyberg opened this issue · 1 comments
I discovered this when researching a user bug report at snoyberg/conduit#254. Short description:
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as L
import Codec.Compression.GZip

main :: IO ()
main = do
    let lbs = L.concat
            [ compress "hello\n"
            , compress "world\n"
            ]
    print $ decompress lbs
This produces the output "hello\n", whereas the expected behavior would be either "hello\nworld\n" or throwing an exception (though the latter would be suboptimal).
In the conduit issue, I argued against changing the behavior of conduit-extra's ungzip function to avoid breaking backwards compatibility. However, the arguments do not apply in the case of the zlib package:
- The decompress function silently loses data right now, with no way of retrieving that data
- The conduit-extra ungzip function leaves the unconsumed data available on the stream, which has real-world use cases where it's useful
Thanks for the report. This behaviour was changed in zlib-0.6.1.0 in April 2015. There's good test coverage for this feature. The behaviour can be controlled by changing the decompressAllMembers field of the DecompressParams. It defaults to True. It's also possible to handle this explicitly using the incremental DecompressStream API.
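For anyone landing here later, a minimal sketch of both options, assuming zlib >= 0.6.1.0. The names concatenated, firstMemberOnly and splitFirstMember are made up for illustration. The second helper goes through Codec.Compression.Zlib.Internal (decompressST and foldDecompressStreamWithInput), which I believe is the simplest way to get the unconsumed input back from the incremental API; treat it as a sketch rather than the recommended pattern.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as L
import Codec.Compression.GZip (compress, decompress, decompressWith)
import Codec.Compression.Zlib.Internal
  ( DecompressParams(..), DecompressError, defaultDecompressParams
  , decompressST, foldDecompressStreamWithInput, gzipFormat )

-- Two complete gzip members, concatenated back to back.
concatenated :: L.ByteString
concatenated = L.concat [compress "hello\n", compress "world\n"]

-- Parameters that stop after the first gzip member (the default is True).
firstOnlyParams :: DecompressParams
firstOnlyParams = defaultDecompressParams { decompressAllMembers = False }

-- Option 1: the plain lazy API; anything after the first member is dropped.
firstMemberOnly :: L.ByteString -> L.ByteString
firstMemberOnly = decompressWith firstOnlyParams

-- Option 2: decode the first member and hand back the unconsumed input.
-- Not streaming-friendly (the Either forces the whole member), but short.
splitFirstMember
  :: L.ByteString -> Either DecompressError (L.ByteString, L.ByteString)
splitFirstMember =
  foldDecompressStreamWithInput
    (\chunk -> fmap (\(out, rest) -> (L.fromStrict chunk <> out, rest)))
    (\rest -> Right (L.empty, rest))   -- stream end: rest is the leftover input
    Left                               -- decompression error
    (decompressST gzipFormat firstOnlyParams)

main :: IO ()
main = do
  print (decompress concatenated)       -- default: all members are decoded
  print (firstMemberOnly concatenated)  -- only the first member
  case splitFirstMember concatenated of
    Left err          -> putStrLn ("decompression failed: " ++ show err)
    Right (out, rest) -> do
      print out
      print (decompress rest)           -- leftover is itself a gzip member
```

With the default parameters the first print should show both members; the other two calls should stop after "hello\n", and in the last case the leftover bytes are still available to decode separately.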