Which steps are required to transform the output of Zlib.compress() into a message that can be decoded with zcat?
foretspaisibles opened this issue · 3 comments
I used the following program to zlib-compress and base64-encode the message HELLO, but I cannot decompress the generated output with an external tool such as gzip:
(* Toplevel interaction *)
#use "topfind";;
#require "cryptokit";;
open Cryptokit;;
# let c () = compose (Zlib.compress()) (Base64.encode_compact()) ;;
val c : unit -> Cryptokit.transform = <fun>
# transform_string (c()) "HELLO";;
- : string = "83D18fEHAA"
Trying to decode it with the gzip program yields the following error:
% printf '83D18fEHAA' | base64 --decode | zcat
zcat: unknown compression format
Further investigation, however, reveals that the output of the OCaml program above is contained in the output of gzip when compressing the message HELLO:
% printf '83D18fEHAA' | base64 --decode | hexdump -C
00000000 f3 70 f5 f1 f1 07 |.p....|
00000006
% printf 'HELLO' | gzip | hexdump -C
00000000 1f 8b 08 00 94 e1 9a 58 00 03 f3 70 f5 f1 f1 07 |.......X...p....|
00000010 00 36 64 44 c1 05 00 00 00 |.6dD.....|
00000019
(The output of the ocaml program can be found at offset 0x0a.)
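To make the layout of that dump concrete, here is a minimal OCaml sketch (standard library only, assuming OCaml 4.08 or later for `Bytes.get_int32_le`) that splits the 25 bytes printed above into header, payload and trailer. The name `split_gzip` is made up for this illustration, and it only handles a header without optional fields (FLG = 0), which is what the dump shows.

```ocaml
(* The 25 bytes printed by `printf 'HELLO' | gzip | hexdump -C` above. *)
let gz =
  "\x1f\x8b\x08\x00\x94\xe1\x9a\x58\x00\x03\
   \xf3\x70\xf5\xf1\xf1\x07\x00\
   \x36\x64\x44\xc1\x05\x00\x00\x00"

let header_len = 10  (* fixed-size header when FLG = 0 *)
let trailer_len = 8  (* CRC-32 and ISIZE, 4 bytes each, little-endian *)

(* Hypothetical helper: returns (deflate payload, CRC-32, ISIZE). *)
let split_gzip (s : string) : string * int32 * int32 =
  let n = String.length s in
  if n < header_len + trailer_len then invalid_arg "split_gzip: too short";
  if String.sub s 0 3 <> "\x1f\x8b\x08" then
    invalid_arg "split_gzip: not a gzip/deflate stream";
  if s.[3] <> '\x00' then
    invalid_arg "split_gzip: optional header fields are not handled";
  let payload = String.sub s header_len (n - header_len - trailer_len) in
  let b = Bytes.of_string s in
  let crc = Bytes.get_int32_le b (n - 8) in
  let isize = Bytes.get_int32_le b (n - 4) in
  (payload, crc, isize)

let () =
  let payload, crc, isize = split_gzip gz in
  Printf.printf "payload: %d bytes, crc32: %08lx, isize: %ld\n"
    (String.length payload) crc isize
```

Note that the payload inside the gzip stream is seven bytes long (it ends in 00), one more than the six bytes in the first hexdump. Presumably the unpadded base64 text made `base64 --decode` drop the final partial group.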
Which steps are required to transform the output of Zlib.compress() into a message that can be decoded with zcat?
This is some code I have used to create a gzip-friendly output for cohttp. If you String.concat the final list of bytes or otherwise stream them out to your target then you should get a gzip/zcat-friendly result.
This uses cryptokit for compression, camlzip for the CRC32 calculations and ocplib-endian for encoding the checksum and content length into raw bytes. I don't know what happens if raw is larger than Int32.t can express.
let to_gzip_body raw =
  let length = String.length raw in
  (* Zlib here is camlzip's module, not Cryptokit.Zlib *)
  let crc32 = Zlib.update_crc_string 0l raw 0 length in
  (* Encode an int32 as 4 little-endian bytes, as the gzip trailer expects *)
  let int32_to_bytestring i =
    let buf = Bytes.create 4 in
    EndianString.LittleEndian.set_int32 buf 0 i;
    Bytes.to_string buf
  in
  let t = Cryptokit.Zlib.compress () in
  let compressed = Cryptokit.transform_string t raw in
  (* XXX: Hard-coded gzip header is maybe not the best idea... *)
  Cohttp_lwt_body.of_string_list [
    "\x1f\x8b"; (* ID1 and ID2 *)
    "\x08"; (* Compression method (deflate) *)
    "\x00"; (* Flags (FLG): no optional fields *)
    "\x00\x00\x00\x00"; (* Modification time: not set *)
    "\x00"; (* Extra flags (XFL) *)
    "\xff"; (* OS: unknown *)
    compressed; (* Compressed data *)
    int32_to_bytestring crc32; (* CRC-32 checksum *)
    int32_to_bytestring @@ Int32.of_int length; (* Original uncompressed size *)
  ]
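On the open question of inputs longer than an Int32.t can express: RFC 1952 defines the size field (ISIZE) as the uncompressed length modulo 2^32, and `Int32.of_int` is documented to take its argument modulo 2^32 on 64-bit platforms, so the truncation appears to be what the format expects. A small check, assuming a 64-bit platform (the helper name `isize_of_length` is made up for this sketch):

```ocaml
(* Hypothetical helper: the ISIZE value for an input of [n] bytes.
   Int32.of_int keeps only the low 32 bits of [n]. *)
let isize_of_length (n : int) : int32 = Int32.of_int n

let () =
  Printf.printf "%ld\n" (isize_of_length 5);
  (* Only meaningful where native ints are wider than 32 bits. *)
  if Sys.int_size > 32 then
    Printf.printf "%ld\n" (isize_of_length ((1 lsl 32) + 5))
```

Lengths of 2^31 and above come out as negative Int32.t values, but the four bytes written are still the low 32 bits of the length.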
Yes, gzip wraps the raw compressed data produced by Zlib with a header and a trailer.
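For illustration, the wrapping can be sketched with the OCaml standard library alone (assuming OCaml 4.08 or later for `Bytes.set_int32_le`). The names `crc32`, `le32` and `gzip_wrap` are invented for this sketch; `deflated` is assumed to be a raw deflate stream, which is what the Cryptokit output in this thread appears to be. The header fields are hard-coded (no modification time, OS unknown), so treat this as a minimal example rather than a full gzip writer.

```ocaml
(* Table-driven CRC-32 (reflected polynomial 0xEDB88320), as used by gzip. *)
let crc32_table =
  Array.init 256 (fun n ->
    let c = ref (Int32.of_int n) in
    for _i = 0 to 7 do
      c :=
        if Int32.logand !c 1l <> 0l
        then Int32.logxor 0xEDB88320l (Int32.shift_right_logical !c 1)
        else Int32.shift_right_logical !c 1
    done;
    !c)

let crc32 (s : string) : int32 =
  let c = ref 0xFFFFFFFFl in
  String.iter
    (fun ch ->
      let idx =
        Int32.to_int
          (Int32.logand (Int32.logxor !c (Int32.of_int (Char.code ch))) 0xFFl)
      in
      c := Int32.logxor crc32_table.(idx) (Int32.shift_right_logical !c 8))
    s;
  Int32.logxor !c 0xFFFFFFFFl

(* Encode an int32 as 4 little-endian bytes. *)
let le32 (i : int32) : string =
  let b = Bytes.create 4 in
  Bytes.set_int32_le b 0 i;
  Bytes.to_string b

(* Wrap a raw deflate stream in a minimal gzip header and trailer. *)
let gzip_wrap ~raw ~deflated =
  String.concat "" [
    "\x1f\x8b";                               (* ID1, ID2 *)
    "\x08";                                   (* CM: deflate *)
    "\x00";                                   (* FLG: no optional fields *)
    "\x00\x00\x00\x00";                       (* MTIME: not set *)
    "\x00";                                   (* XFL *)
    "\xff";                                   (* OS: unknown *)
    deflated;
    le32 (crc32 raw);                         (* CRC-32 of the uncompressed data *)
    le32 (Int32.of_int (String.length raw));  (* ISIZE: length modulo 2^32 *)
  ]

(* Raw deflate bytes for "HELLO", copied from the gzip dump earlier in this thread. *)
let hello_gz = gzip_wrap ~raw:"HELLO" ~deflated:"\xf3\x70\xf5\xf1\xf1\x07\x00"
```

Writing `hello_gz` to a file should give something zcat accepts. It differs from the gzip dump above only in the MTIME, XFL and OS header bytes.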
If you're interested in producing gzip-compatible files from OCaml, another library of mine does exactly this: https://github.com/xavierleroy/camlzip/
The Zlib compression support in Cryptokit was intended for implementing crypto protocols that include compression as an (optional) step. Those protocols do not need the gzip header and trailer.
@hcarty, @xavierleroy Thank you very much for these clarifications!