xavierleroy/cryptokit

What steps are required to turn the output of Zlib.compress() into a message that zcat can decode?

foretspaisibles opened this issue · 3 comments

I used the following program to zlib-compress and base64-encode the message HELLO but I am not able to decompress the generated output with an external tool, such as gzip:

(* Toplevel interaction *)
#use "topfind";;
#require "cryptokit";;
open Cryptokit;;
# let c () = compose (Zlib.compress()) (Base64.encode_compact()) ;;
val c : unit -> Cryptokit.transform = <fun>
# transform_string (c()) "HELLO";;
- : string = "83D18fEHAA"

Trying to decode it with the gzip program yields the following error:

% printf '83D18fEHAA'  | base64 --decode | zcat 
zcat: unknown compression format

Further investigation, however, reveals that the output of the OCaml program above is contained in the output of gzip when compressing the message HELLO:

% printf '83D18fEHAA' | base64 --decode | hexdump -C
00000000  f3 70 f5 f1 f1 07                                 |.p....|
00000006
% printf 'HELLO' | gzip | hexdump -C
00000000  1f 8b 08 00 94 e1 9a 58  00 03 f3 70 f5 f1 f1 07  |.......X...p....|
00000010  00 36 64 44 c1 05 00 00  00                       |.6dD.....|
00000019

(The output of the OCaml program can be found at offset 0x0a.)
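
To make that layout concrete, here is a minimal sketch (standard library only, no Cryptokit) that splits the 25 bytes of the gzip hexdump above into header, deflate body, and trailer. It assumes a gzip member whose flags byte is 0, so that the header is exactly 10 bytes; `split_gzip` is a name made up for this illustration. The body comes out as 7 bytes: the 6 bytes printed by `base64 --decode` lack the final 00, probably because the compact Base64 input carries no padding.

```ocaml
(* The 25 bytes printed by `printf 'HELLO' | gzip | hexdump -C` above. *)
let gz =
  "\x1f\x8b\x08\x00\x94\xe1\x9a\x58\x00\x03\
   \xf3\x70\xf5\xf1\xf1\x07\x00\
   \x36\x64\x44\xc1\x05\x00\x00\x00"

let header_len = 10  (* fixed-size header when the flags byte is 0 *)
let trailer_len = 8  (* CRC-32 followed by the input size, both little-endian *)

(* Split a gzip member without optional header fields into
   (header, raw deflate body, trailer). *)
let split_gzip s =
  let n = String.length s in
  if n < header_len + trailer_len then invalid_arg "split_gzip: too short";
  if s.[0] <> '\x1f' || s.[1] <> '\x8b' then invalid_arg "split_gzip: bad magic";
  if s.[3] <> '\x00' then invalid_arg "split_gzip: optional fields not handled";
  let body_len = n - header_len - trailer_len in
  ( String.sub s 0 header_len,
    String.sub s header_len body_len,
    String.sub s (n - trailer_len) trailer_len )

let header, deflate, trailer = split_gzip gz

(* Decode the two little-endian 32-bit fields of the trailer. *)
let crc32 = Bytes.get_int32_le (Bytes.of_string trailer) 0
let isize = Bytes.get_int32_le (Bytes.of_string trailer) 4

let () =
  Printf.printf "header: %d bytes, deflate body: %d bytes\n"
    (String.length header) (String.length deflate);
  Printf.printf "crc32 = 0x%08lx, uncompressed size = %ld\n" crc32 isize
```

The decoded size field is 5, the length of HELLO, which supports the reading that only the header and trailer are missing from the Cryptokit output.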

What steps are required to turn the output of Zlib.compress() into a message that zcat can decode?

This is some code I have used to create gzip-friendly output for cohttp. If you String.concat the final list of strings, or otherwise stream them out to your target, you should get a result that gzip/zcat can read.

This uses Cryptokit for compression, camlzip for the CRC-32 calculation, and ocplib-endian to encode the checksum and content length as raw bytes. I don't know what happens if raw is larger than an Int32.t can express.

let to_gzip_body raw =
  let length = String.length raw in
  (* Zlib here is camlzip's module, not Cryptokit.Zlib. *)
  let crc32 = Zlib.update_crc_string 0l raw 0 length in
  let int32_to_bytestring i =
    let buf = Bytes.create 4 in
    EndianString.LittleEndian.set_int32 buf 0 i;
    Bytes.to_string buf
  in
  let t = Cryptokit.Zlib.compress () in
  let compressed = Cryptokit.transform_string t raw in
  (* XXX: Hard-coded gzip header is maybe not the best idea... *)
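  Cohttp_lwt_body.of_string_list [
    "\x1f\x8b"; (* ID1 and ID2: gzip magic number *)
    "\x08"; (* Compression method: deflate *)
    "\x00"; (* Flags (FLG): no optional header fields *)
    "\x00\x00\x00\x00"; (* Modification time: not available *)
    "\x00"; (* Extra flags (XFL) *)
    "\xff"; (* OS: unknown *)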
    compressed; (* Compressed data *)
    int32_to_bytestring crc32; (* CRC-32 checksum *)
    int32_to_bytestring @@ Int32.of_int length; (* Original uncompressed size *)
  ]
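
For readers who want to try the wrapping without camlzip, ocplib-endian, or cohttp, here is a rough standard-library sketch of the same idea. It is not the code above, only an illustration under assumptions: a 64-bit OCaml 4.08 or later (for `Bytes.set_int32_le`), and a `deflated` argument that really is the raw deflate stream for `raw`, such as the string returned by `Cryptokit.transform_string (Cryptokit.Zlib.compress ()) raw`. The names `crc32`, `le32`, and `gzip_wrap` are hypothetical.

```ocaml
(* Bitwise CRC-32 with the reflected polynomial 0xEDB88320, the checksum
   used in the gzip trailer. Assumes 64-bit native ints. *)
let crc32 (s : string) : int32 =
  let crc = ref 0xFFFFFFFF in
  String.iter
    (fun c ->
      crc := !crc lxor Char.code c;
      for _ = 1 to 8 do
        crc :=
          if !crc land 1 = 1 then (!crc lsr 1) lxor 0xEDB88320
          else !crc lsr 1
      done)
    s;
  Int32.of_int (!crc lxor 0xFFFFFFFF)

(* Encode a 32-bit value as 4 little-endian bytes. *)
let le32 (i : int32) : string =
  let b = Bytes.create 4 in
  Bytes.set_int32_le b 0 i;
  Bytes.to_string b

(* Wrap a raw deflate stream in a minimal gzip header and trailer.
   [deflated] must be the raw deflate compression of [raw]. *)
let gzip_wrap ~raw ~deflated =
  String.concat ""
    [ "\x1f\x8b";          (* ID1, ID2: magic number *)
      "\x08";              (* compression method: deflate *)
      "\x00";              (* flags: no optional fields *)
      "\x00\x00\x00\x00";  (* modification time: not available *)
      "\x00";              (* extra flags *)
      "\xff";              (* OS: unknown *)
      deflated;
      le32 (crc32 raw);
      (* Size field: Int32.of_int keeps the length modulo 2^32 on 64-bit. *)
      le32 (Int32.of_int (String.length raw)) ]

let () =
  (* Raw deflate bytes for "HELLO", copied from the gzip hexdump in the question. *)
  let deflated = "\xf3\x70\xf5\xf1\xf1\x07\x00" in
  let gz = gzip_wrap ~raw:"HELLO" ~deflated in
  Printf.printf "%d bytes\n" (String.length gz)
```

On the size question: the gzip format defines the trailing size field as the uncompressed length modulo 2^32, so the wrap-around of `Int32.of_int` on a 64-bit system should match what the format expects for large inputs.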

Yes, gzip wraps the raw compressed data produced by Zlib with a header and a trailer.

If you're interested in producing gzip-compatible files from OCaml, another library of mine does exactly this: https://github.com/xavierleroy/camlzip/
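
As an illustration of that route, a minimal sketch using camlzip's `Gzip` module might look as follows. It assumes camlzip is installed and linked (the ocamlfind package is named `zip` or `camlzip` depending on the version) and OCaml 4.08 or later for `Fun.protect`; `write_gzip` and `read_gzip` are hypothetical helper names, not part of the library.

```ocaml
(* Write [data] to [path] as a gzip file that zcat should be able to read. *)
let write_gzip path data =
  let oc = Gzip.open_out path in
  Fun.protect
    ~finally:(fun () -> Gzip.close_out oc)
    (fun () -> Gzip.output_substring oc data 0 (String.length data))

(* Read a gzip file back into a string, one chunk at a time. *)
let read_gzip path =
  let ic = Gzip.open_in path in
  let buf = Buffer.create 64 in
  let chunk = Bytes.create 4096 in
  let rec loop () =
    (* Gzip.input returns 0 once the end of the compressed data is reached. *)
    let n = Gzip.input ic chunk 0 (Bytes.length chunk) in
    if n > 0 then begin
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  Fun.protect ~finally:(fun () -> Gzip.close_in ic) loop;
  Buffer.contents buf

let () =
  let path = Filename.temp_file "hello" ".gz" in
  write_gzip path "HELLO";
  print_endline (read_gzip path);
  Sys.remove path
```

The file written by `write_gzip` carries the header and trailer that the raw Cryptokit output lacks.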

The Zlib compression support in Cryptokit was intended for implementing crypto protocols that include compression as an (optional) step. Those protocols do not need the gzip header and trailer.

@hcarty, @xavierleroy Thank you very much for these clarifications!