LaurentMazare/npy-ocaml

Npz breaks when writing large output

alfa07 opened this issue · 2 comments

Platform: MacOS High Sierra 10.13.6
Python: 3.6
Npz seems to work for small files (< 100MB), but numpy reports CRC errors when loading large files (~1GB) written with Npz.

open Core

(* Write a 10_000_000 x 2_048 int8 array (about 20GB of data) to
   [npz_file] under the entry name [name]. *)
let mk_big_file name npz_file =
  let open Bigarray in
  let arr = Array2.create int8_signed c_layout 10_000_000 2_048 in
  let npz = Npy.Npz.open_out npz_file in
  (* Close the archive even if the write raises. *)
  Exn.protectx npz ~finally:Npy.Npz.close_out ~f:(fun npz ->
      let big_arr = arr |> Bigarray.genarray_of_array2 in
      Npy.Npz.write npz name big_arr)

let () =
  mk_big_file "a" "a.npz"

Then in IPython:

 import numpy as np
 a = np.load('a.npz')
 a['a']

And you get:

~/miniconda3/lib/python3.6/zipfile.py in _update_crc(self, newdata)
    865         # Check the CRC if we're at the end of the file
    866         if self._eof and self._running_crc != self._expected_crc:
--> 867             raise BadZipFile("Bad CRC-32 for file %r" % self.name)
    868
    869     def read1(self, n):

BadZipFile: Bad CRC-32 for file 'a.npy'
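
To narrow this down, it may help to look at what the archive's directory records for each entry before numpy tries to decompress it. The following is a minimal sketch using only Python's standard `zipfile` module; `describe_entries` is a hypothetical helper name, and I am assuming the central directory of the broken archive is still readable (np.load getting as far as the CRC check suggests it is).

```python
import zipfile


def describe_entries(archive):
    """List what the zip directory records for each entry.

    `archive` is a path or a file-like object. Returns one tuple per entry:
    (name, recorded uncompressed size, recorded compressed size, recorded CRC-32).
    Nothing is decompressed, so a bad CRC does not raise here.
    """
    with zipfile.ZipFile(archive) as zf:
        return [
            (info.filename, info.file_size, info.compress_size, info.CRC)
            for info in zf.infolist()
        ]


# Hypothetical usage against the file from the report:
#   for row in describe_entries("a.npz"):
#       print(row)
```

If the recorded uncompressed size is much smaller than the size of the array that was written, that would point at the header rather than at the data.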

It seems ocamlzip does not support the Zip64 format, so it writes wrong header fields (e.g. uncompressed_size) for entries larger than 4GB.
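
For reference, here is a rough sketch of the arithmetic behind the 4GB limit, using the array shape from the example above. The names `payload_bytes`, `needs_zip64` and `wrapped_size` are made up for illustration, and the wrap-around in `wrapped_size` is only an assumption about what a writer without Zip64 support might store; I have not checked what ocamlzip actually does. The small .npy header is ignored.

```python
# Classic (non-Zip64) zip headers store sizes in 32-bit fields, and the
# value 0xFFFFFFFF is reserved as the "look in the Zip64 extra field" marker.
ZIP32_SENTINEL = 0xFFFFFFFF


def payload_bytes(rows, cols, itemsize=1):
    """Raw data size of a rows x cols array with `itemsize` bytes per element."""
    return rows * cols * itemsize


def needs_zip64(nbytes):
    """True when the size cannot be represented in a 32-bit zip size field."""
    return nbytes >= ZIP32_SENTINEL


def wrapped_size(nbytes):
    """ASSUMPTION: value stored if the writer just keeps the low 32 bits."""
    return nbytes % (1 << 32)


size = payload_bytes(10_000_000, 2_048)  # int8, so 1 byte per element
print(size, needs_zip64(size), wrapped_size(size))
```

With this shape the payload is 20,480,000,000 bytes, well past the 32-bit limit, so the example cannot be described by a non-Zip64 header.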