Npz breaks when writing large output
alfa07 opened this issue · 2 comments
alfa07 commented
Platform: macOS High Sierra 10.13.6
Python: 3.6
For small files (< 100 MB) Npz seems to work, but numpy reports CRC errors when reading back large files (~1 GB) written with Npz.
open Core

let mk_big_file name npz_file =
  let open Bigarray in
  let arr = Array2.create int8_signed c_layout 10_000_000 2_048 in
  let npz = Npy.Npz.open_out npz_file in
  Exn.protectx npz ~finally:Npy.Npz.close_out ~f:(fun npz ->
      let big_arr = arr |> Bigarray.genarray_of_array2 in
      let () = Npy.Npz.write npz name big_arr in
      ())

let () =
  mk_big_file "a" "a.npz"
Then, in IPython:
import numpy as np
a = np.load('a.npz')
a['a']
And you get:
~/miniconda3/lib/python3.6/zipfile.py in _update_crc(self, newdata)
865 # Check the CRC if we're at the end of the file
866 if self._eof and self._running_crc != self._expected_crc:
--> 867 raise BadZipFile("Bad CRC-32 for file %r" % self.name)
868
869 def read1(self, n):
BadZipFile: Bad CRC-32 for file 'a.npy'
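To narrow down a failure like this without going through numpy, the archive's own metadata can be inspected with Python's standard zipfile module. This is a minimal sketch, not part of the original report; the helper names describe_members and first_corrupt_member are made up for illustration.

```python
import zipfile


def describe_members(path):
    """List (name, recorded uncompressed size, recorded compressed size, CRC-32)
    for every member, as stored in the archive's directory."""
    with zipfile.ZipFile(path) as zf:
        return [
            (info.filename, info.file_size, info.compress_size, info.CRC)
            for info in zf.infolist()
        ]


def first_corrupt_member(path):
    """Read every member and return the name of the first one that fails
    its CRC or header check, or None if all members are fine."""
    with zipfile.ZipFile(path) as zf:
        return zf.testzip()
```

Calling describe_members("a.npz") on the file from the report would show the size the writer recorded for a.npy. A recorded size much smaller than the real array would suggest an overflowed size field rather than corrupted data. Note that first_corrupt_member reads the whole archive, so it is slow on multi-gigabyte files.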
alfa07 commented
It seems camlzip does not support the Zip64 format, so it writes a wrong header for entries larger than 4 GB: fields such as uncompressed_size are only 32 bits wide in the classic ZIP format.
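The arithmetic behind this can be sketched in a few lines of Python. The assumption, which is not confirmed in this thread, is that a writer without Zip64 support simply keeps the low 32 bits of the size. The function names are illustrative only, and the payload figure ignores the small .npy header.

```python
UINT32_MAX = 0xFFFFFFFF  # largest value a 4-byte ZIP size field can hold


def fits_in_classic_zip(size_bytes):
    """True if the size can be stored without Zip64.
    0xFFFFFFFF itself is reserved as the 'see Zip64 extra field' marker."""
    return size_bytes < UINT32_MAX


def truncated_to_32_bits(size_bytes):
    """Value that ends up in the header if only the low 32 bits are kept
    (assumed behaviour of a writer without Zip64 support)."""
    return size_bytes & UINT32_MAX


# Array from the report: 10_000_000 x 2_048 int8 elements, 1 byte each.
payload = 10_000_000 * 2_048

print(payload, fits_in_classic_zip(payload), truncated_to_32_bits(payload))
```

Under that assumption, the header for the example array would advertise roughly 3.3 GB instead of about 20.5 GB, so a reader would check the CRC over the wrong number of bytes. That is consistent with the BadZipFile error above.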