apache/arrow-julia

`Arrow.write()` cannot handle large `Mmap`-ed table

Moelf opened this issue · 0 comments

Moelf commented

Premise:

  1. The input file is a large, uncompressed Arrow file.
  2. We compute a mask and take a `view()` over the Mmap-ed table.
  3. We use `Arrow.write()` to write the filtered table to disk.
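
For reference, the three steps can be sketched at small scale as follows. This is only an illustration: the temporary file and the column names `id` and `pt` are made up here and stand in for the real large input, and at this size it will not reproduce the memory growth.

```julia
using Arrow, DataFrames

# Temporary paths for the illustration (the real input is a large existing file).
dir = mktempdir()
src = joinpath(dir, "in.arrow")
dst = joinpath(dir, "out.arrow")

# Step 1: a small, uncompressed Arrow file standing in for the large input.
# The columns `id` and `pt` are hypothetical.
Arrow.write(src, (id = collect(1:1000), pt = collect(range(0.0, 100.0; length = 1000))))

# Arrow.Table memory-maps the file when given a path; copycols=false makes
# the DataFrame reuse those columns instead of copying them.
df = DataFrame(Arrow.Table(src); copycols = false)

# Step 2: build a boolean mask and take a view over the Mmap-ed table.
mask = df.pt .> 50.0
filtered = @view df[mask, :]

# Step 3: write the filtered view back to disk.
Arrow.write(dst, filtered)

# Read the result back to check how many rows were written.
n_written = length(Arrow.Table(dst).id)
```

The reproduction below uses a contiguous row range instead of a boolean mask, but the view-then-write pattern is the same.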

Memory usage seems to grow with the number of rows selected by the mask.

I know this doesn't work correctly because if I set a memory limit first, writing a slightly larger view runs out of memory:

> ulimit -Sv 8000000
julia> using Arrow, DataFrames

julia> const df = @time DataFrame(Arrow.Table("./nanoAOD_nocomp.feather"); copycols=false);
  2.685720 seconds (4.84 M allocations: 321.109 MiB, 4.41% gc time, 100.89% compilation time)

julia> Arrow.write("/home/akako/Downloads/out.feather", @view df[1:1*10^4, :]);

julia> Arrow.write("/home/akako/Downloads/out.feather", @view df[1:2*10^4, :]);
ERROR: Internal error: encountered unexpected error in runtime:
OutOfMemoryError()
unknown function (ip: 0x7f9d7329fc99)
unknown function (ip: 0x7f9d732935b5)
jl_gc_alloc at /home/akako/Documents/github/dotFiles/homedir/.julia/juliaup/julia-1.9.0-rc1+0.x64.linux.gnu/bin/../lib/julia/libjulia-internal.so.1 (unknown line)
ijl_alloc_array_1d at /home/akako/Documents/github/dotFiles/homedir/.julia/juliaup/julia-1.9.0-rc1+0.x64.linux.gnu/bin/../lib/julia/libjulia-internal.so.1 (unknown line)
unknown function (ip: 0x7f9d5ec6259e)
unknown function (ip: 0x7f9d5e36c9dc)
unknown function (ip: 0x7f9d5e73ee1f)
unknown function (ip: 0x7f9d5e73ed98)