`Arrow.write()` cannot handle large `Mmap`-ed table
Moelf opened this issue · 0 comments
Moelf commented
Premise:
- the input file is a large, uncompressed Arrow file
- we produce a mask and take a `view()` over the `Mmap`-ed table
- we use `Arrow.write()` to write the filtered table to disk
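The workflow above can be sketched on a small stand-in file. This is a minimal sketch, not the real dataset: the temporary paths, the column names `x` and `y`, and the even-number mask are all made up for illustration.

```julia
using Arrow, DataFrames

# Hypothetical small stand-in for the large input file.
src = tempname() * ".feather"
dst = tempname() * ".feather"
Arrow.write(src, (x = collect(1:1000), y = rand(1000)))  # no compression by default

# Given a file path, Arrow.Table memory-maps the file;
# copycols=false keeps the Arrow-backed columns instead of copying them.
df = DataFrame(Arrow.Table(src); copycols=false)

# Build a mask and take a view over the mmap-ed table.
mask = iseven.(df.x)
filtered = view(df, mask, :)

# Write the filtered view back to disk.
Arrow.write(dst, filtered)

out = Arrow.Table(dst)
println(length(out.x), " rows written")
```

At this size the write succeeds; the problem reported here only shows up once the view covers enough rows of a large file.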
This seems to take an increasing amount of memory as the number of rows selected by the mask grows.
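One rough way to check how the cost scales is to compare allocations for views of different sizes. This is a sketch under assumptions: the stand-in file, its columns, and the row counts are hypothetical, and `@allocated` reports total bytes allocated during the call, which is not the same as peak resident memory.

```julia
using Arrow, DataFrames

# Hypothetical stand-in input; the real file is far larger.
src = tempname() * ".feather"
dst = tempname() * ".feather"
Arrow.write(src, (x = collect(1:100_000), y = rand(100_000)))
df = DataFrame(Arrow.Table(src); copycols=false)

# Warm-up call so that compilation is not counted in the measurements.
Arrow.write(dst, view(df, 1:10, :))

# Record bytes allocated while writing views of increasing size.
results = Pair{Int,Int}[]
for n in (10_000, 20_000, 40_000)
    bytes = @allocated Arrow.write(dst, view(df, 1:n, :))
    push!(results, n => bytes)
    println(n, " rows: ", bytes, " bytes allocated")
end
```

If allocations grow roughly in proportion to the number of rows in the view, that would be consistent with the write materializing the selected rows rather than streaming them.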
I know this doesn't work correctly because if I set a memory limit first, writing a slightly larger view fails with an `OutOfMemoryError`:
> ulimit -Sv 8000000
julia> using Arrow, DataFrames
julia> const df = @time DataFrame(Arrow.Table("./nanoAOD_nocomp.feather"); copycols=false);
2.685720 seconds (4.84 M allocations: 321.109 MiB, 4.41% gc time, 100.89% compilation time)
julia> Arrow.write("/home/akako/Downloads/out.feather", @view df[1:1*10^4, :]);
julia> Arrow.write("/home/akako/Downloads/out.feather", @view df[1:2*10^4, :]);
ERROR: Internal error: encountered unexpected error in runtime:
OutOfMemoryError()
unknown function (ip: 0x7f9d7329fc99)
unknown function (ip: 0x7f9d732935b5)
jl_gc_alloc at /home/akako/Documents/github/dotFiles/homedir/.julia/juliaup/julia-1.9.0-rc1+0.x64.linux.gnu/bin/../lib/julia/libjulia-internal.so.1 (unknown line)
ijl_alloc_array_1d at /home/akako/Documents/github/dotFiles/homedir/.julia/juliaup/julia-1.9.0-rc1+0.x64.linux.gnu/bin/../lib/julia/libjulia-internal.so.1 (unknown line)
unknown function (ip: 0x7f9d5ec6259e)
unknown function (ip: 0x7f9d5e36c9dc)
unknown function (ip: 0x7f9d5e73ee1f)
unknown function (ip: 0x7f9d5e73ed98)