apache/arrow-julia

Bus errors when writing `DataFrame`

simsurace opened this issue · 8 comments

I intermittently get bus errors that crash Julia back to the terminal when writing DataFrames to .arrow files:

julia> Arrow.write("df.arrow", df)
[8796] signal (7.2): Bus error
in expression starting at REPL[243]:1
getindex at ./essentials.jl:13 [inlined]
getindex at ~/.julia/packages/Arrow/R2Rvz/src/arraytypes/primitive.jl:48 [inlined]
getindex at ~/.julia/packages/ArrowTypes/Nb4EC/src/ArrowTypes.jl:412 [inlined]
iterate at ./abstractarray.jl:1220 [inlined]
iterate at ./abstractarray.jl:1218 [inlined]
writearray at ~/.julia/packages/Arrow/R2Rvz/src/utils.jl:49
writebuffer at ~/.julia/packages/Arrow/R2Rvz/src/arraytypes/primitive.jl:102
unknown function (ip: 0x7ff9d489d0d0)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2940
write at ~/.julia/packages/Arrow/R2Rvz/src/write.jl:363
macro expansion at ~/.julia/packages/Arrow/R2Rvz/src/write.jl:151 [inlined]
#124 at ./threadingconstructs.jl:373
unknown function (ip: 0x7ff9d487aa9f)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
start_task at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/task.c:1092
Allocations: 16091661486 (Pool: 15611643534; Big: 480017952); GC: 116806
Bus error (core dumped)

This was with Julia 1.9.1 and Arrow.jl v2.6.2

Further details, as shared on Slack already:
The DataFrame was 84x13075: a single Date column and 13074 Float64 columns.
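
For anyone trying to reproduce this, here is a minimal sketch of dummy data with the same shape. The random values, the `x1`, `x2`, ... column names, the start date, and the temporary output path are all assumptions made for illustration; the real data is not available, and since the crash is intermittent this snippet is not known to trigger it.

```julia
using Arrow, DataFrames, Dates

nrows, nfloat = 84, 13074

# Dummy values standing in for the real data (assumption, not the original table).
df = DataFrame(rand(nrows, nfloat), :auto)   # Float64 columns named x1 ... x13074

# Prepend a single Date column, one day per row (hypothetical start date).
insertcols!(df, 1, :date => Date(2023, 1, 1) .+ Day.(0:nrows-1))

# Write to a new file name, as in the report above.
path = joinpath(mktempdir(), "df.arrow")
Arrow.write(path, df)
```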

Moelf commented

> 13074 Float64 columns.

That's pretty extreme; why isn't the table transposed? Otherwise, I don't know what a bus error could indicate. Arrow.jl is not super friendly with memory, but this doesn't look like an OOM.

The table isn't transposed because in general the columns are more heterogeneous. But this should be well below any hard limits, I've even written tables with 100000 columns before without any problems.

signal (7): Bus error
in expression starting at REPL[1]:1
getindex at ./array.jl:924 [inlined]
getindex at /home/snthomas/.julia/packages/Arrow/R2Rvz/src/arraytypes/primitive.jl:48 [inlined]
getindex at ./subarray.jl:315 [inlined]
iterate at /home/snthomas/.julia/packages/Arrow/R2Rvz/src/arraytypes/list.jl:174 [inlined]
writearray at /home/snthomas/.julia/packages/Arrow/R2Rvz/src/utils.jl:49
writebuffer at /home/snthomas/.julia/packages/Arrow/R2Rvz/src/arraytypes/primitive.jl:102
writebuffer at /home/snthomas/.julia/packages/Arrow/R2Rvz/src/arraytypes/map.jl:118
Allocations: 45171119 (Pool: 45154695; Big: 16424); GC: 36

I am getting a similar error when writing a TypedTable that is only 10x23. The error came from running `Arrow.write(datafile, data)`.

The result is a zero-byte file at `datafile`.

Arrow version 2.6.2, Julia version 1.8.5.

Moelf commented

Is it possible to provide the schema of the 10x23 table? Or, better, can you write a small reproducer that generates dummy data?

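I think I found the cause. It has to do with how Arrow reads tables from disk: `Arrow.Table(filename)` does not load the entire table into memory, it only creates a view backed by the memory-mapped file. If you then write this view back to the same file, the file is overwritten while the view is still reading from it, and that causes this bus error.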

using TypedTables
using Arrow

tab = Table(
    a=[i for i=1:10],
    b=[fill(0.1,10) for i=1:10]
)
filename = "test.arrow"
Arrow.write(filename, tab)

newtab = Table(Arrow.Table(filename))
Arrow.write(filename, deepcopy(newtab)) # always works
Arrow.write(filename, copy(newtab)) # only works for simple columns like :a, but not :b
Arrow.write(filename, newtab) # always fails
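
If the diagnosis above is right, one possible way around it is to keep the table from being backed by the file at all. The sketch below reads the raw bytes first and builds the table from that in-memory buffer via `Arrow.Table(bytes)`. The scratch path and the NamedTuple stand-in for the TypedTable are assumptions made to keep the example self-contained.

```julia
using Arrow

# Hypothetical scratch location; the reproducer above used "test.arrow".
filename = joinpath(mktempdir(), "test.arrow")

# Same column shapes as the reproducer: one primitive column, one list column.
Arrow.write(filename, (a = collect(1:10), b = [fill(0.1, 10) for _ in 1:10]))

# Read the whole file into memory, so the table is backed by this buffer
# rather than by a memory map of the file.
bytes = read(filename)
tbl = Arrow.Table(bytes)

# Overwriting the file no longer touches memory that `tbl` reads from.
Arrow.write(filename, tbl)
```

This costs one full in-memory copy of the file, which is the same trade-off as the `deepcopy` line above.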

I wonder if this is also the case in @simsurace's error.

I don't believe that was the case. I was writing some intermediate results from a large calculation to a new file name, and the error occurred intermittently.

I can confirm the issue that @stuartthomas25 identified. My fairly obvious workaround is to write the table to a temporary file and then overwrite the original with it. However, Julia's `mv` and the undocumented `Base.rename` aren't guaranteed to be atomic, so I wonder what happens to existing views while the file is being overwritten.
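
A minimal sketch of that temporary-file workaround, using only Base functions. The helper name `write_via_tempfile` is hypothetical, and the sketch makes no atomicity claim beyond whatever `mv` provides on the platform. On POSIX systems an existing mapping would generally be expected to keep the old contents readable after the path is replaced, but that is an assumption about the platform, not something verified here, and Windows may behave differently.

```julia
# Hypothetical helper: call `writer(tmp)` on a temporary path in the same
# directory as `path`, then move the finished file over `path`.
function write_via_tempfile(writer, path::AbstractString)
    dir = dirname(abspath(path))
    tmp = tempname(dir; cleanup=false)   # same directory, so the move stays on one filesystem
    try
        writer(tmp)
        mv(tmp, path; force=true)
    finally
        # Clean up the temporary file if the write or the move failed.
        isfile(tmp) && rm(tmp)
    end
    return path
end

# Generic usage with plain bytes; the target path is a scratch file.
target = joinpath(mktempdir(), "data.bin")
write(target, "old")
write_via_tempfile(target) do tmp
    write(tmp, "new")
end
```

With Arrow, the writer would be something like `tmp -> Arrow.write(tmp, newtab)`, so the original file is never truncated while `newtab` is still reading from it.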