Type instability in getcolumn
baumgold opened this issue · 0 comments
baumgold commented
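In Arrow.Table, all columns are stored in a Vector{AbstractVector}. As a result, Tables.getcolumn is inferred to return only an AbstractVector rather than a concrete column type, which causes downstream type instability and performance problems when iterating over a single column.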
julia> using Arrow, Tables
julia> buf = Arrow.tobuffer((a=[1,2,3], b=[4,5,6]));
julia> tt = Arrow.Table(buf)
Arrow.Table with 3 rows, 2 columns, and schema:
:a Int64
:b Int64
julia> @code_warntype Tables.getcolumn(tt, :a)
MethodInstance for Tables.getcolumn(::Arrow.Table, ::Symbol)
  from getcolumn(t::Arrow.Table, nm::Symbol) @ Arrow ~/.julia/packages/Arrow/ID4np/src/table.jl:369
Arguments
  #self#::Core.Const(Tables.getcolumn)
  t::Arrow.Table
  nm::Symbol
Body::AbstractVector
1 ─ %1 = Arrow.lookup(t)::Dict{Symbol, AbstractVector}
│   %2 = Base.getindex(%1, nm)::AbstractVector
└──      return %2
This uses Julia v1.10 and Arrow v2.7.1.
julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 48 × Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
Threads: 5 on 48 virtual cores
Environment:
JULIA_NUM_THREADS = 4
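
Until the storage itself is concretely typed, one possible way to limit the cost on the caller's side is a function barrier. The sketch below is only an illustration, not a proposed fix for Arrow.jl: `cols`, `getcol`, and `colsum` are hypothetical names, and the `Dict{Symbol,AbstractVector}` merely mimics the abstractly typed lookup shown in the `@code_warntype` output above so that the example runs without Arrow installed.

```julia
# Hypothetical stand-in for the abstractly typed column storage.
# The value type is abstract, so lookups are inferred as AbstractVector.
cols = Dict{Symbol,AbstractVector}(:a => [1, 2, 3], :b => [4, 5, 6])

# Accessor whose return type cannot be inferred more precisely than
# AbstractVector (this mirrors the reported behaviour, by assumption).
getcol(d::Dict{Symbol,AbstractVector}, nm::Symbol) = d[nm]

# Function barrier: Julia compiles a specialized method for the concrete
# type of `col` at the call site, so the loop body itself is type stable.
function colsum(col::AbstractVector)
    s = zero(eltype(col))
    for x in col
        s += x
    end
    return s
end

# One dynamic dispatch happens at this call; the loop inside is specialized.
total = colsum(getcol(cols, :a))
```

With an actual Arrow.Table, the same pattern would presumably apply by passing the result of `Tables.getcolumn(tt, :a)` to the kernel function. This only moves the dynamic dispatch out of the hot loop; it does not remove the instability in `getcolumn` itself.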