Does `Arrow.write` have an upper limit for the number of columns?
simsurace opened this issue · 1 comment
simsurace commented
I could not find this documented:
using Arrow, DataFrames
df = DataFrame(("$i" => rand(1000) for i in 1:65536)...)
Arrow.write("data.arrow", df)
produces
julia> Arrow.write("data.arrow", df)
ERROR: MethodError: no method matching length(::Nothing)
Closest candidates are:
length(::Union{Base.KeySet, Base.ValueIterator}) at abstractdict.jl:58
length(::Union{LinearAlgebra.Adjoint{T, S}, LinearAlgebra.Transpose{T, S}} where {T, S}) at ~/.julia/juliaup/julia-1.8.5+0.aarch64.apple.darwin14/share/julia/stdlib/v1.8/LinearAlgebra/src/adjtrans.jl:172
length(::Union{Tables.AbstractColumns, Tables.AbstractRow}) at ~/.julia/packages/Tables/AcRIE/src/Tables.jl:180
...
Stacktrace:
[1] makeschema(b::Arrow.FlatBuffers.Builder, sch::Tables.Schema{nothing, nothing}, columns::Arrow.ToArrowTable)
@ Arrow ~/.julia/packages/Arrow/P0wVk/src/write.jl:393
[2] close(writer::Arrow.Writer{IOStream})
@ Arrow ~/.julia/packages/Arrow/P0wVk/src/write.jl:244
[3] open(::Arrow.var"#122#123"{DataFrame}, ::Type, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:file,), Tuple{Bool}}})
@ Base ./io.jl:386
[4] #write#121
@ ~/.julia/packages/Arrow/P0wVk/src/write.jl:57 [inlined]
[5] top-level scope
@ REPL[94]:1
caused by: MethodError: no method matching length(::Nothing)
Closest candidates are:
length(::Union{Base.KeySet, Base.ValueIterator}) at abstractdict.jl:58
length(::Union{LinearAlgebra.Adjoint{T, S}, LinearAlgebra.Transpose{T, S}} where {T, S}) at ~/.julia/juliaup/julia-1.8.5+0.aarch64.apple.darwin14/share/julia/stdlib/v1.8/LinearAlgebra/src/adjtrans.jl:172
length(::Union{Tables.AbstractColumns, Tables.AbstractRow}) at ~/.julia/packages/Tables/AcRIE/src/Tables.jl:180
...
Stacktrace:
[1] makeschema(b::Arrow.FlatBuffers.Builder, sch::Tables.Schema{nothing, nothing}, columns::Arrow.ToArrowTable)
@ Arrow ~/.julia/packages/Arrow/P0wVk/src/write.jl:393
[2] makeschemamsg(sch::Tables.Schema{nothing, nothing}, columns::Arrow.ToArrowTable)
@ Arrow ~/.julia/packages/Arrow/P0wVk/src/write.jl:430
[3] macro expansion
@ ~/.julia/packages/Arrow/P0wVk/src/write.jl:198 [inlined]
[4] macro expansion
@ ./task.jl:454 [inlined]
[5] write(writer::Arrow.Writer{IOStream}, source::DataFrame)
@ Arrow ~/.julia/packages/Arrow/P0wVk/src/write.jl:185
[6] (::Arrow.var"#122#123"{DataFrame})(writer::Arrow.Writer{IOStream})
@ Arrow ~/.julia/packages/Arrow/P0wVk/src/write.jl:58
[7] open(::Arrow.var"#122#123"{DataFrame}, ::Type, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:file,), Tuple{Bool}}})
@ Base ./io.jl:384
[8] #write#121
@ ~/.julia/packages/Arrow/P0wVk/src/write.jl:57 [inlined]
[9] top-level scope
@ REPL[94]:1
By contrast, the same call succeeds with 65535 columns.
Moelf commented
Seems fine with pyarrow:
In [1]: import pyarrow.feather, numpy as np, pandas as pd
In [3]: df = pd.DataFrame({f"col_{k}": np.random.rand(100) for k in range(65538)})
In [4]: pyarrow.feather.write_feather(df, "/tmp/wide.feather", compression="uncompressed")
In [6]: pyarrow.feather.read_table("/tmp/wide.feather")["col_65537"]
Out[6]:
<pyarrow.lib.ChunkedArray object at 0x7fd528293c90>
[
  [
    0.3791875035442084,
    0.5547163201551565,
    0.13564446518017992,
    0.4183265184379561,
    0.8100731859852923,
    ...
    0.6820512183941593,
    0.6142216465909046,
    0.7692441575177542,
    0.07715418533522123,
    0.38896656434696375
  ]
]
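To pin down exactly where a writer stops accepting columns, one option is to bisect over the column count instead of trying widths by hand. The following is only a sketch: `largest_ok` and `stub_writes_ok` are hypothetical names, and the stub just mimics the boundary reported above (65535 works, 65536 fails) rather than performing a real write. It also assumes the failure is monotone, i.e. once a width fails, every larger width fails too.

```python
from typing import Callable


def largest_ok(writes_ok: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the largest n in [lo, hi) for which writes_ok(n) is True.

    Assumes writes_ok(lo) is True, writes_ok(hi) is False, and that the
    predicate is monotone (once it fails, it keeps failing).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if writes_ok(mid):
            lo = mid  # mid columns still work: the limit is at or above mid
        else:
            hi = mid  # mid columns fail: the limit is below mid
    return lo


def stub_writes_ok(n: int) -> bool:
    # Hypothetical stand-in for a real write attempt. It only reproduces
    # the behaviour reported in this issue and does not touch any file.
    return n <= 65535


limit = largest_ok(stub_writes_ok, 1, 1_000_000)
print(limit)  # prints 65535
```

To use it against a real writer, `stub_writes_ok` would be replaced by a function that builds a table with `n` columns, attempts the write, and returns `False` if an exception is raised. With the bounds above, that takes about 20 write attempts.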