CSV.jl fails to parse a file that DuckDB is fine with
asinghvi17 commented
MWE:
import CSV, QuackIO
using DataFrames
file = download("https://raw.githubusercontent.com/newzealandpaul/Maritime-Pirate-Attacks/refs/heads/main/data/csv/pirate_attacks.csv")
# try QuackIO first
dataset = QuackIO.read_csv(DataFrame, file) # works
# now try CSV
CSV.read(file, DataFrame) # errors
The error:
ERROR: TaskFailedException
nested task error: thread = 7 fatal error, encountered an invalidly quoted field while parsing around row = 4573, col = 12: ""03.10.2018: 2330 UTC: Posn: 38:49.2N – 118:14.5E, Tianjin Anchorage, China.
", error=INVALID: OK | QUOTED | EOF | INVALID_QUOTED_FIELD , check your `quotechar` arguments or manually fix the field in the file itself
Stacktrace:
[1] fatalerror(buf::Vector{UInt8}, pos::Int64, len::Int64, code::Int16, row::Int64, col::Int64)
@ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:590
[2] parsevalue!(::Type{…}, buf::Vector{…}, pos::Int64, len::Int64, row::Int64, rowoffset::Int64, i::Int64, col::CSV.Column, ctx::CSV.Context)
@ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:798
[3] parserow
@ ~/.julia/packages/CSV/cwX2w/src/file.jl:640 [inlined]
[4] parsefilechunk!(ctx::CSV.Context, pos::Int64, len::Int64, rowsguess::Int64, rowoffset::Int64, columns::Vector{…}, ::Type{…})
@ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:550
[5] multithreadparse(ctx::CSV.Context, pertaskcolumns::Vector{…}, rowchunkguess::Int64, i::Int64, rows::Vector{…}, wholecolumnslock::ReentrantLock)
@ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:360
[6] (::CSV.var"#34#39"{CSV.Context, Vector{Vector{CSV.Column}}, Int64, Int64, Vector{Int64}, ReentrantLock})()
@ CSV ~/.julia/packages/WorkerUtilities/ey0fP/src/WorkerUtilities.jl:384
Stacktrace:
[1] sync_end(c::Channel{Any})
@ Base ./task.jl:455
[2] macro expansion
@ ./task.jl:487 [inlined]
[3] CSV.File(ctx::CSV.Context, chunking::Bool)
@ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:240
[4] File
@ ~/.julia/packages/CSV/cwX2w/src/file.jl:227 [inlined]
[5] #File#32
@ ~/.julia/packages/CSV/cwX2w/src/file.jl:223 [inlined]
[6] CSV.File(source::String)
@ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:162
[7] read(source::String, sink::Type; copycols::Bool, kwargs::@Kwargs{})
@ CSV ~/.julia/packages/CSV/cwX2w/src/CSV.jl:117
[8] read(source::String, sink::Type)
@ CSV ~/.julia/packages/CSV/cwX2w/src/CSV.jl:113
[9] top-level scope
@ REPL[223]:1
Some type information was truncated. Use `show(err)` to see complete types.
I tried tracking down the error, but everything in that area of the file (both the line mentioned and searching for the given text) seemed fine...
AmeroIL commented
Hi @asinghvi17,
I ran the code above and did not get any errors; the final output was the DataFrame itself.
Perhaps this issue is related to the Julia installation?
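Since one run fails and the other does not, it may help to compare the two environments. The stack trace passes through `multithreadparse` and reports `thread = 7`, so one possibility (an assumption, not something confirmed in this thread) is that the failure only shows up when the file is parsed in concurrent chunks. The sketch below prints the Julia version and thread count, then retries the read with CSV.jl's `ntasks` keyword set to 1 and to the number of threads, reporting which settings succeed instead of asserting any particular outcome.

```julia
import CSV
using DataFrames

# Same file as in the MWE above.
file = download("https://raw.githubusercontent.com/newzealandpaul/Maritime-Pirate-Attacks/refs/heads/main/data/csv/pirate_attacks.csv")

# Environment details worth comparing between the failing and working machines.
println("Julia ", VERSION, ", threads = ", Threads.nthreads())

# Try a single-task parse and a parse using one task per thread.
# `unique` avoids running the same setting twice when Julia has one thread.
for n in unique((1, Threads.nthreads()))
    try
        df = CSV.read(file, DataFrame; ntasks = n)
        println("ntasks = ", n, ": parsed ", nrow(df), " rows")
    catch err
        # Report the failure rather than stopping, so both settings are tried.
        println("ntasks = ", n, ": failed with ", typeof(err))
    end
end
```

If the two settings behave differently, posting that output along with the CSV.jl version from `Pkg.status("CSV")` would make the report easier to reproduce.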