JuliaData/CSV.jl

CSV.jl fails to parse a file that DuckDB is fine with


MWE:

import CSV, QuackIO
using DataFrames

file = download("https://raw.githubusercontent.com/newzealandpaul/Maritime-Pirate-Attacks/refs/heads/main/data/csv/pirate_attacks.csv")

# try QuackIO first
dataset = QuackIO.read_csv(DataFrame, file) # works

# now try CSV
CSV.read(file, DataFrame) # errors

The error:

ERROR: TaskFailedException

    nested task error: thread = 7 fatal error, encountered an invalidly quoted field while parsing around row = 4573, col = 12: ""03.10.2018: 2330 UTC: Posn: 38:49.2N – 118:14.5E, Tianjin Anchorage, China.
    ", error=INVALID: OK | QUOTED | EOF | INVALID_QUOTED_FIELD , check your `quotechar` arguments or manually fix the field in the file itself
    
    Stacktrace:
     [1] fatalerror(buf::Vector{UInt8}, pos::Int64, len::Int64, code::Int16, row::Int64, col::Int64)
       @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:590
     [2] parsevalue!(::Type{…}, buf::Vector{…}, pos::Int64, len::Int64, row::Int64, rowoffset::Int64, i::Int64, col::CSV.Column, ctx::CSV.Context)
       @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:798
     [3] parserow
       @ ~/.julia/packages/CSV/cwX2w/src/file.jl:640 [inlined]
     [4] parsefilechunk!(ctx::CSV.Context, pos::Int64, len::Int64, rowsguess::Int64, rowoffset::Int64, columns::Vector{…}, ::Type{…})
       @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:550
     [5] multithreadparse(ctx::CSV.Context, pertaskcolumns::Vector{…}, rowchunkguess::Int64, i::Int64, rows::Vector{…}, wholecolumnslock::ReentrantLock)
       @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:360
     [6] (::CSV.var"#34#39"{CSV.Context, Vector{Vector{CSV.Column}}, Int64, Int64, Vector{Int64}, ReentrantLock})()
       @ CSV ~/.julia/packages/WorkerUtilities/ey0fP/src/WorkerUtilities.jl:384
Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base ./task.jl:455
 [2] macro expansion
   @ ./task.jl:487 [inlined]
 [3] CSV.File(ctx::CSV.Context, chunking::Bool)
   @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:240
 [4] File
   @ ~/.julia/packages/CSV/cwX2w/src/file.jl:227 [inlined]
 [5] #File#32
   @ ~/.julia/packages/CSV/cwX2w/src/file.jl:223 [inlined]
 [6] CSV.File(source::String)
   @ CSV ~/.julia/packages/CSV/cwX2w/src/file.jl:162
 [7] read(source::String, sink::Type; copycols::Bool, kwargs::@Kwargs{})
   @ CSV ~/.julia/packages/CSV/cwX2w/src/CSV.jl:117
 [8] read(source::String, sink::Type)
   @ CSV ~/.julia/packages/CSV/cwX2w/src/CSV.jl:113
 [9] top-level scope
   @ REPL[223]:1
Some type information was truncated. Use `show(err)` to see complete types.

I tried tracking down the error, but everything in that area of the file seemed fine, both at the row mentioned and where a search for the quoted text lands.

Hi @asinghvi17

I ran the code above and did not get any errors; the final output was the DataFrame itself.
Perhaps this issue is related to your Julia installation?
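
Since the error mentions `thread = 7` and the field it prints is a quoted field with an embedded newline, one more thing that might be worth comparing between our setups is the thread count, and whether the error still appears when parsing on a single task. The sketch below is only a diagnostic guess, not a confirmed cause: it uses a small made-up in-memory sample (the `data` string is hypothetical, not the real file) together with CSV.jl's `ntasks` keyword.

```julia
import CSV
using DataFrames

# Number of threads this Julia session was started with; the parse in the
# report ran on multiple threads, so this may differ between our machines.
println("threads: ", Threads.nthreads())

# Hypothetical stand-in for the real file: one quoted field that contains a
# newline, loosely modeled on the field printed in the error message.
data = """
id,description
1,"03.10.2018: 2330 UTC: Tianjin Anchorage, China.
"
2,"plain text"
"""

# ntasks=1 asks CSV.jl to parse on a single task instead of splitting the
# input into chunks that are parsed concurrently.
df = CSV.read(IOBuffer(data), DataFrame; ntasks=1)
println(df)
```

Against the real download this would be `CSV.read(file, DataFrame; ntasks=1)`. If that succeeds where the default call fails, it would suggest the problem is in the multithreaded chunking rather than in the file or the installation.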