DataFrame.load_csv!/2 seems to fail on certain options
chgeuer opened this issue · 2 comments
chgeuer commented
After downloading a CSV from the web, I want to load it into a DataFrame.
- Loading the CSV directly from memory using DataFrame.load_csv!/2 results in a RuntimeError.
- Storing the CSV binary in a temporary file, and then loading it using DataFrame.from_csv!/2 works fine.
My assumption was that load_csv! and from_csv! should behave more or less identically (except for touching disk).
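That expectation can be checked in isolation with a minimal sketch. It assumes Explorer 0.9.1 or later (the version later reported to work), and it uses a tiny made-up CSV string, a hypothetical temp file name, and only the `delimiter` option; it is not the DWD data from the repro.

```elixir
Mix.install([{:explorer, "~> 0.9.1"}])

require Explorer.DataFrame, as: DF

# Hypothetical sample data, semicolon-delimited like the DWD file.
csv = "a;b\n1;2\n3;4\n"
opts = [delimiter: ";"]

# Parse straight from the in-memory binary.
from_memory = DF.load_csv!(csv, opts)

# Write the same binary to a temporary file and parse it from disk.
path = Path.join(System.tmp_dir!(), "explorer_roundtrip.csv")
File.write!(path, csv)
from_disk = DF.from_csv!(path, opts)

# With identical options, both paths are expected to yield the same columns.
same? = DF.to_columns(from_memory) == DF.to_columns(from_disk)
IO.inspect(same?, label: "load_csv!/2 and from_csv!/2 agree")
```

If the two calls disagree for the same input and options, that points at the library rather than at the options.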
The code below, using load_csv!/2, gives me:
%RuntimeError{
message: "load_csv failed:
%RuntimeError{
message: "Polars Error: found more fields than defined in 'Schema'
Consider setting 'truncate_ragged_lines=true'.
Here's a quick repro for Livebook:
Mix.install([
  {:req, "~> 0.5.6"},
  {:explorer, "~> 0.9.0"},
  {:kino_explorer, "~> 0.1.21"},
  {:iconv, "~> 1.0"}
])
require Explorer.DataFrame, as: DF
require Explorer.Series, as: S
url = "https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/precipitation/recent/stundenwerte_RR_01078_akt.zip"
%Req.Response{status: 200, body: zip} = Req.get!(url: url)
csv =
  zip
  |> Enum.into(%{}, fn {name, content} ->
    {to_string(name), :iconv.convert("iso8859-15", "utf-8", content)}
  end)
  |> Enum.find(fn {name, _} -> String.starts_with?(name, "produkt") end)
  |> elem(1)
csv_opts = [
  header: true,
  delimiter: ";",
  infer_schema_length: 10,
  nil_values: for(n <- 0..10, do: String.duplicate(" ", n) <> "-999"),
  dtypes: [
    {"STATIONS_ID", {:u, 16}},
    {"MESS_DATUM", :string},
    {"QN_8", {:u, 16}},
    {" R1", {:f, 32}},
    {"RS_IND", {:f, 32}},
    {"WRTR", {:f, 32}}
  ]
]
try do
  DF.load_csv!(csv, csv_opts)
rescue
  err -> IO.puts("load_csv!/2 resulted in #{inspect err}")
end
File.write!("1.csv", csv)
df = DF.from_csv!("1.csv", csv_opts)
IO.puts("from_csv!/2 works well")
df |> DF.dtypes()
Maybe that's not a bug but a user error on my side. Is there a better place to ask that question?
josevalim commented
Which version are you using? We had a bug for this, but it was fixed in 0.9.1. Try forcing ~> 0.9.1 instead.
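Applied to the repro, that suggestion would presumably look like the sketch below: the same dependency list with only the `:explorer` requirement raised. The final line is an added assumption about how to confirm which version was actually resolved, using the standard `Application.spec/2` call.

```elixir
Mix.install([
  {:req, "~> 0.5.6"},
  # Raised from "~> 0.9.0" so that the resolver cannot pick 0.9.0.
  {:explorer, "~> 0.9.1"},
  {:kino_explorer, "~> 0.1.21"},
  {:iconv, "~> 1.0"}
])

# Print the Explorer version that was actually loaded.
IO.puts("explorer #{Application.spec(:explorer, :vsn)}")
```

In Livebook, a changed `Mix.install/2` call only takes effect after the setup cell is re-run, since dependencies are installed once per runtime.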
chgeuer commented
Thanks, @josevalim, that works perfectly in v0.9.1. Closing this issue.