MarkyMan4/filequery

Issue loading CSVs

johnnadratowski opened this issue · 5 comments

I get this error:

Invalid Input Error: Could not convert string '20186-****' to INT64 in column "Campaign_ID", at line 33118.

Parser options:
  file=./file.csv
  delimiter=',' (auto detected)
  quote='"' (auto detected)
  escape='"' (auto detected)
  header=1
  sample_size=20480
  ignore_errors=0
  all_varchar=0.

Consider either increasing the sample size (SAMPLE_SIZE=X [X rows] or SAMPLE_SIZE=-1 [all rows]), or skipping column conversion (ALL_VARCHAR=1)

It would be helpful to be able to pass these configurations to duckdb so it can load these files.

Thanks!

I could see people running into this same issue with ndjson files too (e.g. some records have an additional field but it causes an error because the sample size was too small). I suppose one option could be to default the sample size to -1 and allow users to override the sample size by passing an argument. I'll have to do some testing with larger files to see what makes the most sense.

Exposing the option as a command line argument or environment variable would work well. I wanted to use your tool on some big data dumps from a very large business, and for that it would need more control over how DuckDB loads the files.

I'm adding some functionality in 0.2.1 that I think should help with this. I'm making the default sample size -1, so DuckDB samples all records. This makes startup a little slower for very large files, but if your files are that big, you might be better off using an on-disk DuckDB database anyway. Also, if you want more control over how files are loaded, you will be able to launch the TUI without specifying any files (e.g. just run filequery -e to get an empty in-memory database). You can then use DuckDB's functions directly to load files with whatever options you want. I think this is better than a config file because you can save the SQL you wrote in the TUI and pass it into CLI commands.

In this release, I'm making the table list on the left side of the screen reactive, which means the list will update as you create/drop tables in your database. Let me know your thoughts on this approach, thanks!


Drafted a pull request for v0.2.1 with these changes: #17

@johnnadratowski 0.2.1 is available now with this fix if you want to upgrade and try it out. I'm going to close this, but feel free to open another issue if you have any trouble!