ben-strasser/fast-cpp-csv-parser

Add ability to detect NULL values

Closed this issue · 3 comments

The library currently has no way to detect that a value in a row is NULL.

col1, col2, col3
0,0,0
0,0,
0,,0
,0,0

When reading numeric rows

CSVReader<3, trim_chars<>, double_quote_escape<',', '\"'>> csv(file);
int col1, col2, col3;
csv.read_row(col1, col2, col3);

the columns with missing values are returned as 0, so we cannot determine whether a value is a real 0 or NULL.

One possible solution is to use some kind of "nullable" wrapper type.

CSVReader<3, trim_chars<>, double_quote_escape<',', '\"'>> csv(file);
Nullable<int> col1, col2, col3;
csv.read_row(col1, col2, col3);
if (col1.is_null()) { ... }

Thank you for your fast response.
Yes, I have been thinking about this solution, but it is kind of do-it-yourself.
It would be nice to have a built-in solution for NULLs directly in the library.

There are two problems with NULLs:

  • There are many different ways to represent them in CSV files: nil, null, Null, invalid, n/a, ... You will not find an exhaustive list, so any list will always be incomplete. Further, there are candidates such as nan where it is not clear whether this is a valid error-state value or an invalid NULL value.
  • Many types do not have a null state. What do you put into an int if a null is encountered? We could introduce something like an io::Nullable<int>, or maybe use a std::optional<int>. The first gets complicated very quickly and makes the interface far more complex. The second would probably work, but I have not thought it through. It is not there because the library is older than std::optional.

All of these problems are circumvented in a do-it-yourself model. The simple cases, which cover 99% of uses, get a pretty interface. Everything else needs to go via char*. The library is carefully written so that you will not lose speed by parsing the char* yourself. No copy is involved: the pointer points directly into the internal storage.
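To illustrate the do-it-yourself route, here is a minimal sketch of parsing a raw field into a std::optional<int>. The helper names is_null_token and parse_nullable_int are hypothetical and not part of the library, and the list of NULL spellings is only an assumption that each application would have to choose for itself. It requires C++17.

```cpp
#include <charconv>
#include <initializer_list>
#include <optional>
#include <stdexcept>
#include <string_view>
#include <system_error>

// Hypothetical helper (not part of the library): the application decides
// which spellings count as NULL. This list is an assumption, not a standard.
inline bool is_null_token(std::string_view field) {
    for (std::string_view token : {"", "null", "NULL", "n/a"}) {
        if (field == token) return true;
    }
    return false;
}

// Hypothetical helper: turn a raw, NUL-terminated field into an optional int.
// Returns std::nullopt for NULL tokens and throws on malformed numbers, so
// "missing" and "broken" stay distinguishable.
inline std::optional<int> parse_nullable_int(const char* field) {
    std::string_view text(field);
    if (is_null_token(text)) return std::nullopt;

    int value = 0;
    const char* end = text.data() + text.size();
    auto [ptr, ec] = std::from_chars(text.data(), end, value);
    if (ec != std::errc() || ptr != end) {
        throw std::invalid_argument("field is neither NULL nor an integer");
    }
    return value;
}

// Intended use with the reader (assuming read_row is given const char*
// columns, which point into the reader's internal storage):
//
//   const char *c1, *c2, *c3;
//   while (csv.read_row(c1, c2, c3)) {
//       std::optional<int> col1 = parse_nullable_int(c1);
//       if (!col1) { /* NULL */ }
//   }
```

Note that std::from_chars accepts neither leading whitespace nor a leading '+', so untrimmed fields such as " 0" would be rejected by this sketch; whether that is desirable depends on the trim policy chosen for the reader.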