jsoizo/kotlin-csv

Feature request: Differentiate between empty string and null value

StefRe opened this issue · 5 comments

readAllWithHeader yields a List<Map<String,String>> and hence empty columns are being read as empty strings, so that we get "" for both col1 and col2 in the following example:

"col1","col2"
"",

I'd really like to get null for col2 here (this of course only makes sense if all strings are quoted, otherwise it wouldn't be clear how to interpret empty columns). I understand that you can't change the result to List<Map<String,String?>> now, but maybe you could add a nullCode option for reading as it already exists for writing. The default value is an empty string "" (=current behavior). I could then simply do

val nullCode = "NULL"
val rows = csvReader(nullCode=nullCode).readAllWithHeader(inputStream)
    .map { row -> row.mapValues { col -> if (col.value == nullCode) null else col.value } }

At first glance it seems that it only requires to change
https://github.com/doyaaaaaken/kotlin-csv/blob/c23a51b98fbbcb1348b3928d561f1f5e11fba965/src/commonMain/kotlin/com/github/doyaaaaaken/kotlincsv/parser/ParseStateMachine.kt#L36-L48
to

                    delimiter -> {
                        field.append(nullCode)
                        flushField()
                        state = ParseState.DELIMITER
                    }
                    '\n', '\u2028', '\u2029', '\u0085' -> {
                        field.append(nullCode)
                        flushField()
                        state = ParseState.END
                    }
                    '\r' -> {
                        if (nextCh == '\n') pos += 1
                        field.append(nullCode)
                        flushField()
                        state = ParseState.END
                    }

and the same for https://github.com/doyaaaaaken/kotlin-csv/blob/c23a51b98fbbcb1348b3928d561f1f5e11fba965/src/commonMain/kotlin/com/github/doyaaaaaken/kotlincsv/parser/ParseStateMachine.kt#L87-L99

but I didn't check it thoroughly.

@StefRe
Thank you for your feedback!!
I'll think about this on weekend, thanks.

@StefRe
Thank you for your input.
In your case, you want to regard "" as empty string and `` as null value.

It's maybe possible to have a case like the one you encountered.
However, I looked at the RFC 4180 and considered carefully, then I've felt that your case is a special case in terms of the CSV specification.

I'm very sorry, but I'll wait to deal with it until I get same feedback from others.

Thanks for your reply.
RFC 4180 just describes the file format with empty fields being explicitly allowed:

field = (escaped / non-escaped)
escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
non-escaped = *TEXTDATA

However, it doesn't say anything about how to interpret the data, specifically it doesn't mention if an empty field should be an empty string or null.

For my use case (a mock ResultSet backed by csv data) it is essential to differentiate between an empty string "" and null.

OpenCSV ("com.opencsv:opencsv:5.4") provides fine-grained configuration of what the parser should consider null:

EMPTY_SEPARATORS - two sequential separators are null.
EMPTY_QUOTES - two sequential quotes are null
BOTH - both are null
NEITHER - default. Both are considered empty string.

and allows to do exactly what I need:

import com.opencsv.CSVReaderHeaderAware
import com.opencsv.CSVReaderHeaderAwareBuilder
import com.opencsv.enums.CSVReaderNullFieldIndicator

val csv = """
    "empty col","null col"
    "",
""".trimIndent()

fun main() {
    csv.byteInputStream().bufferedReader().use { br ->
        (CSVReaderHeaderAwareBuilder(br)
            .withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS)
            .build() as CSVReaderHeaderAware)
            .use { cr ->
                val csvData = cr.readMap()
                assert(csvData["empty col"] == "")
                assert(csvData["null col"] == null)
            }
    }
}

So I can switch to OpenCSV but it's a bit heavy-weight as it does far more than I actually need so I'd prefer to stick with your lib for Kotlin projects.

@StefRe
Well, thank you for the good suggestions.
I'm convinced of your opinion.
Especially that configuration of OpenCSV seems to be also applicable for kotlin-csv.
I'd like to consider this topic as a feature request and try to implement it.

I really want this feature...