jsoizo/kotlin-csv

Improvement: Read one row at a time

Closed this issue · 8 comments

Currently the only way to interact with a CSV is to parse all rows. Two use-cases that this does not cover are:

  • Reading only the header. This is useful if you wish to provide a breakdown of what is included in the file. While it should be trivial to do without a library, the existence of this library and its parsing logic supports the position that this is a non-trivial task.
  • Reading row-by-row, which is arguably a superset of the former use-case. This would be helpful when interacting with asynchronous workflows. One could attempt to read a single row from a piped input stream, and the library would throw an exception when another line cannot be read in its entirety (as it does now with the full text). The producer can then continue to populate the input stream as data becomes available. The end result would be an asynchronous stream of rows (which I am not suggesting should be included in this library, but these changes would make it possible).

Hi @jnfeinstein, thanks for the question.
With the readNext() method, you can read one row at a time.
https://github.com/doyaaaaaken/kotlin-csv#read-line-by-line

I think this feature is enough to cover both of your proposed use-cases, isn't it?

@doyaaaaaken thanks for the timely reply. I'll attempt to use that and report back.

This line prevents the reader from being used in an asynchronous context as it is closing the stream on return. I believe the intended usage would be similar to:

val reader = csvReader().open(inputStream)
val nextRow = reader.readNext()

This allows the caller to take advantage of the state machine within the CSV reader while giving them the freedom to play with the input stream as desired.

In regards to headers, having fun readHeader(): List<String> and fun readNextWithHeader(): Map<String, String> would be helpful.
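To make the header proposal concrete, here is a minimal sketch of how the two methods could be layered on top of any row-at-a-time source. `HeaderAwareRows` is a hypothetical name and is not part of kotlin-csv; it assumes only a function that returns the next row, or null at end of input (the shape `readNext()` has in this thread). Unlike the signature proposed above, `readNextWithHeader()` is nullable here so that end of input can be signalled.

```kotlin
// Hypothetical wrapper, not part of kotlin-csv: adds header handling on top of
// any "give me the next row, or null at end of input" function.
class HeaderAwareRows(private val nextRow: () -> List<String>?) {
    private var header: List<String>? = null

    // Reads the first row once and caches it as the header.
    fun readHeader(): List<String> =
        header ?: (nextRow() ?: error("input has no header row")).also { header = it }

    // Returns the next data row keyed by header name, or null at end of input.
    fun readNextWithHeader(): Map<String, String>? {
        val names = readHeader()
        val row = nextRow() ?: return null
        require(row.size == names.size) {
            "expected ${names.size} fields but got ${row.size}"
        }
        return names.zip(row).toMap()
    }
}
```

Inside an `open { ... }` block this could presumably be constructed as `HeaderAwareRows { readNext() }`, which reuses the library's parsing and adds only the header bookkeeping.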

As I wrote in the principle section of the README, I'd like to hide the file-closing process.
So, if possible, I'd like you to use the open and readAllAsSequence methods.

However, if there is a use-case for opening and closing files manually, we need to implement that feature.
For the CSV writer, that use-case exists, so I provide the openAndGetRawWriter method to control the file-closing process manually. The background is in #59.

Do you think the same structure is needed for the CSV reader?

My use case is similar to #59 except from the read perspective.

on Java, we always need to close file. but it's boilerplate code and not friendly for non-JVM user.

In my opinion this makes sense for the methods that accept File or fileName: String as an input. I would expect that the library would handle opening and closing the file because anything else would be an unintended side-effect. However, an input of type InputStream means that the user has manually opened the stream. The parallel expectation would be that the user is also expected to close the stream. It actually took some time debugging to determine that kotlin-csv was doing this. The syntax would look like:

csvContent.byteInputStream().use {
  csvReader().open(it) {
    doForEachRow(readNext())
  }
  
  doComplexOperationWithInputStream(it)
}

profit()

I've taken advantage of the same auto-close functionality used internally in kotlin-csv to reduce boilerplate, but now have the capability to perform a complex operation on this stream that may be unique to my use case.
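The ownership convention argued for above (close what you open, leave alone what you are handed) can be sketched with the standard library only. The `countLines` helpers are hypothetical and a line count stands in for real CSV parsing; this is an illustration of the convention, not how kotlin-csv is implemented.

```kotlin
import java.io.File
import java.io.InputStream

// Hypothetical helpers; counting lines stands in for real CSV parsing.

// The caller opened this stream, so the caller closes it: no close() here.
fun countLines(input: InputStream): Int =
    input.bufferedReader().lineSequence().count()

// This function opens the stream itself, so it also closes it via use {}.
fun countLines(file: File): Int =
    file.inputStream().use { countLines(it) }
```

With this split, a caller that passes an `InputStream` can keep working with it afterwards, while callers that pass a `File` never have to think about closing anything.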

Actually, I partly agree with your point quoted below. Thanks for your insightful opinion.

In my opinion this makes sense for the methods that accept File or fileName: String as an input. I would expect that the library would handle opening and closing the file because anything else would be an unintended side-effect. However, an input of type InputStream means that the user has manually opened the stream. The parallel expectation would be that the user is also expected to close the stream.

However, the example you provided does not seem to be possible, because an InputStream can be read at most once.
In your example, you read the InputStream twice.

For your use-case, you could write code like the following snippet.

val ips: InputStream = "a1,b1\r\na2,b2".toByteArray().inputStream()
csvReader().open(ips) {
    var row: List<String>? = readNext()
    while(row != null) {
        doComplexOperationWithInputStream(row)
        row = readNext()
    }
}

InputStream is an abstract class, allowing the caller to create an implementation that behaves however they wish. Some of the standard implementations may even be reset. The overall point is that closing prematurely may be an undesirable side-effect to some calling applications. 😁
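As one illustration of a standard implementation that can be reset: `java.io.ByteArrayInputStream` supports mark/reset, so the same bytes can be read a second time as long as nothing has closed or discarded the stream. `readTwice` is a hypothetical helper written for this sketch.

```kotlin
import java.io.ByteArrayInputStream

// Reads the whole stream, rewinds it, and reads it again.
fun readTwice(text: String): Pair<String, String> {
    val stream = ByteArrayInputStream(text.toByteArray())
    check(stream.markSupported()) // true for ByteArrayInputStream
    val first = stream.readBytes().decodeToString()
    stream.reset() // no mark was set, so this rewinds to the start
    val second = stream.readBytes().decodeToString()
    return first to second
}
```

Not every stream behaves this way (a socket or pipe generally cannot be rewound), which is why the decision about when to close is arguably best left to whoever created the stream.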

I can understand your opinion, but to my mind, closing the stream as in the snippet below is not unnatural.
The reasons are written as code comments in the snippet.

val ips1: InputStream = "a1,b1\r\na2,b2".toByteArray().inputStream()
val csv1 = csvReader().readAll(ips1) // closes the InputStream, because it has been read in full.

val ips2: InputStream = "a1,b1\r\na2,b2".toByteArray().inputStream()
val csv2 = csvReader().open(ips2) {
  readAllAsSequence().toList()
}  // closes the InputStream, because the read is enclosed by the `open` method.