tototoshi/scala-csv

Handling CSVs with only CR newlines

lbordowitz opened this issue · 1 comments

A user has uploaded a CSV which uses solely carriage return (CR, or \r) characters for its newlines. The current SourceLineReader.readLineWithTerminator reads the header row successfully and then discards the rest. We want it to return every row in the CSV.

We run something like this:

import java.nio.channels.Channels
import scala.io.Source
import com.github.tototoshi.csv.CSVReader

// `blob` is a blob fetched from Google Cloud Platform storage
val spreadsheetSource = Source.fromInputStream(Channels.newInputStream(blob.reader()))
val reader = CSVReader.open(spreadsheetSource)
val lines = reader.all()
// lines: List[List[String]] = List(List("First Name", " Last Name", " email"))

This is despite the fact that the file we're reading from has five lines. I have also tried this with Source.fromFile, and there's no difference.

I created the file from a normal CSV with LF-style line endings, and then ran this bash command:

$ tr '\n' '\r' < fnln.csv >fnln.cr.csv

Side note: why can't we use Source's built-in getLines function? Is there a reason that we need the line terminator in each string?

The current implementation of scala-csv recognizes only \n and \r\n as line terminators and does not support a bare \r.
This is simply because I had never encountered a system that treated '\r' as a newline, and I thought those two were enough. But I should probably support it.
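
For illustration, here is a minimal sketch of a line reader that treats \r, \n, and \r\n as terminators while keeping the terminator in each returned string. This is not scala-csv's actual implementation; the object name CrAwareLineReader and both method names are hypothetical, and it works on an in-memory string rather than a Source.

```scala
object CrAwareLineReader {
  // Hypothetical helper: reads one line, including its terminator.
  // Assumes chars.hasNext is true when called.
  def readLineWithTerminator(chars: BufferedIterator[Char]): String = {
    val sb = new StringBuilder
    var done = false
    while (!done && chars.hasNext) {
      val c = chars.next()
      sb.append(c)
      if (c == '\n') {
        done = true
      } else if (c == '\r') {
        // A CR followed by LF is one CRLF terminator; a lone CR ends the line too.
        if (chars.hasNext && chars.head == '\n') sb.append(chars.next())
        done = true
      }
    }
    sb.toString
  }

  // Splits text into lines without dropping any terminator characters.
  def linesWithTerminators(text: String): List[String] = {
    val chars = text.iterator.buffered
    val out = List.newBuilder[String]
    while (chars.hasNext) out += readLineWithTerminator(chars)
    out.result()
  }
}
```

With this sketch, a CR-only input such as "a,b\rc,d\r" splits into "a,b\r" and "c,d\r" instead of being read as a single line. The one-character lookahead is what distinguishes a lone \r from the start of \r\n.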

> Side note: why can't we use Source's built-in getLines function? Is there a reason that we need the line terminator in each string?

The difference between SourceLineReader.readLineWithTerminator and Source#getLines is that getLines strips the line terminators, while readLineWithTerminator keeps them.
To parse a CSV field that contains multiline text, I need to preserve the newline characters inside the field. Source#getLines doesn't fit this case.
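
A small sketch of the problem, using only the Scala standard library (the object name GetLinesDemo and the sample data are made up for illustration): a quoted field that spans two lines comes back from getLines as two separate strings, with no record of which terminator sat between them.

```scala
import scala.io.Source

object GetLinesDemo {
  // One header row and one record whose quoted second field contains a CRLF.
  val csv = "id,note\r\n1,\"first line\r\nsecond line\"\r\n"

  // getLines splits on \r, \n, and \r\n alike and drops the terminators.
  val stripped: List[String] = Source.fromString(csv).getLines().toList
  // stripped == List("id,note", "1,\"first line", "second line\"")
}
```

A parser working from these strings can tell that the quoted field continues onto the next line, but it can no longer restore the original \r\n inside the field, which is why the terminator has to travel with each line.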