jsoizo/kotlin-csv

Introduce BOM for Microsoft applications

theexiile1305 opened this issue · 10 comments

Hey there,

thank you very much for this great project.

Microsoft applications, for some reason, seem to require a BOM to parse UTF-8 files correctly, even though UTF-8, unlike UTF-16/32, has no byte order. So that a generated CSV file opens correctly, I suggest adding this special BOM (for UTF-8 it is the three bytes 0xEF, 0xBB and 0xBF at the start of the file), even when the csvWriter is configured with Charsets.UTF_8.name().
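To make the suggestion concrete, here is a minimal sketch of writing the three BOM bytes ahead of the CSV content using only the Kotlin/JVM standard library. The helper `writeCsvWithBom`, the constant `UTF8_BOM` and the file name are hypothetical names chosen for illustration; none of this is kotlin-csv API.

```kotlin
import java.io.File

// The UTF-8 byte order mark: 0xEF 0xBB 0xBF.
val UTF8_BOM = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte())

// Hypothetical helper (not part of kotlin-csv): writes the BOM first,
// then the given lines encoded as UTF-8 and terminated with CRLF.
fun writeCsvWithBom(file: File, lines: List<String>) {
    file.outputStream().use { out ->
        out.write(UTF8_BOM)
        val body = lines.joinToString(separator = "\r\n", postfix = "\r\n")
        out.write(body.toByteArray(Charsets.UTF_8))
    }
}

fun main() {
    val file = File("test-bom.csv") // illustrative file name
    writeCsvWithBom(file, listOf("id,name,email", "2,Müller,mueller@example.com"))

    // Print the first three bytes of the file in hex to confirm the BOM is present.
    println(file.readBytes().take(3).joinToString(" ") { "%02X".format(it) })
}
```

The lines are assumed to be already formatted as CSV; quoting and escaping are left out to keep the sketch focused on the BOM.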

Why this is undocumented and why Excel seems to require a BOM for UTF-8, I don't know; these might be good questions for the Excel team at Microsoft.

What do you think, or do you have any other suggestion for solving this problem?

@theexiile1305
Thank you for the question. Can you elaborate on this?
Is your problem something like the following?
"CSV files written by kotlin-csv don't have a BOM, so it cannot be read by Excel."

@doyaaaaaken Thank you for your quick response. Yes of course, I can elaborate on this with the following example:
The CSV file is created successfully with the UTF-8 setting enabled:

id,name,email
0,Jane,jane@example.com
1,Doe,doe@example.com
2,Müller,mueller@example.com

If I open this file in Google Sheets or Numbers (the macOS spreadsheet application), Müller is displayed correctly. In contrast, Excel shows Müller as M√ºller. Further analysis showed that none of the UTF-8 special characters (e.g. öäüÄÖÜß, the special German characters) are displayed correctly in Excel.
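The garbling is consistent with the UTF-8 bytes being decoded in a single-byte legacy encoding when no BOM is present. The sketch below reproduces that kind of misreading; the helper name `misdecode` is hypothetical, and ISO-8859-1 is only an assumed stand-in, since the exact garbage depends on which legacy encoding the application falls back to.

```kotlin
import java.nio.charset.Charset

// Hypothetical helper: encodes text as UTF-8, then decodes the bytes
// with a different (wrong) charset, as a BOM-unaware reader might.
fun misdecode(text: String, wrongCharset: Charset): String =
    String(text.toByteArray(Charsets.UTF_8), wrongCharset)

fun main() {
    // "ü" is the two bytes C3 BC in UTF-8; a single-byte charset
    // turns each of those bytes into its own character.
    println(misdecode("Müller", Charsets.ISO_8859_1)) // prints "MÃ¼ller"
}
```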

@theexiile1305
The situation you described has been successfully reproduced by this code, thanks.

        csvWriter().open("test.csv") {
            writeRows(listOf(
                listOf("id","name","email"),
                listOf(0,"Jane","jane@example.com"),
                listOf(1,"Doe","doe@example.com"),
                listOf(2,"Müller","mueller@example.com"),
            ))
        }

So, I plan to introduce an includeBOM: Boolean option on CsvWriterContext.
You could use this option as in the snippet below.
Do you think this is ok?

csvWriter{
    includeBOM = true
}.open("test.csv") {
  //do some operation
}

@doyaaaaaken
Sorry for the late response. The above snippet looks great and it's okay for me. Thank you!

@doyaaaaaken
If you want, I can give that issue a try. 😄

@theexiile1305 Thanks! Please try it.

@theexiile1305: As a workaround, you can also import the csv file by Data | From Text/CSV instead of just opening it. This has the advantage that you can explicitly select the source file encoding in the import dialog:

[Screenshot of the Excel import dialog]

hey @doyaaaaaken, has this been resolved?

Hi @EthanDunfordAspect , this has not been resolved yet.

released in v1.9.0 🚀