Introduce BOM for Microsoft applications
theexiile1305 opened this issue · 10 comments
Hey there,
thank you very much for this great project.
Microsoft applications, for some reason, seem to require a BOM to parse UTF-8 files correctly, even though UTF-8 has no byte order the way UTF-16/32 do. So that a generated CSV file opens correctly, I suggest writing this BOM (for UTF-8 it is the three bytes 0xEF, 0xBB and 0xBF at the start of the file), even when the csvWriter is configured with Charsets.UTF_8.name().
I don't know why this is undocumented or why Excel seems to require a BOM for UTF-8; these might be good questions for the Excel team at Microsoft.
What do you think? Do you have any suggestions for solving this problem?
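To make the suggestion concrete, here is a minimal sketch of prepending the three BOM bytes by hand, independent of kotlin-csv. The names `UTF8_BOM` and `encodeCsvWithBom` are hypothetical and only serve as illustration; this is not how the library does or will do it.

```kotlin
import java.io.File

// The UTF-8 byte order mark: the three bytes 0xEF, 0xBB, 0xBF.
val UTF8_BOM: ByteArray = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte())

// Hypothetical helper (not part of kotlin-csv): returns the BOM
// followed by the CSV text encoded as UTF-8.
fun encodeCsvWithBom(csvText: String): ByteArray =
    UTF8_BOM + csvText.toByteArray(Charsets.UTF_8)

fun main() {
    val csv = "id,name,email\n2,Müller,mueller@example.com\n"
    // Write the BOM-prefixed bytes; the file name is just an example.
    File("test.csv").writeBytes(encodeCsvWithBom(csv))
}
```

The BOM has to be written as raw bytes before any encoded text, which is why the helper works on a ByteArray rather than on a Writer.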
@theexiile1305
Thank you for the question. Can you elaborate on this?
Is your problem something like the following?
"CSV files written by kotlin-csv don't have a BOM, so it cannot be read by Excel."
@doyaaaaaken Thank you for your quick response. Yes of course, I can elaborate on this with the following example:
The CSV file is created successfully with the UTF-8 setting enabled:
id,name,email
0,Jane,jane@example.com
1,Doe,doe@example.com
2,Müller,mueller@example.com
If I open this file in Google Spreadsheet or Numbers (the macOS spreadsheet application), Müller is displayed correctly. In contrast, Excel shows Müller as Müller. Further analysis showed that none of the UTF-8 special characters (e.g. öäüÄÖÜß - the special German characters) are displayed correctly in Excel.
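The garbling described above can be reproduced outside Excel. This is a minimal sketch, assuming that Excel falls back to a single-byte legacy code page when no BOM is present; ISO-8859-1 is used as a stand-in here because it maps the two bytes in question the same way Windows-1252 does. `misdecode` is a hypothetical helper name.

```kotlin
// Hypothetical helper: encodes text as UTF-8, then decodes the bytes
// as ISO-8859-1, mimicking a reader that does not detect UTF-8.
fun misdecode(text: String): String =
    String(text.toByteArray(Charsets.UTF_8), Charsets.ISO_8859_1)

fun main() {
    // "ü" is the two UTF-8 bytes 0xC3 0xBC, which a single-byte
    // decoder reads as the two characters "Ã" and "¼".
    println(misdecode("Müller"))
}
```

Every non-ASCII character takes more than one byte in UTF-8, so each one turns into two or more wrong characters, while plain ASCII text such as "Jane" is unaffected.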
@theexiile1305
The situation you described has been successfully reproduced by this code, thanks.
csvWriter().open("test.csv") {
    writeRows(listOf(
        listOf("id", "name", "email"),
        listOf(0, "Jane", "jane@example.com"),
        listOf(1, "Doe", "doe@example.com"),
        listOf(2, "Müller", "mueller@example.com"),
    ))
}
So, I plan to introduce an includeBOM: Boolean option on CsvWriterContext.
You can use this option as in the snippet below.
Do you think this is OK?
csvWriter {
    includeBOM = true
}.open("test.csv") {
    // do some operation
}
@doyaaaaaken
Sorry for the late response. The above snippet looks great and it's okay for me. Thank you!
@doyaaaaaken
If you want, I can give this issue a try. 😄
@theexiile1305 Thanks! Please try it.
@theexiile1305: As a workaround, you can also import the CSV file via Data | From Text/CSV instead of just opening it. This has the advantage that you can explicitly select the source file encoding in the import dialog.
hey @doyaaaaaken, has this been resolved?
Hi @EthanDunfordAspect , this has not been resolved yet.
released in v1.9.0 🚀