Unicode characters cause UnicodeEncodeError from clevercsv.wrappers.write_table on Windows 10
Closed this issue · 3 comments
Hello and thank you for your work on this excellent library! I'm running on a Windows 10 machine and encountering a UnicodeEncodeError when attempting to write data that includes Unicode characters using clevercsv.wrappers.write_table.
It appears that adding an optional encoding argument to clevercsv.wrappers.write_table would fix this, as it works when I use clevercsv.writer without the wrapper as a workaround (below).
Workaround:
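import clevercsv

# Open the file with an explicit encoding instead of relying on the platform default.
with open("outfile.csv", "w", newline="", encoding="utf-8") as fp:
    w = clevercsv.writer(fp)
    w.writerows(data_list)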
Stack Trace:
Traceback (most recent call last):
  File "<REDACTED>", line 143, in <module>
    report.create_csv_report()
  File "<REDACTED>", line 42, in create_csv_report
  File "<REDACTED>\lib\site-packages\clevercsv\wrappers.py", line 441, in write_table
    w.writerows(table)
  File "<REDACTED>\lib\site-packages\clevercsv\write.py", line 60, in writerows
    return self._writer.writerows(rows)
  File "<REDACTED>\local\programs\python\python37-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2033' in position 250: character maps to <undefined>
Hi @mitchgrogg, thank you for reporting this issue!
I'm trying to figure out whether this is a CleverCSV bug or perhaps simply unexpected behavior. Would you mind trying it with the Python CSV module to see if you get the same result?
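One way to run that comparison with the standard library is sketched below. It is only an illustration: the sample rows and the try_write helper are made up for this example, and cp1252 is passed explicitly to mimic the default seen in the stack trace, so the result does not depend on the machine it runs on.

```python
import csv
import os
import tempfile

# Sample data containing U+2033 (double prime), the character named in the stack trace.
rows = [["item", "size"], ["monitor", "27\u2033"]]


def try_write(path, encoding):
    """Write rows with csv.writer; return the UnicodeEncodeError raised, or None."""
    try:
        with open(path, "w", newline="", encoding=encoding) as fp:
            csv.writer(fp).writerows(rows)
    except UnicodeEncodeError as exc:
        return exc
    return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "outfile.csv")
    cp1252_error = try_write(path, "cp1252")  # stands in for the Windows default
    utf8_error = try_write(path, "utf-8")

print("cp1252:", cp1252_error)
print("utf-8:", utf8_error)
```

If the cp1252 attempt fails in the same way with the plain csv module, the error comes from the file's encoding rather than from CleverCSV's writer.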
When you use open without a specific encoding, Python uses whatever locale.getpreferredencoding() returns (see the Python docs). Judging from your stack trace, it might simply be that on your Windows machine this is cp1252. If that's the case, then you do indeed need to specify utf-8 explicitly when writing Unicode data.
Let me know what you find. If it does turn out to be a bug in CleverCSV, or something we could document better or turn into a feature, I'd like to hear it!
It is indeed caused by the fact that Windows still uses the legacy cp1252 encoding, unfortunately. If I set the PYTHONUTF8=1 environment variable on my system, it works. However, that workaround only works on Python 3.7+.
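The effect of that variable can be observed without changing system settings by launching a child interpreter with it set. This is a rough sketch that assumes Python 3.7 or later; the preferred_encoding helper is a hypothetical name used only here.

```python
import codecs
import os
import subprocess
import sys


def preferred_encoding(utf8_mode):
    """Ask a child interpreter for its default encoding, with or without PYTHONUTF8=1."""
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # start from a known state
    if utf8_mode:
        env["PYTHONUTF8"] = "1"
    result = subprocess.run(
        [sys.executable, "-c", "import locale; print(locale.getpreferredencoding(False))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # Normalize spellings such as "UTF-8" and "utf-8" to one canonical codec name.
    return codecs.lookup(result.stdout.strip()).name


print("without PYTHONUTF8:", preferred_encoding(False))
print("with PYTHONUTF8=1: ", preferred_encoding(True))
```

On a machine like the one described here, the first line would be expected to report cp1252 and the second utf-8.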
I suggest adding an optional named encoding argument to clevercsv.wrappers.write_table. It seems counterintuitive that one can read_table with a specific encoding, but then not write_table that same data with a specific encoding (example below).
table_list = clevercsv.wrappers.read_table('example_in.csv', encoding='utf-8')
clevercsv.wrappers.write_table(table_list, 'example_out.csv') # This throws UnicodeEncodeError
I'd be happy to open a pull request with my proposed changes if you're open to that.
Thanks for checking the encoding issue and for the suggestion to add the encoding keyword to write_table. That was definitely a bug. Normally I'd be happy for you to create a PR, but since it was such a small fix I've added it myself (#28). I'll prepare an updated release of CleverCSV right away. Please let me know if you have any other suggestions or run into other problems!
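For readers curious about the general shape of such a fix, here is a minimal sketch of a writer helper that forwards an optional encoding to open(). It is not CleverCSV's actual implementation: write_rows and read_rows are hypothetical names, and the standard csv module is used so the example runs without CleverCSV installed.

```python
import csv
import os
import tempfile


def write_rows(rows, path, encoding=None):
    """Hypothetical helper: write rows to path, forwarding encoding to open().

    encoding=None keeps the platform default, just like a bare open() call.
    """
    with open(path, "w", newline="", encoding=encoding) as fp:
        csv.writer(fp).writerows(rows)


def read_rows(path, encoding=None):
    """Hypothetical counterpart that reads the rows back with the same encoding."""
    with open(path, "r", newline="", encoding=encoding) as fp:
        return list(csv.reader(fp))


rows = [["item", "size"], ["monitor", "27\u2033"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_out.csv")
    write_rows(rows, path, encoding="utf-8")
    round_tripped = read_rows(path, encoding="utf-8")

print(round_tripped == rows)
```

Accepting the same keyword on both the read and write side keeps a read-then-write round trip symmetric, which is the inconsistency the suggestion above points out.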