version 0.11.1 breaks when writing Unicode CSV headers
Hey, I think a recent change caused a bug. Here's a demo program: https://gist.github.com/NelsonMinar/aacf7d6dfe4e40b36c16
Long story short, if the CSV header contains Unicode strings it now throws an error.
Unicode CSV version 0.11.1
/usr/lib/python2.7/csv.py:145: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  wrong_fields = [k for k in rowdict if k not in self.fieldnames]
Traceback (most recent call last):
  File "testUCSV.py", line 13, in <module>
    writer.writeheader()
  File "/usr/local/lib/python2.7/dist-packages/unicodecsv/__init__.py", line 159, in writeheader
    self.writerow(header)
  File "/usr/lib/python2.7/csv.py", line 152, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "/usr/lib/python2.7/csv.py", line 148, in _dict_to_list
    + ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: 'unicode\xe2\x98\x83'
I wouldn't be surprised if you don't have any tests for non-ASCII headers; it's unusual. But I have some live Japanese government street address data like this. I got a bit confused looking at the git history, but I think this is related to calling _stringify_list in writeheader. Version 0.11.0 didn't do that.
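A minimal sketch of why the check in the traceback fails, assuming the header's keys end up UTF-8 encoded before they reach csv.DictWriter (this is a guess based on the 'unicode\xe2\x98\x83' in the ValueError, not a copy of the library's code). The helper wrong_fields below is a hypothetical stand-in that mirrors the list comprehension shown in the warning:

```python
# -*- coding: utf-8 -*-
# Field names as text, the way the caller passes them to the writer.
fieldnames = [u"id", u"unicode\u2603"]


def wrong_fields(rowdict, fieldnames):
    """Hypothetical helper mirroring the membership check from the traceback."""
    return [k for k in rowdict if k not in fieldnames]


# Header dict keyed by the original text field names: every key is found.
good_header = dict(zip(fieldnames, fieldnames))

# Assumption: the keys were encoded to UTF-8 bytes before the check.
bad_header = dict((name.encode("utf-8"), name) for name in fieldnames)

print(wrong_fields(good_header, fieldnames))
print(wrong_fields(bad_header, fieldnames))
```

The first call reports nothing. The second reports the encoded non-ASCII name, because the byte string no longer compares equal to the text field name. On Python 2 the pure-ASCII key still matches, which would explain why only the Unicode header shows up in the error.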
@NelsonMinar Thanks for the report and repro example. I bisected the tree and found that d27d182 is the change that introduced the problem. It was meant as a tweak/optimization but unintentionally caused this failure, and we didn't have a covering test. I thought the change was small enough, so I merged it anyway. Lesson learned. :)
I added a test to cover this (test_write_unicode_header_dict) and will release a 0.11.2 momentarily.
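For anyone who wants a similar check in their own code, a regression test of this kind could look roughly like the sketch below. It is not the project's actual test: it uses the Python 3 standard-library csv module as a stand-in for unicodecsv, and the function names are made up for illustration.

```python
import csv
import io


def roundtrip_header(fieldnames):
    """Write only the header row with DictWriter, then read it back."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    buf.seek(0)
    return next(csv.reader(buf))


def test_unicode_header_roundtrip():  # hypothetical test name
    # Mix of ASCII and non-ASCII header names.
    fieldnames = ["id", "unicode\u2603", "\u4f4f\u6240"]
    assert roundtrip_header(fieldnames) == fieldnames


test_unicode_header_roundtrip()
```

The point is simply that a header containing non-ASCII names should survive writeheader() unchanged, which is the case that was missing coverage.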
OK, this is pushed up - please test with your real data and reopen if not fixed.
Thanks, this looks fixed. My real data works. Nice test!