jdunck/python-unicodecsv

version 0.11.1 breaks when writing Unicode CSV headers

Closed this issue · 3 comments

Hey, I think a recent change caused a bug. Here's a demo program: https://gist.github.com/NelsonMinar/aacf7d6dfe4e40b36c16

Long story short, if the CSV header contains Unicode strings it now throws an error.

Unicode CSV version 0.11.1
/usr/lib/python2.7/csv.py:145: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  wrong_fields = [k for k in rowdict if k not in self.fieldnames]
Traceback (most recent call last):
  File "testUCSV.py", line 13, in <module>
    writer.writeheader()
  File "/usr/local/lib/python2.7/dist-packages/unicodecsv/__init__.py", line 159, in writeheader
    self.writerow(header)
  File "/usr/lib/python2.7/csv.py", line 152, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "/usr/lib/python2.7/csv.py", line 148, in _dict_to_list
    + ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: 'unicode\xe2\x98\x83'

I wouldn't be surprised if you don't have any tests for non-ASCII headers; they're unusual. But I have some live Japanese government street address data like this. I got a bit confused looking at the git history, but I think this is related to calling _stringify_list in writeheader. Version 0.11.0 didn't do that.
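
For readers without a Python 2.7 setup, the failure mode in the traceback can be imitated in Python 3 with only the standard library. This is a rough sketch under assumptions, not the unicodecsv code: `encode_keys` is a hypothetical stand-in for whatever encodes the header dict in 0.11.1, and the stdlib `csv.DictWriter` plays the role of the Python 2.7 one. The idea is that once the header keys are encoded to bytes while `fieldnames` stays as text, the keys no longer match and `DictWriter` rejects the row.

```python
import csv
import io


def encode_keys(row, encoding="utf-8"):
    # Hypothetical helper (not part of unicodecsv): encode every text key
    # to bytes, leaving the values untouched.
    return {key.encode(encoding): value for key, value in row.items()}


fieldnames = ["unicode\u2603"]  # a header containing a snowman character
writer = csv.DictWriter(io.StringIO(), fieldnames=fieldnames)

# A header row maps each field name to itself.
header = dict(zip(fieldnames, fieldnames))

try:
    # Keys are now bytes, fieldnames are still text, so none of them match.
    writer.writerow(encode_keys(header))
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```

In Python 2 an ASCII-only byte string still compares equal to its unicode counterpart, which would explain why only non-ASCII headers hit this; the comparison of a non-ASCII byte string with unicode is what emits the UnicodeWarning shown above.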

@NelsonMinar Thanks for the report and repro example. I bisected the tree and found that d27d182 is the change that introduced the problem. It was meant as a tweak/optimization, but it unintentionally caused this failure, and we didn't have a covering test. I thought the change was small enough that I merged it anyway. Lesson learned. :)

I added a test to cover this (test_write_unicode_header_dict) and will release a 0.11.2 momentarily.
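
For illustration only, a regression test for non-ASCII headers could look roughly like the sketch below. This is not the actual test_write_unicode_header_dict from the repository: it assumes Python 3 and the standard library `csv` module in place of unicodecsv, and the names `write_header_and_row` and `test_write_non_ascii_header` are made up for the example.

```python
import csv
import io


def write_header_and_row(fieldnames, row):
    # Write a header plus one data row to an in-memory buffer and
    # return the resulting CSV text.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow(row)
    return buf.getvalue()


def test_write_non_ascii_header():
    # Second column name is Japanese for "address".
    fieldnames = ["name", "\u4f4f\u6240"]
    output = write_header_and_row(fieldnames, {"name": "a", "\u4f4f\u6240": "b"})
    lines = output.splitlines()
    # The header must round-trip unchanged, and the row must line up with it.
    assert lines[0] == "name,\u4f4f\u6240"
    assert lines[1] == "a,b"


test_write_non_ascii_header()
```

The point of such a test is simply that writeheader() is exercised with at least one field name outside ASCII, which is the case the earlier suite missed.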

OK, this is pushed up - please test with your real data and reopen if not fixed.

Thanks, this looks fixed. My real data works. Nice test!