larsyencken/csvdiff

Diffing CSV files encoded as UTF-8

Closed · 1 comment

Traceback (most recent call last):
  File "C:/Users/firsi/PycharmProjects/sql_compare/operation.py", line 64, in <module>
    compare_common(db_list1, db_list2)
  File "C:/Users/firsi/PycharmProjects/sql_compare/operation.py", line 54, in compare_common
    diff = csvdiff.diff_files(file1, file2, [(index.split(',')[0])])
  File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\__init__.py", line 44, in diff_files
    ignore_columns=ignored_columns)
  File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\patch.py", line 204, in create
    from_indexed = records.index(from_records, index_columns)
  File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\records.py", line 53, in index
    for r in record_seq
  File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\records.py", line 51, in <dictcomp>
    obj = {
  File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\site-packages\csvdiff\records.py", line 38, in __iter__
    for lineno, r in enumerate(self.reader, 2):
  File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\csv.py", line 111, in __next__
    self.fieldnames
  File "C:\Users\firsi\AppData\Local\Programs\Python\Python37\lib\csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)
UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 85: illegal multibyte sequence

When I diff two files that I wrote as UTF-8, PyCharm raises this error.

Interesting. It looks like your OS is setting a default encoding other than UTF-8: the traceback shows Python trying to decode the file as GBK. You might try loading the records yourself, with the encoding given explicitly, and then diffing the records instead of the files.
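
A minimal sketch of loading the records yourself, using only the standard library. The helper name `load_records` is hypothetical and not part of csvdiff. It opens the file with an explicit `encoding="utf-8"`, so the platform default (GBK here) is never used.

```python
import csv


def load_records(path, encoding="utf-8"):
    """Read a CSV file into a list of dict rows, decoding with an explicit encoding.

    Hypothetical helper for illustration; it is not part of csvdiff.
    """
    # newline="" is what the csv module documentation recommends for reader input.
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.DictReader(f))


# Assumed usage: the two record lists would then be diffed directly, for example
# with csvdiff.diff_records(from_records, to_records, index_columns), if the
# installed csvdiff version provides it:
#
#   from_records = load_records(file1)
#   to_records = load_records(file2)
```

If the files were written with a byte-order mark, `encoding="utf-8-sig"` may be the better choice, since it strips the BOM from the first column name.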