larsyencken/csvdiff

invalid column name 'id' as key


With the diff_files example from the API docs, using column 'name' as the key works, but using column 'id' fails:

Traceback (most recent call last):
  File "differ.py", line 3, in <module>
    patch = csvdiff.diff_files('Skill.csv', 'Skill_1.csv', ['id'])
  File "/usr/local/lib/python3.6/dist-packages/csvdiff/__init__.py", line 44, in diff_files
    ignore_columns=ignored_columns)
  File "/usr/local/lib/python3.6/dist-packages/csvdiff/patch.py", line 204, in create
    from_indexed = records.index(from_records, index_columns)
  File "/usr/local/lib/python3.6/dist-packages/csvdiff/records.py", line 58, in index
    raise InvalidKeyError('invalid column name {k} as key'.format(k=k))
csvdiff.records.InvalidKeyError: invalid column name 'id' as key

Both columns, 'id' and 'name', are present in my test files.

skill.csv

id name desc
int string string
技能ID 技能名称 技能描述
1001 小恶魔普攻 attack01
1002 小恶魔普攻 attack01
1003 夏提雅技能 skill01

skill_1.csv

id name desc
int string string
技能ID 技能名称 技能描述
1001 小恶魔普攻 attack01
1002 小恶魔普攻 attack01
1003 夏提雅技能 skill01
1004 夏提雅奥义 skill02
1005 雅尔贝德普攻 attack01
_1 百合折 attack01
1006 雅尔贝德技能 attack02

If the example data can't reproduce the bug, I can email the original files. @larsyencken

I cannot reproduce this:

 2019-05-22 05:46:02 ⌚  |2.4.4| MacBook-Pro-3 in ~/projects/csvdiff
± |master ?:27 ✗| → head skills*
==> skills.csv <==
id,name,desc
int,string,string
技能ID,技能名称,技能描述
1001,小恶魔普攻,attack01
1002,小恶魔普攻,attack01
1003,夏提雅技能,skill01

==> skills1.csv <==
id,name,desc
int,string,string
技能ID,技能名称,技能描述
1001,小恶魔普攻,attack01
1002,小恶魔普攻,attack01
1003,夏提雅技能,skill01
1004,夏提雅奥义,skill02
1005,雅尔贝德普攻,attack01
_1,百合折,attack01
1006,雅尔贝德技能,attack02

 2019-05-22 05:46:22 ⌚  |2.4.4| MacBook-Pro-3 in ~/projects/csvdiff
± |master ?:27 ✗| → csvdiff --style=summary id skills.csv skills1.csv
0 rows removed (0.0%)
4 rows added (80.0%)
0 rows changed (0.0%)

 2019-05-22 05:46:29 ⌚  |2.4.4| MacBook-Pro-3 in ~/projects/csvdiff

I had a similar problem. In my case, I had exported the data from Excel, which saved the file with the "UTF-8 BOM" encoding. As a result, csvdiff picked up extra characters at the start of the first column name: instead of "id", it saw "\u010f\u00bb\u017cid". The problem went away once I re-saved the file as plain "UTF-8", without the BOM.
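
In case it helps anyone else, here is a minimal sketch of how to check for the BOM and strip it, assuming Python 3 and only the standard library. The helper names (header_columns, has_utf8_bom, strip_utf8_bom) are my own and not part of csvdiff. It relies on the "utf-8-sig" codec, which drops a leading BOM when one is present and otherwise behaves like plain UTF-8.

```python
import codecs
import csv


def header_columns(path, encoding="utf-8"):
    """Return the first CSV row exactly as it is read with the given encoding."""
    with open(path, newline="", encoding=encoding) as f:
        return next(csv.reader(f))


def has_utf8_bom(path):
    """Return True if the file starts with the three UTF-8 BOM bytes."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def strip_utf8_bom(src, dst):
    """Copy src to dst as plain UTF-8, dropping a leading BOM if there is one."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(src, newline="", encoding="utf-8-sig") as fin:
        data = fin.read()
    with open(dst, "w", newline="", encoding="utf-8") as fout:
        fout.write(data)
```

Printing repr(header_columns(path)[0]) on a file with a BOM shows the hidden character in front of the column name, which is why a key of 'id' does not match. After running strip_utf8_bom on both inputs (writing to new file names of your choice), the cleaned copies can be passed to csvdiff as usual.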