Duplicated value after concatenation of tables with non-unique column names
trivelt opened this issue · 1 comment
trivelt commented
When two tables with unique column names are concatenated, the resulting table is fine:
import petl
content = b"1,2,3,4\n5,6,7\n,8,9,10\n"
headers = ["foo", "header2", "foo2", "bar"]
t = petl.fromcsv(petl.MemorySource(content), encoding="utf-8", header=headers)
content2 = b"0,2,0,0\n1,1,0\n"
headers2 = ["foo", "header2", "foo2", "bar"]
t2 = petl.fromcsv(petl.MemorySource(content2), encoding="utf-8", header=headers2)
t3 = t.cat(t2)
>>> t
+-----+---------+------+------+
| foo | header2 | foo2 | bar  |
+=====+=========+======+======+
| '1' | '2'     | '3'  | '4'  |
+-----+---------+------+------+
| '5' | '6'     | '7'  |      |
+-----+---------+------+------+
| ''  | '8'     | '9'  | '10' |
+-----+---------+------+------+
>>>
>>>
>>> t2
+-----+---------+------+-----+
| foo | header2 | foo2 | bar |
+=====+=========+======+=====+
| '0' | '2'     | '0'  | '0' |
+-----+---------+------+-----+
| '1' | '1'     | '0'  |     |
+-----+---------+------+-----+
>>>
>>>
>>> t3
+-----+---------+------+------+
| foo | header2 | foo2 | bar  |
+=====+=========+======+======+
| '1' | '2'     | '3'  | '4'  |
+-----+---------+------+------+
| '5' | '6'     | '7'  | None |
+-----+---------+------+------+
| ''  | '8'     | '9'  | '10' |
+-----+---------+------+------+
| '0' | '2'     | '0'  | '0'  |
+-----+---------+------+------+
| '1' | '1'     | '0'  | None |
+-----+---------+------+------+
>>>
However, when I concatenate two tables with non-unique column names, the values in the repeated column are replaced by copies of the first column's values:
content = b"1,2,3,4\n5,6,7\n,8,9,10\n"
headers = ["foo", "header2", "foo", "bar"]
t = petl.fromcsv(petl.MemorySource(content), encoding="utf-8", header=headers)
content2 = b"0,2,0,0\n1,1,0\n"
headers2 = ["foo", "header2", "foo", "bar"]
t2 = petl.fromcsv(petl.MemorySource(content2), encoding="utf-8", header=headers2)
t3 = t.cat(t2)
>>> t
+-----+---------+-----+------+
| foo | header2 | foo | bar  |
+=====+=========+=====+======+
| '1' | '2'     | '3' | '4'  |
+-----+---------+-----+------+
| '5' | '6'     | '7' |      |
+-----+---------+-----+------+
| ''  | '8'     | '9' | '10' |
+-----+---------+-----+------+
>>>
>>> t2
+-----+---------+-----+-----+
| foo | header2 | foo | bar |
+=====+=========+=====+=====+
| '0' | '2'     | '0' | '0' |
+-----+---------+-----+-----+
| '1' | '1'     | '0' |     |
+-----+---------+-----+-----+
>>>
>>> t3
+-----+---------+-----+------+
| foo | header2 | foo | bar  |
+=====+=========+=====+======+
| '1' | '2'     | '1' | '4'  |
+-----+---------+-----+------+
| '5' | '6'     | '5' | None |
+-----+---------+-----+------+
| ''  | '8'     | ''  | '10' |
+-----+---------+-----+------+
| '0' | '2'     | '0' | '0'  |
+-----+---------+-----+------+
| '1' | '1'     | '1' | None |
+-----+---------+-----+------+
Is this intended behavior?
Version and installation information
- petl version: 1.6.8
- python version: 3.8.0
- OS: Linux
- petl installed via pip
arturponinski commented
This seems to be the desired behavior: a similar test case has existed in the project since 2015 and was marked by @alimanfoo as "pathological":
https://github.com/petl-developers/petl/blob/master/petl/test/transform/test_basics.py#L206
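For illustration, here is a simplified pure-Python model of concatenation that aligns columns by field name. It is only a sketch that reproduces the output reported above, not petl's actual code; `cat_by_name` is a hypothetical helper, and tables are assumed to be plain lists of tuples with the header first.

```python
def cat_by_name(*tables, missing=None):
    """Concatenate tables (header row first) by matching field names."""
    headers = [list(t[0]) for t in tables]

    # Output header: the first table's fields, plus any new names seen later.
    out_header = list(headers[0])
    for hdr in headers[1:]:
        for name in hdr:
            if name not in out_header:
                out_header.append(name)

    out = [tuple(out_header)]
    for hdr, table in zip(headers, tables):
        for row in table[1:]:
            out_row = []
            for name in out_header:
                try:
                    # list.index returns the FIRST match, so a repeated name
                    # always resolves to the same source column.
                    value = row[hdr.index(name)]
                except (ValueError, IndexError):
                    # Field absent from this table, or the row is short.
                    value = missing
                out_row.append(value)
            out.append(tuple(out_row))
    return out


t = [("foo", "header2", "foo", "bar"),
     ("1", "2", "3", "4"), ("5", "6", "7"), ("", "8", "9", "10")]
t2 = [("foo", "header2", "foo", "bar"),
      ("0", "2", "0", "0"), ("1", "1", "0")]
t3 = cat_by_name(t, t2)
```

In this model the second `foo` column can never be reached by name, so its values ('3', '7', '9') are dropped and the first column's values appear twice, as in the reported `t3`.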
A change to this behavior could be introduced as a breaking change (BC) in 2.0.
WDYT @juarezr
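In the meantime, one possible workaround is to make the header unique before concatenating. The sketch below is a plain-Python helper; `dedupe_header` and the `_2` suffix scheme are assumptions made for illustration and are not part of petl.

```python
from collections import Counter


def dedupe_header(header, sep="_"):
    """Return a copy of header in which repeated names get a numeric suffix."""
    seen = Counter()
    taken = set(header)  # names already in use, to avoid new collisions
    out = []
    for name in header:
        seen[name] += 1
        if seen[name] == 1:
            out.append(name)
            continue
        n = seen[name]
        candidate = f"{name}{sep}{n}"
        while candidate in taken:
            n += 1
            candidate = f"{name}{sep}{n}"
        taken.add(candidate)
        out.append(candidate)
    return out


headers = ["foo", "header2", "foo", "bar"]
unique_headers = dedupe_header(headers)
print(unique_headers)  # ['foo', 'header2', 'foo_2', 'bar']
```

The resulting list could then be passed as `header=` to `petl.fromcsv`, as in the first example, or applied to an existing table with `petl.setheader`, so that every column can be matched by name during `cat`.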