petl-developers/petl

Duplicated value after concatenation of tables with non-unique column names

trivelt opened this issue · 1 comments

When two tables with unique column names are concatenated, the resulting table is fine:

import petl

content = b"1,2,3,4\n5,6,7\n,8,9,10\n"
headers = ["foo", "header2", "foo2", "bar"]
t = petl.fromcsv(petl.MemorySource(content), encoding="utf-8", header=headers)

content2 = b"0,2,0,0\n1,1,0\n"
headers2 = ["foo", "header2", "foo2", "bar"]
t2 = petl.fromcsv(petl.MemorySource(content2), encoding="utf-8", header=headers2)

t3 = t.cat(t2)

>>> t
+-----+---------+------+------+
| foo | header2 | foo2 | bar  |
+=====+=========+======+======+
| '1' | '2'     | '3'  | '4'  |
+-----+---------+------+------+
| '5' | '6'     | '7'  |      |
+-----+---------+------+------+
| ''  | '8'     | '9'  | '10' |
+-----+---------+------+------+

>>> t2
+-----+---------+------+-----+
| foo | header2 | foo2 | bar |
+=====+=========+======+=====+
| '0' | '2'     | '0'  | '0' |
+-----+---------+------+-----+
| '1' | '1'     | '0'  |     |
+-----+---------+------+-----+

>>> t3
+-----+---------+------+------+
| foo | header2 | foo2 | bar  |
+=====+=========+======+======+
| '1' | '2'     | '3'  | '4'  |
+-----+---------+------+------+
| '5' | '6'     | '7'  | None |
+-----+---------+------+------+
| ''  | '8'     | '9'  | '10' |
+-----+---------+------+------+
| '0' | '2'     | '0'  | '0'  |
+-----+---------+------+------+
| '1' | '1'     | '0'  | None |
+-----+---------+------+------+


However, when I concatenate two tables with non-unique column names, values are duplicated in an unexpected way:

content = b"1,2,3,4\n5,6,7\n,8,9,10\n"
headers = ["foo", "header2", "foo", "bar"]
t = petl.fromcsv(petl.MemorySource(content), encoding="utf-8", header=headers)

content2 = b"0,2,0,0\n1,1,0\n"
headers2 = ["foo", "header2", "foo", "bar"]
t2 = petl.fromcsv(petl.MemorySource(content2), encoding="utf-8", header=headers2)

t3 = t.cat(t2)

>>> t
+-----+---------+-----+------+
| foo | header2 | foo | bar  |
+=====+=========+=====+======+
| '1' | '2'     | '3' | '4'  |
+-----+---------+-----+------+
| '5' | '6'     | '7' |      |
+-----+---------+-----+------+
| ''  | '8'     | '9' | '10' |
+-----+---------+-----+------+

>>> t2
+-----+---------+-----+-----+
| foo | header2 | foo | bar |
+=====+=========+=====+=====+
| '0' | '2'     | '0' | '0' |
+-----+---------+-----+-----+
| '1' | '1'     | '0' |     |
+-----+---------+-----+-----+

>>> t3
+-----+---------+-----+------+
| foo | header2 | foo | bar  |
+=====+=========+=====+======+
| '1' | '2'     | '1' | '4'  |
+-----+---------+-----+------+
| '5' | '6'     | '5' | None |
+-----+---------+-----+------+
| ''  | '8'     | ''  | '10' |
+-----+---------+-----+------+
| '0' | '2'     | '0' | '0'  |
+-----+---------+-----+------+
| '1' | '1'     | '1' | None |
+-----+---------+-----+------+

Is this the intended behavior?

Version and installation information

  • petl version: 1.6.8
  • python version: 3.8.0
  • OS: Linux
  • petl installed via pip

This seems to be the desired behavior: a similar test case has existed in the project since 2015 and was marked by @alimanfoo as "pathological":
https://github.com/petl-developers/petl/blob/master/petl/test/transform/test_basics.py#L206
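For illustration, the reported output is consistent with values being looked up by column name, so that a repeated name always resolves to the first matching column. The sketch below is a toy model of that idea only, not petl's actual implementation; `cat_by_name` is a hypothetical helper written for this comment.

```python
def cat_by_name(header, rows, missing=None):
    """Toy model (not petl code): fill each output column by looking up
    its name in the source header."""
    out = []
    for row in rows:
        out_row = []
        for name in header:
            # list.index returns the FIRST position of the name, so a
            # repeated name never reaches the later column's value.
            idx = header.index(name)
            # Short rows are padded with the `missing` value.
            out_row.append(row[idx] if idx < len(row) else missing)
        out.append(tuple(out_row))
    return out


header = ["foo", "header2", "foo", "bar"]
rows = [["1", "2", "3", "4"], ["5", "6", "7"], ["", "8", "9", "10"]]

for r in cat_by_name(header, rows):
    print(r)
# ('1', '2', '1', '4')
# ('5', '6', '5', None)
# ('', '8', '', '10')
```

These rows match the first three rows of t3 in the report, which is why the duplication looks like a side effect of name-based alignment rather than a data-loading problem.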
Changing this behavior could be implemented as a backward-compatibility (BC) break in 2.0.
WDYT, @juarezr?
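In the meantime, a possible workaround is to make the column names unique before concatenating. The sketch below is a minimal version of that; `unique_header` and its `_2`, `_3` suffix scheme are hypothetical choices for this example, not part of petl.

```python
from collections import Counter


def unique_header(header):
    """Append _2, _3, ... to repeated names so every column name is distinct.

    Assumption: the suffixed names do not collide with names already
    present in the header (e.g. an existing 'foo_2' column).
    """
    seen = Counter()
    result = []
    for name in header:
        seen[name] += 1
        result.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return result


print(unique_header(["foo", "header2", "foo", "bar"]))
# ['foo', 'header2', 'foo_2', 'bar']
```

The resulting list could be passed as `header=` to `petl.fromcsv` as in the report, or applied to an existing table with `petl.setheader(t, unique_header(petl.header(t)))` before calling `cat`. Both tables need the same renaming so their columns still line up by name.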