kieferk/dfply

joining on different columns does not work


Joining on key columns that have different names in the two data frames does not seem to work. For example,

import pandas as pd
from dfply import *

a_df = pd.DataFrame({'one': [1, 2, 3], 'two': ['a', 'b', 'c']})
b_df = pd.DataFrame({'three': [1, 2, 3], 'four': ['d', 'e', 'f']})
a_df >> inner_join(b_df, by=['one', 'three'])

gives the error

  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'one'

and passing the column names as nested lists instead,

a_df >> inner_join(b_df, by=[['one'], ['three']])

gives

IndexError: list index out of range
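Joining on differently named columns means pairing each left-hand key with a right-hand key before the merge. The sketch below shows one way a join verb could normalize its `by` argument into separate `left_on`/`right_on` lists. `normalize_by` is a hypothetical helper written for illustration, not dfply's actual implementation, and the accepted forms (a plain string, a `(left, right)` tuple, or a list mixing both) are assumptions.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical key spec: a column name shared by both frames, or a (left, right) pair.
KeySpec = Union[str, Tuple[str, str]]


def normalize_by(by: Union[KeySpec, Sequence[KeySpec]]) -> Tuple[List[str], List[str]]:
    """Split a `by` spec into parallel left_on / right_on lists (illustrative sketch)."""
    # A single string or a single (left, right) tuple is wrapped into a list.
    if isinstance(by, (str, tuple)):
        by = [by]
    left_on: List[str] = []
    right_on: List[str] = []
    for key in by:
        if isinstance(key, tuple):
            # Differently named columns: first element is the left key, second the right.
            left, right = key
        else:
            # Same column name on both sides.
            left = right = key
        left_on.append(left)
        right_on.append(right)
    return left_on, right_on


print(normalize_by(('one', 'three')))          # (['one'], ['three'])
print(normalize_by(['id', ('one', 'three')]))  # (['id', 'one'], ['id', 'three'])
```

Under this assumed convention, a list of plain strings such as `['one', 'three']` is read as two keys present in both frames, so a `(left, right)` tuple is what expresses a join between `one` and `three`.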

This was indeed a bug. It should be fixed now: pull the master branch, try it out, and let me know if you run into any other issues.

Thank you! Please push to Anaconda if possible.

I can confirm that the issue still occurs in 0.3.3.
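Independently of how dfply parses `by`, the join itself can be done in plain pandas: `DataFrame.merge` accepts `left_on` and `right_on` for key columns whose names differ. A minimal sketch using the frames from the report (built with the dict constructor, since `DataFrame.from_items` was removed in pandas 1.0):

```python
import pandas as pd

a_df = pd.DataFrame({'one': [1, 2, 3], 'two': ['a', 'b', 'c']})
b_df = pd.DataFrame({'three': [1, 2, 3], 'four': ['d', 'e', 'f']})

# Inner join matching a_df['one'] against b_df['three'].
joined = a_df.merge(b_df, how='inner', left_on='one', right_on='three')
print(joined)
```

Because the key names differ, both `one` and `three` are kept in the result; drop one of them afterwards if a single key column is wanted.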