PSLmodels/microdf

MicroSeries.gini fails with KeyError if indexes are duplicated


I filed this prematurely and still need to put together an MWE, but basically: I have a MicroDataFrame with duplicated indexes, and calling df.x.gini() raises a KeyError. df.groupby(g).x.gini() works (as long as indexes aren't duplicated within each g), and mdf.gini(df, "x", "w") also works.

Both of the following produce the error below, which suggests we need to pass the index to the weights throughout generic.py.

```python
import microdf as mdf
import pandas as pd
d = mdf.MicroDataFrame({"x": [1, 2, 3]}, index=[1, 1, 2], weights=[4, 5, 6])
d = mdf.MicroDataFrame({"x": [1, 2, 3]}, index=[1, 1, 2], weights=pd.Series([4, 5, 6], index=[1, 1, 2]))
```
>KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Int64Index([0], dtype='int64'). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"
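
For what it's worth, here is a pandas-only sketch of what I suspect is going on. It assumes the weights end up on a default RangeIndex (0, 1, 2) no matter how they were passed in; I have not confirmed that against generic.py. The missing label 0 in the error message is consistent with a label-based lookup between the two indexes. The variable names are just for illustration.

```python
import pandas as pd

# Values carry the duplicated index from the report above.
values = pd.Series([1, 2, 3], index=[1, 1, 2])

# Assumption: the weights live on a default RangeIndex (0, 1, 2).
weights = pd.Series([4, 5, 6])

# Looking up one Series by the other's labels fails, because
# label 0 does not exist in the values' index.
try:
    values.loc[list(weights.index)]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Possible fix: rebuild the weights on the values' index so the two
# are matched by position rather than by label.
aligned_weights = pd.Series(weights.to_numpy(), index=values.index)

# Positional arithmetic then works regardless of duplicated labels.
weighted_sum = (values.to_numpy() * aligned_weights.to_numpy()).sum()
```

If that is the cause, giving the weights the same index as the data at construction time (or working on the underlying arrays) should make df.x.gini() behave like mdf.gini(df, "x", "w").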