[BUG] `LuxSeries.unique` returns incorrect values after subsetting
lukauskas opened this issue · 1 comment
Describe the bug
The `LuxSeries` wrapper around the pandas `Series` does not compute unique values correctly for series that correspond to subsets of a DataFrame.
To Reproduce
Create some sample data (with `lux` imported in the notebook):
import lux
import pandas as pd
data = pd.DataFrame([['a', 1, 2], ['b', 2, 3], ['c', -1, 17]], columns=['foo', 'bar', 'baz'])
Display it. There is no need to click the Lux button or anything else:
data
Now create a subset of this data where `bar > 0` and select only the `foo` column:
data = data[data['bar'] > 0]['foo']
`data` is now a Series with two values in it, `'a'` and `'b'`.
Viewing it again in the notebook produces the correct output (again, no need to click anything):
data
However, running the `.unique()` method on the series:
data.unique()
returns all values, `['a', 'b', 'c']`, when it should omit `'c'`, which was removed by the subsetting.
See the gist and screenshot below.
Expected behavior
`data.unique()` should return only `['a', 'b']`.
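For comparison, here is a minimal sketch of the expected result using plain pandas, assuming `lux` is *not* imported in the session. It reuses the sample data from the reproduction steps; the variable name `subset` is only illustrative.

```python
import pandas as pd

# Same sample data as in the reproduction steps
data = pd.DataFrame(
    [['a', 1, 2], ['b', 2, 3], ['c', -1, 17]],
    columns=['foo', 'bar', 'baz'],
)

# Keep only rows where bar > 0, then select the 'foo' column
subset = data[data['bar'] > 0]['foo']

# With plain pandas, unique() only sees the filtered values
print(list(subset.unique()))  # ['a', 'b']
```

The row with `'c'` has `bar == -1`, so it is dropped by the boolean mask and should never appear in the result of `unique()`.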
Screenshots
Debugging information
Package Versions
----------------
Version
python 3.10.2
lux 0.5.1
pandas 1.4.0
luxwidget 0.1.11
matplotlib 3.5.1
altair 4.2.0
IPython 8.0.1
ipykernel 6.9.0
ipywidgets 7.6.5
jupyter_client 7.1.2
jupyter_core 4.9.1
jupyter_server 1.13.5
jupyterlab 3.3.0
nbclient 0.5.10
nbconvert 6.4.1
nbformat 5.1.3
notebook 6.4.8
qtconsole 5.2.2
traitlets 5.1.1
Widget Setup
-------------
✅ Jupyter Lab Running
✅ luxwidget is enabled
Additional context
This can cause some very nasty, silent errors in analyses that depend on the `.unique()` method, since merely importing `lux` is enough to change its behaviour.
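A cross-check can help surface silent discrepancies like this. The sketch below is hypothetical. The helper name `unique_matches_values` is invented for illustration. It assumes that `Series.dropna()` and `Series.tolist()` are not affected by the Lux override, and that assumption has not been verified against lux 0.5.1.

```python
import pandas as pd


def unique_matches_values(series: pd.Series) -> bool:
    """Hypothetical sanity check: compare series.unique() against the
    distinct values obtained from the raw elements of the series.

    Missing values are dropped first so that NaN != NaN does not cause
    spurious mismatches. Assumes tolist() is unaffected by any wrapper.
    """
    values = series.dropna()
    # Distinct values as reported by unique()
    reported = set(values.unique())
    # Distinct values computed independently from the raw elements
    expected = set(values.tolist())
    return reported == expected


data = pd.DataFrame(
    [['a', 1, 2], ['b', 2, 3], ['c', -1, 17]],
    columns=['foo', 'bar', 'baz'],
)
subset = data[data['bar'] > 0]['foo']
print(unique_matches_values(subset))  # True with plain pandas
```

With plain pandas both sets are `{'a', 'b'}`. If `unique()` returned the stale `['a', 'b', 'c']` described above while `tolist()` stayed correct, the check would return `False`.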
Thanks for your detailed issue report @lukauskas! We will be looking into this issue soon!