lux-org/lux

[BUG] `LuxSeries.unique` returns incorrect values after subsetting

lukauskas opened this issue · 1 comments

Describe the bug

The LuxSeries wrapper around pandas Series does not compute unique values correctly for a Series obtained by subsetting a DataFrame.

To Reproduce

Invent some data:

import pandas as pd
import lux
data = pd.DataFrame([['a', 1, 2], ['b', 2, 3], ['c', -1, 17]], columns=['foo', 'bar', 'baz'])

View it; there is no need to click the lux button or anything else.

data

Now subset the data to the rows where bar > 0 and select only the foo column.

data = data[data['bar'] > 0]['foo']

data is now a Series with two values, 'a' and 'b'.
Viewing it again in the notebook produces the correct output, with no need to click anything:

data

However, calling .unique() on the Series:

data.unique()

Returns all three values, ['a', 'b', 'c'], when it should omit 'c', which the subsetting removed.

See gist and screenshot below

Expected behavior

data.unique() should return only ['a', 'b'].
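For comparison, here is a sketch of the same steps with plain pandas, without importing lux. The variable name `subset` is only illustrative and does not appear in the steps above.

```python
import pandas as pd

# Same data as in the reproduction steps, but lux is deliberately NOT imported.
data = pd.DataFrame(
    [['a', 1, 2], ['b', 2, 3], ['c', -1, 17]],
    columns=['foo', 'bar', 'baz'],
)

# Keep rows where bar > 0, then select the foo column.
subset = data[data['bar'] > 0]['foo']

# With plain pandas, only the values that survive the filter are reported.
print(list(subset.unique()))  # ['a', 'b']
```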

Screenshots

image

Debugging information

Package Versions
----------------
               Version
        python 3.10.2 
           lux  0.5.1 
        pandas  1.4.0 
     luxwidget 0.1.11 
    matplotlib  3.5.1 
        altair  4.2.0 
       IPython  8.0.1 
     ipykernel  6.9.0 
    ipywidgets  7.6.5 
jupyter_client  7.1.2 
  jupyter_core  4.9.1 
jupyter_server 1.13.5 
    jupyterlab  3.3.0 
      nbclient 0.5.10 
     nbconvert  6.4.1 
      nbformat  5.1.3 
      notebook  6.4.8 
     qtconsole  5.2.2 
     traitlets  5.1.1 

Widget Setup
-------------
✅ Jupyter Lab Running
✅ luxwidget is enabled

Additional context

This can cause nasty, silent errors in any analysis that depends on .unique(), since merely importing lux is enough to change its behaviour.
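Until this is fixed, one possible guard is to cross-check .unique() against pd.unique on the raw values. This is only a sketch: check_unique is a hypothetical helper, and it assumes that pd.unique applied to the underlying NumPy array is not affected by lux, which I have not verified.

```python
import pandas as pd


def check_unique(series):
    """Return the unique values of `series`, raising if .unique() looks stale.

    Hypothetical helper: compares the method result with values recomputed
    from the raw array. Assumes no missing values, since NaN != NaN would
    make the list comparison report a false mismatch.
    """
    reported = list(series.unique())
    # Recompute from the underlying array, bypassing any Series.unique override.
    recomputed = list(pd.unique(series.to_numpy()))
    if reported != recomputed:
        raise ValueError(
            f".unique() returned {reported}, but the data contains {recomputed}"
        )
    return reported


print(check_unique(pd.Series(['a', 'b', 'a'])))  # ['a', 'b']
```

Calling check_unique(data) in place of data.unique() would turn a silent wrong answer into a loud failure, at the cost of computing the unique values twice.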

Thanks for your detailed issue report @lukauskas! We will be looking into this issue soon!