sfu-db/dataprep

KeyError appears when I use the Dataprep module

DummyBroker opened this issue · 1 comment

Describe the bug
I use Anaconda and installed the dataprep module with the following command:
conda install -c conda-forge dataprep
Then I tried the example code from the website:

from dataprep.datasets import load_dataset
from dataprep.eda import create_report
from dataprep.eda import plot, plot_correlation, plot_missing
df = load_dataset("titanic")
print(df.columns.tolist())
create_report(df).show()

It raised the following error:

from dataprep.datasets import load_dataset
from dataprep.eda import create_report
from dataprep.eda import plot, plot_correlation, plot_missing

df = load_dataset("titanic")
print(df.columns.tolist())
['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']

create_report(df).show()
Computing series-max-agg-6f34ce939adc72d34b6b5a81d3b66957:   0%|          | 0/1420 [00:00<?, ?it/s]C:\ProgramData\anaconda3\Lib\site-packages\dask\core.py:119: RuntimeWarning: invalid value encountered in divide
  return func(*(_execute_task(a, cache) for a in args))
error happended in column:Survived                                                                              
Traceback (most recent call last):

  File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:3653 in get_loc
    values are attempted to be sorted, but any TypeError from

  File pandas\_libs\index.pyx:147 in pandas._libs.index.IndexEngine.get_loc

  File pandas\_libs\index.pyx:176 in pandas._libs.index.IndexEngine.get_loc

  File pandas\_libs\hashtable_class_helper.pxi:7080 in pandas._libs.hashtable.PyObjectHashTable.get_item

  File pandas\_libs\hashtable_class_helper.pxi:7088 in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'Survived'


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  Cell In[123], line 1
    create_report(df).show()

  File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\create_report\__init__.py:68 in create_report
    "components": format_report(df, cfg, mode, progress),

  File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\create_report\formatter.py:78 in format_report
    comps = format_basic(edaframe, cfg)

  File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\create_report\formatter.py:291 in format_basic
    res_variables = _format_variables(df, cfg, data)

  File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\create_report\formatter.py:120 in _format_variables
    rndrd = render(itmdt, cfg)

  File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\distribution\render.py:2473 in render
    visual_elem = render_cat(itmdt, cfg)

  File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\distribution\render.py:1573 in render_cat
    fig = bar_viz(

  File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\distribution\render.py:223 in bar_viz
    df["pct"] = df[col] / nrows * 100

  File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\frame.py:3761 in __getitem__
    key = com.apply_if_callable(key, self)

  File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:3655 in get_loc

KeyError: 'Survived'

My numpy version is 1.25.2
My pandas version is 2.0.3
My Python version is 3.11.4

I would like to know why this error happens and how to fix it.
Is there anything I need to add or change?
Thank you so much!

I was facing the same issue. As of Dec 3, 2023, at least two of your dependencies are incompatible with dataprep:
Python is only supported for 3.8 <= version < 3.11 (https://pypi.org/project/dataprep/).
pandas is only supported below 2.0 (I found this while installing with pip).
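One plausible explanation for the pandas part, which is an assumption and not confirmed from dataprep's source: pandas 2.0 changed `Series.value_counts()` so that the result is named "count" and its index carries the original column name, whereas pandas 1.x kept the column name on the result. The traceback shows `bar_viz` doing `df[col]`, i.e. `df["Survived"]`, on what looks like a frame of counts, which would fail in exactly this way under pandas 2.x. This small sketch reproduces the naming difference on a made-up column:

```python
import pandas as pd

# Toy column named like the one in the traceback (the data is made up).
srs = pd.Series([0, 1, 1, 0, 1], name="Survived")

counts = srs.value_counts()
frame = counts.to_frame()

print("pandas", pd.__version__)
# pandas 1.x: counts.name == "Survived", frame columns == ["Survived"]
# pandas 2.x: counts.name == "count",    frame columns == ["count"]
print("series name:", counts.name)
print("frame columns:", frame.columns.tolist())

# Lookup mirroring df[col] from the traceback: it succeeds on pandas 1.x
# and raises KeyError: 'Survived' on pandas 2.x.
try:
    frame["Survived"]
    print("lookup by column name succeeded")
except KeyError as err:
    print("KeyError:", err)
```

If this prints the KeyError branch on pandas 2.0.3, that is consistent with the advice to stay on pandas below 2.0.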
Dataprep runs for me with the following versions:

python-3.10.10
pandas-1.5.3
numpy-1.26.2
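To check an environment against these constraints before running dataprep, a small script along these lines may help. It is a sketch that hard-codes the constraints reported above (3.8 <= Python < 3.11, pandas < 2.0) as of Dec 3, 2023; newer dataprep releases may change them, so check the PyPI page first. The helper names `major_minor`, `installed_version` and `compatibility_problems` are made up for this example.

```python
from __future__ import annotations

import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Constraints as reported in this thread (assumed, may be outdated).
PYTHON_MIN = (3, 8)
PYTHON_MAX_EXCLUSIVE = (3, 11)
PANDAS_MAX_EXCLUSIVE = (2, 0)


def major_minor(ver: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.0.3'."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"cannot parse version: {ver!r}")
    return int(match.group(1)), int(match.group(2))


def installed_version(dist: str) -> str | None:
    """Return the installed version of a distribution, or None if missing."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def compatibility_problems(python_version: tuple[int, int],
                           pandas_version: str | None) -> list[str]:
    """List every constraint from the thread that the environment violates."""
    problems = []
    if not PYTHON_MIN <= tuple(python_version) < PYTHON_MAX_EXCLUSIVE:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} is outside "
            "3.8 <= version < 3.11"
        )
    if pandas_version is None:
        problems.append("pandas is not installed")
    elif major_minor(pandas_version) >= PANDAS_MAX_EXCLUSIVE:
        problems.append(f"pandas {pandas_version} is not below 2.0")
    return problems


if __name__ == "__main__":
    issues = compatibility_problems(sys.version_info[:2],
                                    installed_version("pandas"))
    if issues:
        print("Possible dataprep incompatibilities:")
        for issue in issues:
            print(" -", issue)
    else:
        print("Python and pandas are within the reported constraints.")
```

Run it with the same interpreter you use for dataprep (for example from the Anaconda prompt). With the versions from the original report (Python 3.11.4, pandas 2.0.3) it should list two problems.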