KeyError appears when I use the dataprep module
DummyBroker opened this issue · 1 comment
Describe the bug
I use Anaconda and installed the dataprep module with the following command:
conda install -c conda-forge dataprep
Then I tried the example code from the website:
from dataprep.datasets import load_dataset
from dataprep.eda import create_report
from dataprep.eda import plot, plot_correlation, plot_missing
df = load_dataset("titanic")
print(df.columns.tolist())
create_report(df).show()
It produced the following error:
from dataprep.datasets import load_dataset
from dataprep.eda import create_report
from dataprep.eda import plot, plot_correlation, plot_missing
df = load_dataset("titanic")
print(df.columns.tolist())
['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']
create_report(df).show()
Computing series-max-agg-6f34ce939adc72d34b6b5a81d3b66957: 0%| | 0/1420 [00:00<?, ?it/s]C:\ProgramData\anaconda3\Lib\site-packages\dask\core.py:119: RuntimeWarning: invalid value encountered in divide
return func(*(_execute_task(a, cache) for a in args))
error happended in column:Survived
Traceback (most recent call last):
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:3653 in get_loc
values are attempted to be sorted, but any TypeError from
File pandas\_libs\index.pyx:147 in pandas._libs.index.IndexEngine.get_loc
File pandas\_libs\index.pyx:176 in pandas._libs.index.IndexEngine.get_loc
File pandas\_libs\hashtable_class_helper.pxi:7080 in pandas._libs.hashtable.PyObjectHashTable.get_item
File pandas\_libs\hashtable_class_helper.pxi:7088 in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Survived'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
Cell In[123], line 1
create_report(df).show()
File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\create_report\__init__.py:68 in create_report
"components": format_report(df, cfg, mode, progress),
File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\create_report\formatter.py:78 in format_report
comps = format_basic(edaframe, cfg)
File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\create_report\formatter.py:291 in format_basic
res_variables = _format_variables(df, cfg, data)
File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\create_report\formatter.py:120 in _format_variables
rndrd = render(itmdt, cfg)
File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\distribution\render.py:2473 in render
visual_elem = render_cat(itmdt, cfg)
File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\distribution\render.py:1573 in render_cat
fig = bar_viz(
File C:\ProgramData\anaconda3\Lib\site-packages\dataprep\eda\distribution\render.py:223 in bar_viz
df["pct"] = df[col] / nrows * 100
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\frame.py:3761 in __getitem__
key = com.apply_if_callable(key, self)
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:3655 in get_loc
KeyError: 'Survived'
My numpy version is 1.25.2
My pandas version is 2.0.3
My Python version is 3.11.4
I would like to know why this error happens and how to solve it.
Is there anything I need to add?
Thank you so much!
I was facing the same issue. As of Dec 3, 2023, at least two of your dependencies are incompatible:
Python is only supported for 3.8 <= version < 3.11 (https://pypi.org/project/dataprep/).
Only pandas < 2 is supported (I found this while installing with pip).
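As for *why* pandas 2 triggers `KeyError: 'Survived'`, here is one plausible explanation. It is an assumption on my part, since I have not checked dataprep's source. pandas 2.0 changed `Series.value_counts()` so that the result Series is named `count` and the original name (`Survived`) moves to the index. If dataprep builds its bar-chart frame with something like `value_counts().to_frame()` and then looks the column up by the variable name (the traceback shows `df[col]` in `bar_viz`), that lookup fails on pandas 2. The helper names below are hypothetical and only illustrate the behavior change:

```python
import pandas as pd


def count_frame(s: pd.Series) -> pd.DataFrame:
    """Value counts as a one-column DataFrame.

    The column is named after the series on pandas 1.x, but "count" on pandas 2.x.
    """
    return s.value_counts().to_frame()


def pct_by_name(s: pd.Series) -> pd.Series:
    # Hypothetical code that assumes the count column is named after the series.
    # This works on pandas 1.x and raises KeyError on pandas 2.x.
    df = count_frame(s)
    return df[s.name] / len(s) * 100


def pct_robust(s: pd.Series) -> pd.Series:
    # Version-independent: select the single count column by position, not by name.
    df = count_frame(s)
    return df.iloc[:, 0] / len(s) * 100


if __name__ == "__main__":
    s = pd.Series([0, 1, 1, 0, 1], name="Survived")
    print(pd.__version__, count_frame(s).columns.tolist())
    print(pct_robust(s))
    try:
        print(pct_by_name(s))
    except KeyError as exc:
        print("KeyError:", exc)  # expected on pandas >= 2
```

Running it under pandas 1.5.3 and 2.0.3 shows the different column names, which is consistent with downgrading pandas making the error go away.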
Dataprep runs for me with the following versions:
python-3.10.10
pandas-1.5.3
numpy-1.26.2
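To check an environment against the constraints mentioned above before calling `create_report`, a small script along these lines may help. It is a sketch: the ranges are hard-coded from this thread (Python 3.8 <= version < 3.11, pandas < 2) and may change in later dataprep releases, so compare them with the PyPI page. The function name `compat_problems` is my own, not part of dataprep.

```python
import sys
from importlib import metadata

# Supported ranges as reported in this thread (Dec 2023); verify against PyPI.
PY_MIN, PY_MAX = (3, 8), (3, 11)  # 3.8 <= version < 3.11
PANDAS_MAJOR_MAX = 2              # pandas < 2


def compat_problems(py_version: tuple, pandas_version: str) -> list:
    """Return human-readable problems; an empty list means the versions look compatible."""
    problems = []
    if not (PY_MIN <= tuple(py_version[:2]) < PY_MAX):
        problems.append(
            f"Python {py_version[0]}.{py_version[1]} is outside 3.8 <= version < 3.11"
        )
    # Only the major version matters for the pandas < 2 constraint.
    major = int(pandas_version.split(".")[0])
    if major >= PANDAS_MAJOR_MAX:
        problems.append(f"pandas {pandas_version} is not below 2.0")
    return problems


if __name__ == "__main__":
    try:
        pandas_version = metadata.version("pandas")
    except metadata.PackageNotFoundError:
        sys.exit("pandas is not installed")
    issues = compat_problems(sys.version_info, pandas_version)
    print("\n".join(issues) if issues else "Python and pandas versions look compatible.")
```

With the versions from the original post (Python 3.11.4, pandas 2.0.3) both checks fail, while the combination listed above (Python 3.10.10, pandas 1.5.3) passes.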