Wrong values of % target
sebastien-foulle opened this issue · 4 comments
Hello,
The HTML report produced by the following script shows % target < 90% when bill_length_mm <= 35,
and % target > 105% (!) when 35 <= bill_length_mm <= 37.5.
import pandas as pd
from palmerpenguins import load_penguins
import sweetviz as sv
penguins = load_penguins()
penguins["target"] = penguins.species == 'Adelie'
penguins = penguins[["species", "bill_length_mm", "target"]]
penguins.head()
my_report = sv.analyze(penguins, target_feat = "target")
my_report.show_html()
But in fact, if bill_length_mm <= 40, % target should always be 100%: there are only Adelie penguins in this range.
penguins.query('bill_length_mm <= 40').species.value_counts()
# Output: Adelie 100
Maybe it's a rounding problem.
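A minimal sketch of the expected per-bin computation, independent of sweetviz internals, may help when cross-checking the report. The helper name target_rate_by_bin, the bin edges, and the sample values are hypothetical and chosen only for illustration; they are not taken from sweetviz or from the real penguins data. Each bin's value is the number of target rows divided by the number of rows in that same bin, so it cannot exceed 100%.

```python
from bisect import bisect_left
from math import inf


def target_rate_by_bin(values, targets, edges):
    """Return (low, high, percent_target) for each non-empty bin (low, high]."""
    counts = {}  # bin index -> [rows in bin, rows in bin with target True]
    for value, target in zip(values, targets):
        if value is None or value != value:  # skip missing values (None / NaN)
            continue
        i = bisect_left(edges, value)  # edges[i-1] < value <= edges[i]
        n = counts.setdefault(i, [0, 0])
        n[0] += 1
        n[1] += bool(target)
    rates = []
    for i in sorted(counts):
        low = edges[i - 1] if i > 0 else -inf
        high = edges[i] if i < len(edges) else inf
        n_rows, n_target = counts[i]
        rates.append((low, high, 100.0 * n_target / n_rows))
    return rates


# Hypothetical sample, not the real penguins data.
bill_length_mm = [34.0, 35.0, 36.2, 37.5, 39.1, 41.0, 46.5, 50.0]
is_adelie = [True, True, True, True, True, True, False, False]

for low, high, pct in target_rate_by_bin(bill_length_mm, is_adelie, [35, 37.5, 40]):
    print(f"({low}, {high}]: {pct:.1f}% target")
```

If the numerator and denominator came from different binnings, values outside 0 to 100% like the ones in the report could appear; this is only a guess at the kind of mismatch to look for, not a diagnosis.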
@sebastien-foulle thank you for reporting this, I will take a look!
I am experiencing the same issue.
How are the investigation and the fix progressing?
I have a similar issue! The example_data.pkl file is attached as example_data.pkl.zip.
The code to reproduce the result:
# Treat numerical_var as categorical
feature_config = sv.FeatureConfig(force_cat=['numerical_var'])
correct_report = sv.analyze([example_data, 'Train'],
                            target_feat='outcome',
                            feat_cfg=feature_config,
                            pairwise_analysis='off')
correct_report.show_html('correct_report.html')

# Treat numerical_var as numerical
feature_config = sv.FeatureConfig(force_num=['numerical_var'])
wrong_report = sv.analyze([example_data, 'Train'],
                          target_feat='outcome',
                          feat_cfg=feature_config,
                          pairwise_analysis='off')
wrong_report.show_html('wrong_report.html')
With force_cat on numerical_var, the report shows the correct distribution of the outcome.
With force_num on numerical_var, the outcome distribution is completely off.
Fixed by 2ec0848!