fbdesignpro/sweetviz

Wrong values of % target

sebastien-foulle opened this issue · 4 comments

Hello,

the HTML report produced by the following script shows % target < 90% when bill_length_mm <= 35,
and % target > 105% (!) when 35 <= bill_length_mm <= 37.5.

import pandas as pd
from palmerpenguins import load_penguins
import sweetviz as sv

penguins = load_penguins()
# Boolean target: True for Adelie penguins, False otherwise
penguins["target"] = penguins.species == 'Adelie'
penguins = penguins[["species", "bill_length_mm", "target"]]
penguins.head()

my_report = sv.analyze(penguins, target_feat="target")
my_report.show_html()

But in fact, if bill_length_mm <= 40, % target should always be 100%: there are only Adelie penguins in that range.

penguins.query('bill_length_mm <= 40').species.value_counts()
# Adelie    100

Maybe it's a rounding problem.
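
For anyone who wants to check the expected values without going through the report, here is a minimal sketch of a per-bin target rate computed directly with pandas. The helper name `target_rate_by_bin`, the bin edges, and the toy rows are illustrative assumptions, and the bins are not necessarily the ones Sweetviz uses internally. The point is only that a per-bin rate computed this way can never leave the 0 to 100% range.

```python
import pandas as pd


def target_rate_by_bin(df, feature, target, edges):
    """Return the share of rows with a true target in each bin of `feature`.

    Hypothetical helper for illustration; not part of Sweetviz.
    """
    # Bins are [e0, e1], (e1, e2], ... ; rows with a missing feature are dropped.
    bins = pd.cut(df[feature], bins=edges, include_lowest=True)
    # The mean of a boolean column is the proportion of True values,
    # so every result lies between 0 and 1.
    return df.groupby(bins, observed=True)[target].mean()


# Made-up rows for illustration only (not the real penguins data).
toy = pd.DataFrame({
    "bill_length_mm": [33.0, 34.5, 36.0, 37.0, 39.5, 41.0, 45.0, 46.5],
    "target": [True, True, True, True, True, True, False, False],
})
rates = target_rate_by_bin(toy, "bill_length_mm", "target", [30, 35, 40, 50])
print(rates)
```

Calling the same helper on the `penguins` frame from the script, for example with edges `[30, 35, 37.5, 40]`, gives the values the report would be expected to show for those ranges.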

@sebastien-foulle thank you for reporting this, I will take a look!

I am experiencing the same problem.
How is the investigation and fix progressing?

I have a similar issue! Attached is the example_data.pkl file (zipped as example_data.pkl.zip).

The code to reproduce the result:

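import pandas as pd
import sweetviz as sv

# Assumes the attached archive has been unzipped next to this script
example_data = pd.read_pickle('example_data.pkl')

# Report 1: treat numerical_var as a categorical feature
feature_config = sv.FeatureConfig(force_cat=['numerical_var'])
correct_report = sv.analyze([example_data, 'Train'],
                            target_feat='outcome',
                            feat_cfg=feature_config,
                            pairwise_analysis='off')
correct_report.show_html('correct_report.html')

# Report 2: treat the same column as a numerical feature
feature_config = sv.FeatureConfig(force_num=['numerical_var'])
wrong_report = sv.analyze([example_data, 'Train'],
                          target_feat='outcome',
                          feat_cfg=feature_config,
                          pairwise_analysis='off')
wrong_report.show_html('wrong_report.html')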

When we force_cat the numerical_var, we can get the correct distribution of the outcome:

[image: correct_need_to_force_cat]

If we force_num the numerical_var, the outcome distribution is completely off:

[image: wrong_as_numerical]
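
A rough way to cross-check the two reports outside Sweetviz is to compute the outcome rate per distinct value (the categorical view) and per bin (the numerical view) and confirm that both are consistent with the same overall rate. The sketch below assumes a 0/1 `outcome` column and uses equal-width bins from `pd.cut` as a stand-in, since the binning Sweetviz applies is not shown in this thread. The helper names and the toy data are hypothetical.

```python
import pandas as pd


def compare_rates(df, feature, target, n_bins):
    """Outcome rate and row count per distinct value and per equal-width bin."""
    # Categorical view: one group per distinct value of the feature.
    per_value = df.groupby(feature)[target].agg(["mean", "size"])
    # Numerical view: one group per equal-width bin (stand-in binning).
    bins = pd.cut(df[feature], bins=n_bins)
    per_bin = df.groupby(bins, observed=True)[target].agg(["mean", "size"])
    return per_value, per_bin


def overall_rate(table):
    """Size-weighted average of the group rates in a table from compare_rates."""
    return (table["mean"] * table["size"]).sum() / table["size"].sum()


# Made-up data for illustration only (not the attached example_data.pkl).
toy = pd.DataFrame({
    "numerical_var": [1, 1, 2, 2, 3, 3, 4, 4],
    "outcome": [0, 0, 0, 1, 1, 1, 1, 1],
})
per_value, per_bin = compare_rates(toy, "numerical_var", "outcome", n_bins=2)
print(per_value)
print(per_bin)
print(overall_rate(per_value), overall_rate(per_bin), toy["outcome"].mean())
```

Whatever the grouping, the size-weighted average of the group rates equals the overall outcome rate, and no single group rate can exceed 1. A numerical-view chart that violates either property points to a problem in how the chart is built, not in the data.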

Fixed by 2ec0848!