sfu-db/dataprep

Using percentages instead of counts to compare distribution of two tables

borisRa opened this issue · 2 comments

Hi,

How can I compare the train and test distributions?
I am using this code:
plot_diff([train_df[train_df.columns[~train_df.columns.isin(['Survived'])]], test_df], config={"diff.label": ["train_df", "test_df"]})

This gives me raw counts, but I would like to compare percentages instead,
similar to this plot of the Age distribution:
[image]

Thanks !
Boris

Hi @borisRa, thanks for opening the issue. Will diff.density=True work for you? (related: #698)

Nope. The result should look like the plot above, so that the distributions can be compared rather than the counts.
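
For reference, one possible workaround outside of dataprep is to bin both tables on shared edges and normalize each histogram to percentages, so that a larger train set does not dominate the comparison. The sketch below is only an illustration under assumptions: the helper name percent_hist is hypothetical (it is not part of dataprep), the Age values are made-up toy data, and plain numpy/pandas stand in for whatever plot_diff does internally.

```python
import numpy as np
import pandas as pd


def percent_hist(train, test, bins=10):
    """Return shared bin edges and per-bin percentages for two numeric columns.

    Hypothetical helper for illustration; not part of dataprep.
    """
    # Drop missing values and convert to float arrays.
    train = pd.Series(train).dropna().to_numpy(dtype=float)
    test = pd.Series(test).dropna().to_numpy(dtype=float)

    # Compute the bin edges over both samples so the bars line up.
    edges = np.histogram_bin_edges(np.concatenate([train, test]), bins=bins)

    train_counts, _ = np.histogram(train, bins=edges)
    test_counts, _ = np.histogram(test, bins=edges)

    # Normalize each histogram by its own sample size, so the values are
    # percentages of each table rather than raw counts.
    train_pct = 100 * train_counts / train_counts.sum()
    test_pct = 100 * test_counts / test_counts.sum()
    return edges, train_pct, test_pct


# Toy data standing in for train_df["Age"] and test_df["Age"] (assumed column).
train_age = pd.Series([22, 38, 26, 35, 35, 54, 2, 27, None, 14])
test_age = pd.Series([34, 47, 62, 27, 22])

edges, train_pct, test_pct = percent_hist(train_age, test_age, bins=4)
print(edges)
print(train_pct)
print(test_pct)
```

The two percentage arrays can then be drawn side by side with any plotting library, using the shared edges as the x axis.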