output of normality is text not bool when testing fewer than 4 samples
dalensis opened this issue · 3 comments
Hi, thank you for the exceptional library you developed.
The output of the normality function is not a bool when a sample has fewer than 4 replicates. The test emits a warning and returns NaN for the p-value. It still writes False in the normal column, but the value is now a string, not a bool.
Best
Hi @dalensis,
Thanks for opening the issue! Can you please provide minimal code to reproduce the issue? Thanks!
Here it is:
from pingouin import normality
import seaborn as sns

data = sns.load_dataset("penguins")[:3]
print(data)
normal = normality(data, dv=data.columns[2], group=data.columns[1])  # test normality
print(normal)
if normal["normal"].all():
    print("ALL TRUE test bool")
else:
    print("at least 1 false test bool")
# sometimes the result of normal is a str!
if all(map(lambda ele: str(ele).lower().capitalize() == "True", normal["normal"])):
    print("ALL TRUE test string")
else:
    print("At least 1 false test string")
OUTPUT:
  species     island  bill_length_mm  ...  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            39.1  ...              181.0       3750.0    Male
1  Adelie  Torgersen            39.5  ...              186.0       3800.0  Female
2  Adelie  Torgersen            40.3  ...              195.0       3250.0  Female

[3 rows x 7 columns]
             W  pval normal
island
Torgersen  NaN   NaN  False
ALL TRUE test bool
At least 1 false test string
Python311\Lib\site-packages\pingouin\distribution.py:242: UserWarning: Group Torgersen has less than 4 valid samples. Returning NaN.
warnings.warn(f"Group {idx} has less than 4 valid samples. Returning NaN.")
Comment:
The results of the two tests are the lines "ALL TRUE test bool" and "At least 1 false test string". When the normal column holds strings, the check for all-True boolean values passes, because a non-empty string such as "False" is truthy.
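The truthiness problem described above can be reproduced without Pingouin. The following is only a minimal sketch: the lists stand in for the contents of the normal column, and `all_true_strict` is a hypothetical helper, not part of any library.

```python
# Stand-ins for the "normal" column: strings (the buggy case) and real bools.
as_strings = ["False", "False"]
as_bools = [False, False]

# Any non-empty string is truthy, so the string "False" passes a plain all().
print(all(as_strings))
print(all(as_bools))


def all_true_strict(values):
    """Hypothetical helper: True only if every value is the Python bool True."""
    return all(value is True for value in values)


# The strict check rejects strings regardless of their text.
print(all_true_strict(as_strings))
print(all_true_strict([True, True]))
```

Note that the identity check `value is True` only recognizes plain Python booleans; for values coming from a NumPy or pandas column, checking the column's dtype first is likely the more robust option.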
Thank you. You are right that Pingouin incorrectly returns False as a string:
pingouin/pingouin/distribution.py
Line 244 in 7923141
Actually, I think it might make more sense to return either np.nan or a nullable boolean.
If you'd like, please feel free to submit a PR to fix this behavior. Thanks
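As a rough illustration of the nullable-boolean option mentioned above, here is a sketch using pandas' "boolean" extension dtype. It is not Pingouin's actual code: the p-values, group labels, and `alpha` threshold are made up for the example, with NaN standing in for a group that has too few samples.

```python
import numpy as np
import pandas as pd

# Made-up p-values; group "C" mimics a group with too few valid samples.
pvals = pd.Series([0.20, 0.01, np.nan], index=["A", "B", "C"], name="pval")
alpha = 0.05  # assumed significance threshold

# Compare, then convert to the nullable boolean dtype.
normal = (pvals > alpha).astype("boolean")
# Mark the untestable group as missing instead of writing the string "False".
normal[pvals.isna()] = pd.NA

print(normal)
print(normal.dtype)
```

With this dtype the column stays boolean throughout, and callers can decide explicitly how to treat the missing entry, for example with `normal.fillna(False)` before calling `.all()`.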