How to solve RuntimeWarning: invalid value encountered in double_scalars and RuntimeWarning: Mean of empty slice.
LucasFKobernic opened this issue · 3 comments
Hi,
I'm using this package to perform some two-way ANOVA on some data. I wrote some tests to test the function first and I'm facing some issues using pg.rm_anova.
Steps:
1. Function to generate synthetic data that is normally distributed:
def generate_data(participant_ids, speeds, ttas, mean, std_dev): data = [] for pid in participant_ids: for speed in speeds: for tta in ttas: gap_acceptance = np.random.normal(loc=mean, scale=std_dev) data.append({'Participant_ID': pid, 'speed': speed, 'tta': tta, 'Gap_Acceptance': gap_acceptance}) return pd.DataFrame(data)
2. Generate synthetic data:
experimental_data = generate_data([i for i in range(200)],[30, 40, 50],[3, 4, 5],mean = 6, std_dev = 1.0)
3. Set "tta" and "speed" as categorical variables:
experimental_data["speed"] = experimental_data["speed"].astype('category')
experimental_data["tta"] = experimental_data["tta"].astype('category')
4. Run the two-way ANOVA:
anova = pg.rm_anova(dv = "Gap_Acceptance", within = ["speed", "tta"], subject = "Participant_ID", data = experimental_data, detailed = True)
Errors/Warnings:
I'm getting the following warnings:
C:\mypath\AppData\Roaming\Python\Python38\site-packages\numpy\core\fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
  ret = ret.dtype.type(ret / rcount)
.C:\mypath\AppData\Roaming\Python\Python38\site-packages\numpy\core\fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
C:\mypath\AppData\Roaming\Python\Python38\site-packages\numpy\core\_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
I already checked and I don't have any NaN values in the data. Does anyone have an idea what is causing these warnings and how they may influence the results of rm_anova()?
Thanks :)
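One way to find out which call emits a warning like this is to temporarily turn RuntimeWarning into an exception, so that Python produces a full traceback at the point where the warning is raised. The following is only a minimal sketch: the helper name run_with_warnings_as_errors is hypothetical, and np.mean on an empty array is used purely as a stand-in that is known to emit "Mean of empty slice". It does not claim to reproduce what happens inside pg.rm_anova.

```python
import warnings

import numpy as np


def run_with_warnings_as_errors(func, *args, **kwargs):
    """Call func, raising RuntimeWarning as an exception instead of printing it.

    Hypothetical helper for illustration; the warning filter is restored
    when the with-block exits.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error", category=RuntimeWarning)
        return func(*args, **kwargs)


# Stand-in example: the mean of an empty array triggers the warning,
# which now surfaces as an exception with a traceback.
try:
    run_with_warnings_as_errors(np.mean, np.array([]))
except RuntimeWarning as exc:
    print(f"Raised as an error: {exc}")
```

In the same way, the pg.rm_anova call from step 4 could be passed as func together with its keyword arguments; the resulting traceback should then show which internal computation received an empty slice.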
Could you check your code and run this again? It is hard to tell how your code runs because it was posted without proper indentation. If I run the exact same code (with indentation added), it works fine for me:
import numpy as np
import pandas as pd
import pingouin as pg

def generate_data(participant_ids, speeds, ttas, mean, std_dev):
    data = []
    for pid in participant_ids:
        for speed in speeds:
            for tta in ttas:
                gap_acceptance = np.random.normal(loc=mean, scale=std_dev)
                data.append({'Participant_ID': pid, 'speed': speed, 'tta': tta, 'Gap_Acceptance': gap_acceptance})
    return pd.DataFrame(data)

experimental_data = generate_data([i for i in range(200)], [30, 40, 50], [3, 4, 5], mean=6, std_dev=1.0)
experimental_data["speed"] = experimental_data["speed"].astype('category')
experimental_data["tta"] = experimental_data["tta"].astype('category')
anova = pg.rm_anova(dv="Gap_Acceptance", within=["speed", "tta"], subject="Participant_ID", data=experimental_data, detailed=True)
I get a single UserWarning about Epsilon values, but that is to be expected because of the levels in your data. I don't get any of the Runtime warnings/errors like you're getting above. The output for me is:
        Source        SS  ddof1  ddof2        MS         F     p-unc  p-GG-corr       ng2       eps
0        speed  0.282911      2    398  0.141456  0.151597  0.859385   0.858779  0.000155  0.996976
1          tta  0.769135      2    398  0.384567  0.387275  0.679160   0.679027  0.000422  0.999362
2  speed * tta  3.209345      4    796  0.802336  0.775536  0.541232   0.535358  0.001758  0.950741
@LucasFKobernic can you please also let us know which version of numpy and pandas you're running?
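A quick way to collect the requested version numbers is sketched below. It uses only the standard library (importlib.metadata, available from Python 3.8), so it also runs when one of the packages is missing. The function name installed_versions is hypothetical, and including pingouin in the list is an assumption on top of the two packages asked about.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("numpy", "pandas", "pingouin")):
    """Return a dict mapping each package name to its installed version.

    Hypothetical helper for illustration; packages that cannot be found
    are reported as "not installed" instead of raising.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


# Print one "name: version" line per package, ready to paste into the issue.
for name, ver in installed_versions().items():
    print(f"{name}: {ver}")
```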
Closing this issue but feel free to reopen