raphaelvallat/pingouin

How to solve RuntimeWarning: invalid value encountered in double_scalars and RuntimeWarning: Mean of empty slice.

LucasFKobernic opened this issue · 3 comments

Hi,

I'm using this package to perform some two-way ANOVA on some data. I wrote some tests to test the function first and I'm facing some issues using pg.rm_anova.

Steps:

  1. Function to generate synthetic data that is normally distributed:

def generate_data(participant_ids, speeds, ttas, mean, std_dev):
    data = []
    for pid in participant_ids:
        for speed in speeds:
            for tta in ttas:
                gap_acceptance = np.random.normal(loc=mean, scale=std_dev)
                data.append({'Participant_ID': pid, 'speed': speed, 'tta': tta, 'Gap_Acceptance': gap_acceptance})
    return pd.DataFrame(data)

  2. Generate synthetic data:

experimental_data = generate_data([i for i in range(200)], [30, 40, 50], [3, 4, 5], mean=6, std_dev=1.0)

  3. Set "tta" and "speed" as categorical variable:

experimental_data["speed"] = experimental_data["speed"].astype('category')
experimental_data["tta"] = experimental_data["tta"].astype('category')

  4. Running two-way anova:

anova = pg.rm_anova(dv="Gap_Acceptance", within=["speed", "tta"], subject="Participant_ID", data=experimental_data, detailed=True)

Errors/Warnings:
I'm getting the following warnings:

C:\mypath\AppData\Roaming\Python\Python38\site-packages\numpy\core\fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
  ret = ret.dtype.type(ret / rcount)
C:\mypath\AppData\Roaming\Python\Python38\site-packages\numpy\core\fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
C:\mypath\AppData\Roaming\Python\Python38\site-packages\numpy\core\_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

I already checked and I don't have any NaN values in the data. Does anyone have an idea what is causing these warnings and how they may influence the results of rm_anova()?

Thanks :)

Could you check your code and run this again? It is hard to tell how your code runs because it was posted without proper indentation. If I run the exact same code (with indentation added), it works fine for me:

import numpy as np
import pandas as pd
import pingouin as pg


def generate_data(participant_ids, speeds, ttas, mean, std_dev):
    data = []
    for pid in participant_ids:
        for speed in speeds:
            for tta in ttas:
                gap_acceptance = np.random.normal(loc=mean, scale=std_dev)
                data.append({'Participant_ID': pid, 'speed': speed, 'tta': tta, 'Gap_Acceptance': gap_acceptance})
    return pd.DataFrame(data)

experimental_data = generate_data([i for i in range(200)], [30, 40, 50], [3, 4, 5], mean=6, std_dev=1.0)

experimental_data["speed"] = experimental_data["speed"].astype('category')
experimental_data["tta"] = experimental_data["tta"].astype('category')

anova = pg.rm_anova(dv="Gap_Acceptance", within=["speed", "tta"], subject="Participant_ID", data=experimental_data, detailed=True)

I get a single UserWarning about Epsilon values, but that is to be expected because of the levels in your data. I don't get any of the Runtime warnings/errors like you're getting above. The output for me is:

        Source        SS  ddof1  ddof2        MS         F     p-unc  p-GG-corr       ng2       eps
0        speed  0.282911      2    398  0.141456  0.151597  0.859385   0.858779  0.000155  0.996976
1          tta  0.769135      2    398  0.384567  0.387275  0.679160   0.679027  0.000422  0.999362
2  speed * tta  3.209345      4    796  0.802336  0.775536  0.541232   0.535358  0.001758  0.950741
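
Since the same code runs cleanly here, it may be worth checking the data that is actually passed to rm_anova on the failing machine. "Mean of empty slice" is what NumPy reports when a mean is taken over an empty array, so one thing to rule out (an assumption, not a confirmed cause in this thread) is a subject/condition cell with no observations. A minimal sketch, where `summarize_design` is a hypothetical helper and not a pingouin function:

```python
import math

import pandas as pd


def summarize_design(data, subject, within, dv):
    """Count expected vs. observed subject x condition cells in long-format data."""
    keys = [subject] + list(within)
    # Rows with a missing key or missing dependent variable cannot fill a cell.
    valid = data.dropna(subset=keys + [dv])
    cell_sizes = valid.groupby(keys, observed=True).size()
    expected = data[subject].nunique() * math.prod(data[w].nunique() for w in within)
    return {
        "rows_with_missing_values": len(data) - len(valid),
        "expected_cells": expected,
        "observed_cells": len(cell_sizes),
        "cells_with_repeats": int((cell_sizes > 1).sum()),
    }


# Tiny illustrative frame: 2 participants x 2 speeds x 2 ttas.
demo = pd.DataFrame(
    [(pid, speed, tta, 6.0) for pid in range(2) for speed in (30, 40) for tta in (3, 4)],
    columns=["Participant_ID", "speed", "tta", "Gap_Acceptance"],
)
incomplete = demo.drop(index=0)  # remove one cell to simulate an unbalanced design

print(summarize_design(demo, "Participant_ID", ["speed", "tta"], "Gap_Acceptance"))
print(summarize_design(incomplete, "Participant_ID", ["speed", "tta"], "Gap_Acceptance"))
```

For a complete design, `observed_cells` equals `expected_cells` and `cells_with_repeats` is 0; any other result points at the rows to inspect.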

@LucasFKobernic can you please also let us know which version of numpy and pandas you're running?
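
A quick way to collect that information is to query the installed package metadata. The sketch below uses the standard-library `importlib.metadata` module (available from Python 3.8); the function name `installed_versions` is made up for illustration.

```python
from importlib import metadata


def installed_versions(packages=("numpy", "pandas", "pingouin")):
    """Return {package: version string}, or None for packages that are not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version}")
```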

Closing this issue but feel free to reopen