BUG: Potentially incorrect result of `df.rolling(window=...).mean()`

Question

BUG: Potentially incorrect result of `df.rolling(window=...).mean()`

Closed this issue 24 days ago · 2 comments

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

if __name__ == '__main__':    
    values = [
        -3.333333e-01,
        0.,
        1.000000e+15,
        -4.874588e+14,
        6.103516e-05,
        0.,
        1.100000e+00,
        0.,
        0.,
        -8.581237e+14,
        0.,
        1.000000e+15,
        0.,
        0.,
        0.,
        0.,
        0.,
        -5.96e-03,
        0.,
        0.,
    ]

    s = pd.Series(values, dtype=np.float64)
    window = 8

    ground_truth = np.sum(values[-window:]) / window
    rolling_mean_1 = s.rolling(window=window).mean()
    rolling_mean_2 = s.iloc[1:].rolling(window=window).mean()

    # >>> ground_truth
    # np.float64(-0.000745)

    # >>> rolling_mean_1.iloc[-1]
    # np.float64(-0.01637)
    
    # >>> rolling_mean_2.iloc[-1]
    # np.float64(-0.000745)

    # The first number is ~22 times greater than both the last one and the ground_truth value, which is significant.
    # >>> rolling_mean_1.iloc[-1] / rolling_mean_2.iloc[-1]
    # np.float64(21.973154362416107)
    # >>> rolling_mean_1.iloc[-1] / ground_truth
    # np.float64(21.973154362416107)
    
    assert np.allclose(rolling_mean_2.iloc[-1], ground_truth)  # passes
    assert np.allclose(rolling_mean_1.iloc[-1], ground_truth, rtol=0, atol=1e-2)  # fails

Issue Description

While using pandas' .rolling(window=...).mean(), there is an unexpected discrepancy when comparing two different ways of computing the rolling mean:

rolling_mean_1 = s.rolling(window=window).mean()
rolling_mean_2 = s.iloc[1:].rolling(window=window).mean()

The discrepancy is significant, as rolling_mean_1.iloc[-1] is approximately 22 times greater than rolling_mean_2.iloc[-1], even though the values required to compute the result for the last row are the same in both cases. Additionally, the reference value ground_truth is almost identical to rolling_mean_2.iloc[-1].

Likely, it's related to rounding errors, but I don't understand how they could cause such a large discrepancy in this case, given that the absolute values of the original data are not that large (<= 1e15)

Expected Behavior

assert np.allclose(rolling_mean_1.iloc[-1], ground_truth) should pass

Installed Versions

INSTALLED VERSIONS

commit : 0691c5c
python : 3.12.8
python-bits : 64
OS : Darwin
OS-release : 23.4.0
Version : Darwin Kernel Version 23.4.0: Wed Feb 21 21:44:54 PST 2024; root:xnu-10063.101.15~2/RELEASE_ARM64_T6031
machine : arm64
processor : arm
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.3
numpy : 2.2.2
pytz : 2025.1
dateutil : 2.9.0.post0
pip : None
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : 6.125.2
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : 8.3.4
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.15.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.1
qtpy : None
pyqt5 : None

Answer 1 · 2025-03-06T21:43:42.000Z

Thanks for the report. Here are the internal number of observations and summations along with the result within pandas.

# rolling_mean_1
1 -0.3333333 nan
2 -0.3333333 nan
3 999999999999999.6 nan
4 512541199999999.6 nan
5 512541199999999.6 nan
6 512541199999999.6 nan
7 512541200000000.75 nan
8 512541200000000.75 64067650000000.09
8 512541200000001.06 64067650000000.13
8 -345582499999998.94 -43197812499999.87
8 -1345582499999999.0 -168197812499999.88
8 141876300000001.0 17734537500000.125
8 141876300000001.0 17734537500000.125
8 141876300000001.0 17734537500000.125
8 141876299999999.9 17734537499999.99
8 141876299999999.9 17734537499999.99
8 141876299999999.9 17734537499999.99
8 999999999999999.9 124999999999999.98
8 999999999999999.9 124999999999999.98
8 -0.13096 -0.01637

# rolling_mean_2
1 0.0 nan
2 1000000000000000.0 nan
3 512541200000000.0 nan
4 512541200000000.0 nan
5 512541200000000.0 nan
6 512541200000001.1 nan
7 512541200000001.1 nan
8 512541200000001.1 64067650000000.14
8 -345582499999998.9 -43197812499999.86
8 -1345582499999999.0 -168197812499999.88
8 141876300000001.12 17734537500000.14
8 141876300000001.12 17734537500000.14
8 141876300000001.12 17734537500000.14
8 141876300000000.03 17734537500000.004
8 141876300000000.03 17734537500000.004
8 141876300000000.03 17734537500000.004
8 1000000000000000.0 125000000000000.0
8 1000000000000000.0 125000000000000.0
8 -0.00596 -0.000745

It appears to me this is normal floating point errors.

Answer 2 · 2025-03-07T09:23:58.000Z

Thank you for your answer! I hadn’t noticed that the absolute difference is roughly the same as that between other values from both rolling mean series. As the values become smaller, the accumulated error starts to play a significant role, as the relative difference becomes significant, which is somewhat expected.