sklearn StandardScaler vs dask StandardScaler.
Arunes007 opened this issue · 1 comments
Arunes007 commented
I am getting different results from sklearn StandardScaler and dask StandardScaler.
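import sklearn.preprocessing
import dask_ml.preprocessing
# df_pd: pandas DataFrame; df_dask: dask DataFrame
scaler_sk = sklearn.preprocessing.StandardScaler()
scaler_d = dask_ml.preprocessing.StandardScaler()
scaler_sk.fit(df_pd[["SUMMESSAGECOUNT"]])
scaler_d.fit(df_dask[["SUMMESSAGECOUNT"]])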
Dask scaler
scaler_d.mean_[0], scaler_d.var_[0]
output: (19.157653421114507, 47431.17794342375)
Sklearn Scaler
scaler_sk.mean_[0], scaler_sk.var_[0]
output: (19.157653421114507, 47431.17794342373)
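To put the gap between the two variances in perspective, the sketch below measures it in relative terms and in units of float64 spacing. It uses only the two values printed above; the variable names are illustrative and not part of either library.

```python
import math
import numpy as np

# Variances copied from the two outputs above.
var_dask = 47431.17794342375
var_sklearn = 47431.17794342373

abs_diff = abs(var_dask - var_sklearn)
rel_diff = abs_diff / var_sklearn
# np.spacing gives the distance from a float64 value to the next representable one.
ulps = abs_diff / np.spacing(var_sklearn)

print(f"absolute difference: {abs_diff:.3e}")
print(f"relative difference: {rel_diff:.3e}")
print(f"difference in float64 spacings: {ulps:.1f}")

# A tolerance-based comparison treats the two results as equal.
print(math.isclose(var_dask, var_sklearn, rel_tol=1e-12))
```

If downstream code only needs the two scalers to agree, comparing with a tolerance like this is usually more robust than expecting bit-identical floats.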
I know the difference is negligible, but it is influencing my model training on Prophet. Could you please suggest a way to make them identical without using compute()?
TomAugspurger commented
I *think* that floating point inaccuracies are just a fact of life when you’re doing things in chunks, at least with the algorithms that dask.array uses today. I don’t think there’s anything we can do in dask-ml to address that (but maybe check the source to be sure).
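As a rough illustration of why chunking changes the last few bits, here is a minimal NumPy-only sketch. The helper `chunked_mean_var` is hypothetical and is not the algorithm dask.array actually uses; it only shows that combining per-chunk partial sums changes the order of floating point additions, and that order matters because floating point addition is not associative.

```python
import numpy as np

def chunked_mean_var(x, n_chunks):
    """Population mean and variance built from per-chunk partial sums.

    Hypothetical helper for illustration; not dask's implementation.
    """
    chunks = np.array_split(x, n_chunks)
    n = sum(len(c) for c in chunks)
    mean = sum(c.sum() for c in chunks) / n
    sq_dev = sum(((c - mean) ** 2).sum() for c in chunks)
    return mean, sq_dev / n

# Floating point addition is not associative, so grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Synthetic data standing in for the real column (an assumption).
rng = np.random.default_rng(0)
x = rng.exponential(scale=20.0, size=1_000_000)

mean_full, var_full = x.mean(), x.var()
mean_chunked, var_chunked = chunked_mean_var(x, n_chunks=8)

# The results may or may not be bit-identical, but they agree to a tight tolerance.
print(var_full == var_chunked)
print(np.isclose(var_full, var_chunked, rtol=1e-10, atol=0.0))
```

Whether the exact-equality check prints True depends on the data and the chunk boundaries, which is the point: only the tolerance-based comparison is dependable.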