dask/dask-ml

sklearn StandardScaler vs dask StandardScaler

Arunes007 opened this issue · 1 comment

I am getting different results from sklearn StandardScaler and dask StandardScaler.

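import sklearn.preprocessing
import dask_ml.preprocessing

# df_pd: pandas DataFrame; df_dask: the same data as a dask DataFrame (defined elsewhere)
scaler_sk = sklearn.preprocessing.StandardScaler()
scaler_d = dask_ml.preprocessing.StandardScaler()

# Fit both scalers on the same single column
scaler_sk.fit(df_pd[["SUMMESSAGECOUNT"]])
scaler_d.fit(df_dask[["SUMMESSAGECOUNT"]])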

Dask scaler

scaler_d.mean_[0], scaler_d.var_[0]
output: (19.157653421114507, 47431.17794342375)

Sklearn Scaler

scaler_sk.mean_[0], scaler_sk.var_[0]
output: (19.157653421114507, 47431.17794342373)
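To put the gap in perspective, here is a short NumPy-only sketch that takes the two variances copied from the outputs above and measures their difference three ways: in absolute terms, in units in the last place (ULPs) of a float64 at this magnitude, and relative to the value itself. The variable names are illustrative, not part of either library.

```python
import numpy as np

# Variances reported above for the same column.
var_dask = 47431.17794342375
var_sklearn = 47431.17794342373

# Absolute gap, and the same gap measured in float64 ULPs at this magnitude
# (np.spacing gives the distance to the next representable float).
gap = abs(var_dask - var_sklearn)
ulps = gap / np.spacing(var_sklearn)
rel = gap / var_sklearn

print(f"absolute difference: {gap:.3e}")
print(f"difference in ULPs:  {ulps:.0f}")
print(f"relative difference: {rel:.3e}")
print("np.isclose:", np.isclose(var_dask, var_sklearn))
```

The two values differ by only a few ULPs, so they agree to roughly 15 significant digits. This is the level of disagreement expected from rounding, not from a difference in the formula.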

I know the difference is negligible, but it is affecting my model training with Prophet. Could you suggest a way to make the two results identical without calling compute()?
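One likely explanation, offered as an assumption rather than a confirmed diagnosis of dask-ml internals: a chunked backend reduces each chunk separately and then merges the partial results. As a result, the floating-point additions happen in a different order than in a single in-memory pass, and floating-point addition is not associative. The sketch below illustrates the effect with a hypothetical helper, `chunked_mean_var`. It merges per-chunk statistics using Chan et al.'s pairwise combination formula and compares the result with NumPy's whole-array mean and (population) variance, computed on synthetic data rather than the issue's column.

```python
import numpy as np


def chunked_mean_var(x, n_chunks):
    """Hypothetical helper: population mean/variance computed per chunk and
    then merged with Chan et al.'s pairwise formula, to mimic (in spirit)
    how a chunked backend reduces data."""
    n_total, mean_total, m2_total = 0, 0.0, 0.0
    for chunk in np.array_split(x, n_chunks):
        n = chunk.size
        mean = chunk.mean()
        m2 = ((chunk - mean) ** 2).sum()  # sum of squared deviations in this chunk
        if n_total == 0:
            n_total, mean_total, m2_total = n, mean, m2
            continue
        delta = mean - mean_total
        new_n = n_total + n
        # Merge the running statistics with this chunk's statistics.
        mean_total += delta * n / new_n
        m2_total += m2 + delta**2 * n_total * n / new_n
        n_total = new_n
    return mean_total, m2_total / n_total  # ddof=0, like StandardScaler.var_


rng = np.random.default_rng(0)
x = rng.exponential(scale=20.0, size=100_000)  # synthetic data

whole = (x.mean(), x.var())                # single in-memory pass (ddof=0)
chunked = chunked_mean_var(x, n_chunks=7)  # chunk-then-merge

print("whole:  ", whole)
print("chunked:", chunked)
print("equal within rtol=1e-12:", np.allclose(whole, chunked, rtol=1e-12))
```

Depending on the data and chunking, the last digits may or may not differ, but both results are correct to within rounding. In general, refitting on two different backends should not be expected to produce bit-identical floats. The practical options are to compare with a tolerance (as with `np.allclose` here) or to fit the scaler once and apply that same fitted object, or its `mean_` and `scale_`, in both pipelines.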