dataquestio/project-walkthroughs

Getting error ValueError: Input X contains NaN. SimpleImputer does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values

GlamarSK opened this issue · 3 comments

The error is raised at the line model.fit(train[predictors], train["Target"]):
ValueError: Input X contains NaN.
SimpleImputer does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values

I then tried the following, but it did not resolve the error:

# Create our imputer to replace missing values with the mean

imp = SimpleImputer(missing_values=0, strategy='mean')
imp = imp.fit(train)

# Impute our data, then train

X_train_imp = imp.transform(train)

model.fit(X_train_imp[predictors], X_train_imp["Target"])

Please share the solution

I had the same issue, so I removed the null values and it worked.
PS: I know it's not best practice, but it worked. That is one solution; alternatively, we can replace the NaN values with either the mean or the median.
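
For anyone who wants to try the "remove the null values" route, here is a minimal sketch. The toy DataFrame, the column names Close and Volume, and the choice of RandomForestClassifier are assumptions for illustration only; substitute your own train, predictors, and model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data standing in for the real `train` frame.
train = pd.DataFrame({
    "Close": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "Volume": [10.0, np.nan, 30.0, 40.0, 50.0, 60.0],
    "Target": [0, 1, 0, 1, 0, 1],
})
predictors = ["Close", "Volume"]  # assumed predictor columns

# Keep only the rows where every predictor and the target are present.
train_clean = train.dropna(subset=predictors + ["Target"])

# Assumed model; any scikit-learn estimator is fitted the same way.
model = RandomForestClassifier(n_estimators=10, random_state=1)
model.fit(train_clean[predictors], train_clean["Target"])
```

Dropping rows is simple but throws data away, so it is worth checking how many rows remain in train_clean before training.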

I had the same issue.
I used imp = SimpleImputer(missing_values=np.nan, strategy='mean') and it worked

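You should not set missing_values=0 or any other value, because the missing entries in the data are NaN, not 0.
Just imp = SimpleImputer(strategy='mean') should be fine, since missing_values defaults to np.nan.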
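
Putting the suggestions in this thread together, a rough sketch of the imputation route could look like the following. The toy data, the column names, and RandomForestClassifier are assumptions for illustration. Two details differ from the snippet in the question: the imputer is fitted on the predictor columns only, so the Target column is left untouched, and the result is wrapped back into a DataFrame, because transform returns a NumPy array by default and an array cannot be indexed with column names.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer

# Hypothetical toy data standing in for the real `train` frame.
train = pd.DataFrame({
    "Close": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "Volume": [10.0, np.nan, 30.0, 40.0, 50.0, 60.0],
    "Target": [0, 1, 0, 1, 0, 1],
})
predictors = ["Close", "Volume"]  # assumed predictor columns

# missing_values defaults to np.nan, so NaN entries get the column mean.
imp = SimpleImputer(strategy="mean")

# Fit on the predictors only, then rebuild a DataFrame so the column
# names and the original index are preserved.
X_train_imp = pd.DataFrame(
    imp.fit_transform(train[predictors]),
    columns=predictors,
    index=train.index,
)

# Assumed model; the target comes from the original frame, not the imputed one.
model = RandomForestClassifier(n_estimators=10, random_state=1)
model.fit(X_train_imp, train["Target"])
```

When predicting on a test set, reuse the same fitted imputer with imp.transform(test[predictors]) rather than fitting a new one, so the test rows are filled with the training means.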