yuenshingyan/MissForest

ValueError: at least one array or dtype is required

khanwa opened this issue · 6 comments

Thank you for sharing the implementation with us. I get the error ValueError: at least one array or dtype is required when I run mfe = mfe.impute(data, rfc, rfr). It works fine when I read fish = pd.read_csv('Fish.csv').

But when I read a different file, it raises the error, even though my DataFrame looks fine: "[699 rows x 10 columns]", type "DataFrame". Could you please check?
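For context on the message itself: the text "at least one array or dtype is required" is what NumPy's result_type raises when it is called with no inputs. The sketch below reproduces only that message. The helper name common_dtype is hypothetical, and whether an empty set of columns is what triggered the error in this issue is an assumption, not something confirmed in the thread.

```python
import numpy as np


def common_dtype(dtypes):
    """Return the dtype that all given dtypes can be promoted to.

    Hypothetical helper for illustration; it just forwards to np.result_type.
    """
    return np.result_type(*dtypes)


# With at least one dtype, promotion works as usual.
promoted = common_dtype([np.dtype("int64"), np.dtype("float64")])

# With no dtypes at all (for example, a column selection that came back
# empty), NumPy has nothing to promote and raises ValueError.
try:
    common_dtype([])
    message = None
except ValueError as exc:
    message = str(exc)

print(promoted)
print(message)
```

If the same message shows up during imputation, printing data.dtypes and data.isnull().sum() before the call is a cheap way to see whether some group of columns is unexpectedly empty.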

Hi, would you like to try this script instead?

# Import dependencies

import numpy as np
import pandas as pd
from MissForest import MissForest
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor

# Read our toy dataset
fish = pd.read_csv('Fish.csv')

# Set missing values
fish.iloc[1, 0] = np.nan
fish.iloc[155, 0] = np.nan
fish.iloc[1, 2] = np.nan
fish.iloc[155, 2] = np.nan

# Instantiate our imputer
mf = MissForest()
fish = mf.impute(x=fish, classifier=RandomForestClassifier(), regressor=RandomForestRegressor())

print(fish) 

It looks like you are assigning the result of mfe.impute(data, rfc, rfr) back to mfe, and the classifier and regressor arguments are in the wrong order.

mfe= mfe.impute(data, rfc, rfr)
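The two points above can be sketched without the library. The Imputer class below is a hypothetical stand-in, not the MissForest API, and the strings stand in for estimator objects. It only shows why passing the estimators by keyword and storing the result under a separate name avoids both problems.

```python
class Imputer:
    """Hypothetical stand-in for an imputer; NOT the MissForest class."""

    def impute(self, x, classifier, regressor):
        # Report which estimator ended up in which role.
        return {"data": list(x), "classifier": classifier, "regressor": regressor}


# Placeholders for real estimator objects.
rfc = "RandomForestClassifier"
rfr = "RandomForestRegressor"

mf = Imputer()

# Positional call with the estimators swapped: nothing complains,
# but each estimator lands in the wrong role.
swapped = mf.impute([1, 2], rfr, rfc)

# Keyword call: the order in which the arguments are written no longer matters.
by_name = mf.impute(x=[1, 2], regressor=rfr, classifier=rfc)

# Keeping the result under its own name leaves mf usable as an imputer.
still_imputer = isinstance(mf, Imputer)

print(swapped["classifier"], by_name["classifier"], still_imputer)
```

Writing mf = mf.impute(...) instead would rebind mf to the returned data, so a later mf.impute(...) call would fail.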

Could you send me your data? Thank you.

Thank you so much. Here it is.

I fixed the bug and tried it with the data you provided. It works fine so far.

from missforest.miss_forest import MissForest
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor

# Read the training set and split off the label (last column, named 'class')
data_train = pd.read_csv('cancer_train_1.csv')
train_label = data_train.iloc[:, -1:]
data_train.drop('class', axis=1, inplace=True)

# Read the test set and split off its label the same way
data_testt = pd.read_csv('cancer_test_10_1.csv')
testt_label = data_testt.iloc[:, -1:]
data_testt.drop('class', axis=1, inplace=True)

# Stack labels and features from both sets
label_all = pd.concat([train_label, testt_label], ignore_index=True)
data = pd.concat([data_train, data_testt], ignore_index=True)

# Missing values per column before imputation
print(data.isnull().sum())

# Instantiate our imputer
mf = MissForest()
data = mf.fit_transform(X=data)

# Missing values per column after imputation
print(data.isnull().sum())

a 28
b 17
c 32
d 32
e 31
f 43
g 35
h 30
i 22
dtype: int64
a 0
b 0
c 0
d 0
e 0
f 0
g 0
h 0
i 0
dtype: int64

Thank you very much.