-1 in test data
Tim1702 opened this issue · 2 comments
Tim1702 commented
Hi, I am trying to use the Kenya maize dataset. However, I found that the test data contains a large number of -1 values for y, whereas the training data contains only the binary values 0 and 1. What do the -1 values mean?
import numpy
from cropharvest.datasets import CropHarvest

# DATA_DIR is assumed to point to the local CropHarvest data folder
evaluation_datasets = CropHarvest.create_benchmark_datasets(DATA_DIR)
kenya_dataset = evaluation_datasets[0]
X, y = kenya_dataset.as_array(flatten_x=True)  # training data
instances = []
for _, test_instance in kenya_dataset.test_data(flatten_x=True):
    instances.append(test_instance)
# Count how often each label value occurs in one test instance
print(numpy.count_nonzero(instances[1][0:-1].y == -1)) #7517 <--- ?
print(numpy.count_nonzero(instances[1][0:-1].y == 0)) #76
print(numpy.count_nonzero(instances[1][0:-1].y == 1)) #284
gabrieltseng commented
Hi @Tim1702 !
-1 represents missing data. We ignore it during evaluation.
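As a rough illustration of what ignoring the -1 labels can look like, here is a minimal sketch using plain NumPy. It is not the library's own evaluation code: the helper name `mask_missing` and the toy arrays are made up for this example, and accuracy is used only as a stand-in metric.

```python
import numpy as np

MISSING_LABEL = -1  # the value described above as missing data


def mask_missing(y_true, y_pred, missing_value=MISSING_LABEL):
    """Hypothetical helper: keep only positions that have a real label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    keep = y_true != missing_value  # boolean mask, True where a label exists
    return y_true[keep], y_pred[keep]


# Toy arrays for illustration only, not real CropHarvest data.
y_true = np.array([1, -1, 0, 1, -1, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1])

labels, preds = mask_missing(y_true, y_pred)
accuracy = float((labels == preds).mean())  # computed on labelled pixels only
print(labels, preds, accuracy)
```

Applying the mask before any metric keeps the unlabelled pixels from being counted as either class.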
Tim1702 commented
Understood now. Thank you very much!