nasaharvest/cropharvest

-1 in test data

Tim1702 opened this issue · 2 comments

Hi, I am trying to use the Kenya maize dataset. In the test data, y contains a large number of -1 values, whereas the training data contains only the binary values 0 and 1. What do the -1 values mean?

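import numpy
from cropharvest.datasets import CropHarvest

# DATA_DIR is the local folder holding the CropHarvest data
evaluation_datasets = CropHarvest.create_benchmark_datasets(DATA_DIR)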
kenya_dataset = evaluation_datasets[0]
X, y = kenya_dataset.as_array(flatten_x=True)

instances = []
for _, test_instance in kenya_dataset.test_data(flatten_x=True):
    instances.append(test_instance)

print(numpy.count_nonzero(instances[1][0:-1].y == -1))     #7517   <--- ?
print(numpy.count_nonzero(instances[1][0:-1].y == 0))      #76
print(numpy.count_nonzero(instances[1][0:-1].y == 1))      #284

Hi @Tim1702 !

-1 represents missing data. We ignore it during evaluation.
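
For anyone computing their own metrics, here is a minimal sketch of what ignoring those labels could look like. It is not the library's evaluation code: the helper name `mask_missing` and the toy arrays are made up for illustration, and it only assumes that -1 marks a missing label as described above.

```python
import numpy as np

MISSING_LABEL = -1  # assumption: -1 marks a pixel with no label


def mask_missing(y_true, y_pred, missing_value=MISSING_LABEL):
    """Hypothetical helper: drop positions whose true label is missing."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    keep = y_true != missing_value  # boolean mask of labelled positions
    return y_true[keep], y_pred[keep]


# Toy data standing in for a test instance's labels and a model's predictions
y_true = np.array([1, -1, 0, 1, -1, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1])

labels, preds = mask_missing(y_true, y_pred)
accuracy = float((labels == preds).mean())  # computed on labelled positions only
print(labels, preds, accuracy)
```

Filtering on the true labels first means the unlabelled positions cannot count toward or against any metric computed afterwards.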

Understood now. Thank you very much!