-1 in test data

Question

Tim1702 opened this issue 2 years ago · 2 comments

Hi, I am trying to use the Kenya maize dataset. However, I found that the test data contains a large number of -1 values for y, while the training data contains only binary values (0 or 1). What do the -1 values mean?

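import numpy
from cropharvest.datasets import CropHarvest

# DATA_DIR is the path to the local CropHarvest data directory
evaluation_datasets = CropHarvest.create_benchmark_datasets(DATA_DIR)
kenya_dataset = evaluation_datasets[0]
X, y = kenya_dataset.as_array(flatten_x=True)

# Collect every test instance yielded by the dataset
instances = []
for _, test_instance in kenya_dataset.test_data(flatten_x=True):
    instances.append(test_instance)

# Count each label value in the second test instance
print(numpy.count_nonzero(instances[1][0:-1].y == -1))     # 7517   <--- ?
print(numpy.count_nonzero(instances[1][0:-1].y == 0))      # 76
print(numpy.count_nonzero(instances[1][0:-1].y == 1))      # 284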

Answer 1 · 2022-11-02T17:07:50.000Z

Hi @Tim1702 !

-1 represents missing data. We ignore it during evaluation.
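
In case it helps, here is a minimal sketch of what ignoring -1 can look like if you compute your own metrics. It uses only NumPy on made-up arrays; the helper names `mask_missing` and `accuracy_ignoring_missing` and the 0.5 threshold are assumptions for illustration, not part of the CropHarvest API, and the library's own evaluation code may differ.

```python
import numpy as np

MISSING = -1  # sentinel for unlabeled entries, as described in the answer above


def mask_missing(y_true, y_pred, missing_value=MISSING):
    """Drop every position whose label equals the missing-value sentinel."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    keep = y_true != missing_value
    return y_true[keep], y_pred[keep]


def accuracy_ignoring_missing(y_true, y_pred, threshold=0.5):
    """Accuracy over labeled positions only (hypothetical helper)."""
    y_true_valid, y_pred_valid = mask_missing(y_true, y_pred)
    if y_true_valid.size == 0:
        return float("nan")  # nothing labeled, so accuracy is undefined
    predicted_positive = y_pred_valid >= threshold
    actual_positive = y_true_valid == 1
    return float(np.mean(predicted_positive == actual_positive))


# Made-up labels and predicted probabilities, for illustration only
y_true = np.array([-1, 0, 1, 1, -1, 0])
y_pred = np.array([0.9, 0.2, 0.8, 0.4, 0.1, 0.3])
print(accuracy_ignoring_missing(y_true, y_pred))  # 0.75
```

The two -1 positions are dropped before scoring, so only the four labeled entries contribute to the result.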

Answer 2 · 2022-11-08T08:39:43.000Z

Understood now. Thank you very much!