EpistasisLab/scikit-mdr

Error when class labels aren't [0,1]

hayleyson opened this issue · 0 comments

An error occurs when the class labels are not 0 and 1.
When counting the number of cases and controls for each cell in the grid, the MDR code uses the following loop (lines 76-78):

    for row_i in range(features.shape[0]):
        feature_instance = tuple(features[row_i])
        self.class_count_matrix[feature_instance][classes[row_i]] += 1

classes is the array of y values passed in as a parameter.
Think of class_count_matrix as a matrix of shape (# of possible feature_instances) by (# of classes).
Since MDR takes in only binary data, the # of classes is always 2, so the only valid indices along that dimension are 0 and 1.
But if the class labels are 0 and 2 rather than 0 and 1, the program tries to index class_count_matrix[(tuple of a feature_instance)][2], which is out of bounds.
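
The behaviour can be reproduced outside the library with a small sketch. It assumes each grid cell holds a length-2 NumPy array of per-class counts stored in a defaultdict; that storage layout and the count_classes helper are guesses made for illustration, not the actual mdr.py implementation. Only the three-line loop is taken from the code quoted above.

```python
from collections import defaultdict

import numpy as np


def count_classes(features, classes):
    """Hypothetical stand-in for the counting step in fit()."""
    # Assumption: one 2-slot counter per feature combination (binary classes).
    class_count_matrix = defaultdict(lambda: np.zeros((2,), dtype=int))
    for row_i in range(features.shape[0]):
        feature_instance = tuple(features[row_i])
        # The raw label value is used directly as an index into the 2 slots.
        class_count_matrix[feature_instance][classes[row_i]] += 1
    return class_count_matrix


# Made-up toy data: three rows, two features.
features = np.array([[0, 1], [0, 1], [1, 1]])

# Labels 0/1 happen to coincide with the valid slot indices.
counts = count_classes(features, np.array([0, 1, 1]))

# Labels 0/2 make the loop ask for slot 2, which does not exist.
error_message = None
try:
    count_classes(features, np.array([0, 2, 2]))
except IndexError as err:
    error_message = str(err)
```

With labels 0 and 1 the loop finishes normally; with labels 0 and 2 it raises an IndexError as soon as it reaches a row labelled 2.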

Error message:

  File "<ipython-input-180-e1715a88facf>", line 10, in <module>
    mdr.fit(X_train, y_train)

  File "C:\Users\Hayley Son\Anaconda3\lib\site-packages\mdr\mdr.py", line 78, in fit
    self.class_count_matrix[feature_instance][classes[row_i]] += 1

IndexError: index 2 is out of bounds for axis 0 with size 2
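
A possible workaround until the library handles arbitrary labels is to remap the labels to 0 and 1 before calling fit. The sketch below uses np.unique with return_inverse=True; the y_train values are made up for illustration, and the commented-out fit call assumes the mdr and X_train objects from the traceback above.

```python
import numpy as np

# Made-up labels that use 0 and 2 instead of 0 and 1.
y_train = np.array([0, 2, 2, 0, 2])

# class_labels holds the sorted distinct labels; y_encoded holds, for each
# sample, the position of its label in class_labels (0 or 1 for two classes).
class_labels, y_encoded = np.unique(y_train, return_inverse=True)

# mdr.fit(X_train, y_encoded)  # not run here; fit on the remapped labels

# Predictions made on the 0/1 scale can be mapped back to the original labels.
decoded = class_labels[y_encoded]
```

The same mapping could also be applied inside fit itself, which would remove the need for callers to encode their labels.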