keras-team/keras-preprocessing

ImageDataGenerator.flow_from_dataframe() with class_mode='binary' can be affected by classname unexpectedly

ecoopnet opened this issue · 2 comments

In binary class mode, the class indices produced by ImageDataGenerator.flow_from_dataframe() depend on the class names, because the class names are sorted automatically, even when they are passed in an explicit order.

https://github.com/keras-team/keras-preprocessing/blob/0494094a3b/keras_preprocessing/image/dataframe_iterator.py#L252

Sorting is OK in categorical mode because the order of the classes is not important there.

But it is not OK in binary mode, because DataFrameIterator requires 2 classes for binary class mode and generates a single value per sample that depends on the index of its class.

I could not determine whether this is a bug or not, but sorting seems unnecessary in binary mode.
I expected the classes not to be sorted when I pass them explicitly through the classes argument.
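To make the effect concrete without needing image files, here is a minimal pure-Python sketch. It is only a simplified model of the behavior described above, not the library's actual code; the helper names `sorted_class_indices` and `ordered_class_indices` are hypothetical.

```python
def sorted_class_indices(classes):
    """Model of the reported behavior: names are sorted before being indexed."""
    return {name: index for index, name in enumerate(sorted(classes))}


def ordered_class_indices(classes):
    """Model of the expected behavior: the caller's order decides the index."""
    return {name: index for index, name in enumerate(classes)}


classes = ['normal', 'abnormal']

print(sorted_class_indices(classes))
# prints: {'abnormal': 0, 'normal': 1}

print(ordered_class_indices(classes))
# prints: {'normal': 0, 'abnormal': 1}
```

In binary mode the label fed to the model is this index, so under the sorted mapping 'normal' becomes 1 although it was listed first.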

Actual

generator = ImageDataGenerator(...)
flow = generator.flow_from_dataframe(class_mode='binary', classes=['normal', 'abnormal'])

print(flow.class_indices)
# it prints: {'abnormal': 0, 'normal': 1}

Expected

generator = ImageDataGenerator(...)
flow = generator.flow_from_dataframe(class_mode='binary', classes=['normal', 'abnormal'])

print(flow.class_indices)
# it should print: {'normal': 0, 'abnormal': 1} (equivalently {'abnormal': 1, 'normal': 0})

My workaround:
Prepend an index prefix to each class name to control the order of the classes.

# The first name always maps to 0 and the second to 1, whatever the names are.
classes = ['normal', 'abnormal']

# Add an index prefix to each class name.
# You also need to add the same prefix to the values of the dataframe's y_col column.
indexed_classes = ["{:0>3}_{}".format(i, name) for i, name in enumerate(classes)]
flow = generator.flow_from_dataframe(class_mode='binary', classes=indexed_classes)

print(flow.class_indices)
# it prints: {'000_normal': 0, '001_abnormal': 1}
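The workaround also requires renaming the label values in the dataframe. A minimal sketch of that step follows, in plain Python so that it runs without a dataframe; the helper `index_prefix_mapping` and the sample `labels` list are hypothetical and only for illustration.

```python
def index_prefix_mapping(classes):
    """Map each class name to a zero-padded, index-prefixed name."""
    return {name: "{:0>3}_{}".format(index, name)
            for index, name in enumerate(classes)}


classes = ['normal', 'abnormal']
mapping = index_prefix_mapping(classes)

# Prefixed names sort in the same order as the original list,
# so automatic sorting no longer changes the indices.
indexed_classes = [mapping[name] for name in classes]
assert sorted(indexed_classes) == indexed_classes

# Hypothetical label values, standing in for the dataframe's y_col column.
labels = ['abnormal', 'normal', 'normal']
prefixed_labels = [mapping[label] for label in labels]

print(indexed_classes)
# prints: ['000_normal', '001_abnormal']
print(prefixed_labels)
# prints: ['001_abnormal', '000_normal', '000_normal']
```

With pandas, the same dictionary could presumably be applied to the label column through Series.map. The three-digit padding keeps the sort order stable only for fewer than 1000 classes, which is more than enough for binary mode.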

I think this is a bug. PRs are welcome.