ImageDataGenerator.flow_from_dataframe() with class_mode='binary': class values can be unexpectedly affected by class names
ecoopnet opened this issue · 2 comments
ecoopnet commented
In binary class mode, the class values produced by ImageDataGenerator.flow_from_dataframe() depend on the class names, because the class names are sorted automatically, even when I pass them explicitly.
Sorting is fine in categorical mode, because the order of the classes is not important there.
It is not fine in binary mode: DataFrameIterator requires exactly 2 classes for binary class mode, and it generates a single value per sample that depends on the index of that sample's class.
I could not determine whether this is a bug, but sorting seems unnecessary in binary mode.
I expected the classes not to be sorted when I pass them through the classes argument.
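A minimal pure-Python sketch of the behavior described above, assuming (as the report says) that the class names are sorted before indices are assigned, and that each binary label is simply the index of the sample's class. `sorted_class_indices` and `binary_labels` are hypothetical helpers for illustration, not Keras functions.

```python
def sorted_class_indices(classes):
    """Hypothetical model of the reported behavior: sort the names, then number them."""
    return {name: i for i, name in enumerate(sorted(classes))}


def binary_labels(row_classes, class_indices):
    """Hypothetical: the binary target of each row is the index of its class name."""
    if len(class_indices) != 2:
        raise ValueError("binary mode needs exactly 2 classes")
    return [class_indices[name] for name in row_classes]


indices = sorted_class_indices(["normal", "abnormal"])
print(indices)  # {'abnormal': 0, 'normal': 1}

# 'normal' ends up as the positive (1) label only because it sorts after 'abnormal'.
print(binary_labels(["normal", "abnormal", "normal"], indices))  # [1, 0, 1]
```

Renaming a class (for example "abnormal" to "sick") would flip which class is 1, even though the `classes` argument keeps the same order.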
Actual
generator = ImageDataGenerator(...)
flow = generator.flow_from_dataframe(class_mode='binary', classes=['normal', 'abnormal'])
print(flow.class_indices)
# it prints: {'abnormal': 0, 'normal': 1}
Expected
generator = ImageDataGenerator(...)
flow = generator.flow_from_dataframe(class_mode='binary', classes=['normal', 'abnormal'])
print(flow.class_indices)
# it prints: {'normal': 0, 'abnormal': 1} or {'abnormal': 1, 'normal': 0}
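For comparison, a sketch of the expected mapping: an order-preserving de-duplication numbers the classes in the order the caller passed them (`dict.fromkeys` keeps first-seen order, since dicts preserve insertion order in Python 3.7+). `ordered_class_indices` is a hypothetical helper, not the Keras implementation.

```python
def ordered_class_indices(classes):
    """Hypothetical: number classes in the order given, dropping duplicates."""
    unique = list(dict.fromkeys(classes))  # keeps the first occurrence, preserves order
    return {name: i for i, name in enumerate(unique)}


print(ordered_class_indices(["normal", "abnormal"]))  # {'normal': 0, 'abnormal': 1}
print(ordered_class_indices(["abnormal", "normal"]))  # {'abnormal': 0, 'normal': 1}
```

With this mapping, only the order of the `classes` argument decides which class becomes the positive (1) label; the names themselves no longer matter.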
ecoopnet commented
My workaround:
Prefix each class name with a zero-padded index, so that the automatic sort keeps the classes in the intended order.
# The first name always gets index 0 and the second gets 1, whatever the names are.
classes = ['normal', 'abnormal']
# Add an index prefix to each class name.
# You also need to add the same prefix to the y_col values of the dataframe.
indexed_classes = ["{:03d}_{}".format(i, name) for i, name in enumerate(classes)]
flow = generator.flow_from_dataframe(class_mode='binary', classes=indexed_classes)
print(flow.class_indices)
# it prints: {'000_normal': 0, '001_abnormal': 1}
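A sketch of applying the workaround consistently to both the `classes` argument and the label values, since the prefixed names in `classes` must match the labels in the `y_col` column. The helper `add_index_prefix` and the sample labels are hypothetical; with a pandas DataFrame, the same dict could be applied to the label column with `Series.map`.

```python
def add_index_prefix(classes, width=3):
    """Hypothetical helper: map each class name to '<zero-padded index>_<name>'
    so that sorting the new names reproduces the original order."""
    return {name: f"{i:0{width}d}_{name}" for i, name in enumerate(classes)}


classes = ["normal", "abnormal"]
prefix_map = add_index_prefix(classes)
indexed_classes = list(prefix_map.values())  # ['000_normal', '001_abnormal']

# Hypothetical label column: apply the same mapping to every row's label.
labels = ["abnormal", "normal", "normal"]
indexed_labels = [prefix_map[name] for name in labels]

# Sorting the prefixed names now gives back the original order.
assert sorted(indexed_classes) == indexed_classes
print(indexed_labels)  # ['001_abnormal', '000_normal', '000_normal']
```

Zero padding keeps lexicographic order equal to numeric order for up to 1000 classes with `width=3`; binary mode only ever needs indices 0 and 1.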
Dref360 commented
I think this is a bug, PRs are welcome.