
ImageDataGenerator.flow_from_dataframe() with class_mode='binary' can be unexpectedly affected by class names

ecoopnet opened this issue · 2 comments

In binary class mode, the label values produced by ImageDataGenerator.flow_from_dataframe() depend on the class names, because the class names are sorted automatically, which is unexpected.

Sorting is fine in categorical mode because the order is not important there.

But it is not fine in binary mode. DataFrameIterator requires exactly 2 classes for binary class mode, and it generates a single label value (0 or 1) per sample that depends on the index of each class.
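To make the mechanism concrete, here is a minimal pure-Python sketch of the behavior described above. It assumes the iterator builds its class-to-index mapping from `sorted(classes)`, which is what the printed output below suggests; `sorted_class_indices` and `binary_labels` are hypothetical helpers, not Keras functions.

```python
def sorted_class_indices(classes):
    # Hypothetical mimic of the reported behavior: indices follow
    # alphabetical order, not the order in which classes were passed.
    return {name: i for i, name in enumerate(sorted(classes))}


def binary_labels(labels, classes):
    # In binary mode each sample gets one value: the index of its class.
    indices = sorted_class_indices(classes)
    if len(indices) != 2:
        raise ValueError("binary mode requires exactly 2 classes")
    return [float(indices[label]) for label in labels]


classes = ['normal', 'abnormal']
print(sorted_class_indices(classes))  # {'abnormal': 0, 'normal': 1}
print(binary_labels(['normal', 'abnormal', 'normal'], classes))  # [1.0, 0.0, 1.0]
```

So renaming a class (for example 'normal' to 'healthy') can silently flip which class is labeled 1.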

I could not determine whether it is a bug, but sorting seems unnecessary in binary mode.
I expected the classes not to be sorted when I passed them explicitly via the classes argument.


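# Actual behavior:
from keras.preprocessing.image import ImageDataGenerator

generator = ImageDataGenerator(...)
# df: a DataFrame with 'filename' and 'class' columns (not shown)
flow = generator.flow_from_dataframe(df, class_mode='binary', classes=['normal', 'abnormal'])
print(flow.class_indices)
# it prints: {'abnormal': 0, 'normal': 1}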


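# Expected behavior:
generator = ImageDataGenerator(...)
# df: a DataFrame with 'filename' and 'class' columns (not shown)
flow = generator.flow_from_dataframe(df, class_mode='binary', classes=['normal', 'abnormal'])
print(flow.class_indices)
# expected: {'normal': 0, 'abnormal': 1}, i.e. indices follow the order passed in classes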
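One way to express the expected behavior, as a sketch only: keep the caller's order when `classes` is passed explicitly, and sort only when the classes are inferred from the data. `expected_class_indices` is a hypothetical helper illustrating the proposal, not the Keras implementation.

```python
def expected_class_indices(labels, classes=None):
    if classes is None:
        # No explicit classes: infer them from the data and sort
        # so the mapping is deterministic.
        classes = sorted(set(labels))
    # Explicit classes: respect the order the caller passed.
    return {name: i for i, name in enumerate(classes)}


print(expected_class_indices([], ['normal', 'abnormal']))  # {'normal': 0, 'abnormal': 1}
print(expected_class_indices(['normal', 'abnormal', 'normal']))  # {'abnormal': 0, 'normal': 1}
```

Sorting still makes sense for inferred classes, since there is no user-given order to preserve in that case.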

My workaround:
Prepend an index prefix to each class name so that the automatic sorting preserves the intended class order.

# The first name always gets 0 and the second gets 1, whatever the names are.
classes = ['normal', 'abnormal']

# Add an index prefix to each class name.
# The same prefix must also be added to the values of the dataframe's y_col column
# (here the default y_col='class').
indexed_classes = ["{:0>3}_{}".format(i, name) for i, name in enumerate(classes)]
df['class'] = df['class'].map(dict(zip(classes, indexed_classes)))
flow = generator.flow_from_dataframe(df, class_mode='binary', classes=indexed_classes)
print(flow.class_indices)

# it prints: {'000_normal': 0, '001_abnormal': 1}
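The prefix trick works because zero-padded indices sort lexically in the same order as the indices themselves. A small pure-Python check of that, using a hypothetical `add_index_prefix` helper and a plain list of labels standing in for the DataFrame column:

```python
def add_index_prefix(classes, width=3):
    # Zero-padded prefixes ('000_', '001_', ...) sort lexically in index
    # order, so sorting the prefixed names keeps the original order.
    return [f"{i:0{width}d}_{name}" for i, name in enumerate(classes)]


classes = ['normal', 'abnormal']
indexed_classes = add_index_prefix(classes)
print(indexed_classes)  # ['000_normal', '001_abnormal']

# Sorting (as the iterator appears to do) no longer reorders them.
class_indices = {name: i for i, name in enumerate(sorted(indexed_classes))}
print(class_indices)  # {'000_normal': 0, '001_abnormal': 1}

# The label values must be rewritten with the same prefixes.
prefix_map = dict(zip(classes, indexed_classes))
labels = ['normal', 'abnormal', 'abnormal']
print([prefix_map[label] for label in labels])  # ['000_normal', '001_abnormal', '001_abnormal']
```

The padding matters: without it, a name such as '10_x' would sort before '2_y'.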

I think this is a bug; PRs are welcome.