We assume an imbalanced dataset S with m objects, |S| = m, given as a set of pairs S = {(xi, yi)}, i = 1, ..., m, where xi ∈ X denotes an instance in the n-dimensional feature space X = {f1, f2, ..., fn}, and yi ∈ Y = {1, …, C} denotes the class label associated with xi. In particular, when C = 2 we deal with a binary classification task. In addition, we define subsets Smin ⊂ S and Smaj ⊂ S, where Smin is the set of objects of the minority classes in S and Smaj is the set of objects of the prevailing (majority) classes in S, so that Smin ∩ Smaj = ∅ and Smin ∪ Smaj = S.
Thereafter, the multiclass classification problem is transformed into a regression task. This transformation helps to reduce possible subjectivity in the assessment of acne severity at the data annotation stage. Since the acne severity levels have a meaningful ordering, we assign a numeric value to each of the C class labels. Formally, the transformation maps y ∈ {1, …, C} to y ∈ ℝ: the values of the class variable are converted into numbers, and the numeric target corresponds to a level of pathology severity.
Thresholding is then used to transform a regression prediction back into a classification prediction, y' ∈ ℝ → y' ∈ {1, …, C}, where y' is the prediction.
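The two transformations above can be sketched as follows. This is a minimal illustration; the class values and the nearest-value thresholding rule are assumptions for the sketch, not taken from the paper's code.

```python
# Ordinal encoding of severity classes and threshold-based decoding.
# The class values below are illustrative assumptions.

CLASSES = [0, 1, 2, 3]  # e.g. Mild, Moderate, Severe, Very severe

def to_numeric(label):
    """Map an ordinal class label to a real-valued regression target."""
    return float(label)

def to_class(prediction):
    """Map a real-valued regression prediction back to the nearest class,
    i.e. threshold at the midpoints between consecutive class values."""
    return min(CLASSES, key=lambda c: abs(prediction - c))

# A regression output of 1.7 is closest to class 2.
print(to_class(1.7))  # -> 2
```

Nearest-value decoding is equivalent to placing thresholds halfway between consecutive class values; other threshold placements are possible if the severity scale is not evenly spaced.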
The proposed methodology deals with imbalanced image datasets at the data level and at the feature level. At the data level, patch extraction and data augmentation are used. At the feature level, an oversampling technique is used to deal with imbalanced data.
Firstly, we extract patches from the original images of human faces using one of two pre-trained models. Then data augmentation, feature extraction, and data oversampling are conducted. Finally, the results obtained after oversampling are fed to a CNN for model training and evaluation.
At the first stage, preliminary processing of images is carried out to balance the input data. The result of this step is a dataset S consisting of m patches extracted from the facial images. Each patch inherits the class label of its source image.
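A patch-extraction step of this kind can be sketched as follows. This is a simplified illustration using a plain nested list as the image and a fixed, non-overlapping patch grid; the actual pipeline locates patches with pre-trained face-landmark models.

```python
def extract_patches(image, patch_size, label):
    """Split an image (2-D list of pixel values) into non-overlapping
    patch_size x patch_size patches; each patch inherits the image label."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append((patch, label))
    return patches

# A 4x4 image with patch size 2 yields four labelled patches.
img = [[i * 4 + j for j in range(4)] for i in range(4)]
print(len(extract_patches(img, 2, label=1)))  # -> 4
```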
CNNs are spatially sensitive, which leads to insufficient recognition quality when only a limited number of images is available for network training. To overcome this issue, we apply translation of image sections. The result of this step is a dataset S consisting of an increased number of patches. Each translated patch inherits the class label of the original patch used for augmentation.
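Sliding translation of a patch can be sketched as follows. This is a minimal version that shifts the crop window within the source image by a few pixels; the specific offsets are illustrative assumptions.

```python
def translate_crops(image, top, left, size, offsets):
    """Generate translated copies of the size x size patch at (top, left)
    by sliding the crop window by each (dy, dx) offset that stays in bounds."""
    h, w = len(image), len(image[0])
    crops = []
    for dy, dx in offsets:
        t, l = top + dy, left + dx
        if 0 <= t <= h - size and 0 <= l <= w - size:
            crops.append([row[l:l + size] for row in image[t:t + size]])
    return crops

img = [[i * 5 + j for j in range(5)] for i in range(5)]
# Shift a 3x3 patch at (1, 1) by one pixel in four directions.
augmented = translate_crops(img, 1, 1, 3, [(-1, 0), (1, 0), (0, -1), (0, 1)])
print(len(augmented))  # -> 4
```

Each translated crop would be added to the training set with the same class label as the original patch.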
We utilised the transfer learning paradigm and a pretrained ResNet-152 to extract features from the training set of images. The result of this step is a dataset S = {(xi, yi)} consisting of features extracted with the trained model, where xi is the vector of features extracted from patch mi, and yi is the class label denoting the pathology severity associated with xi.
We use oversampling to balance the number of dataset objects in each class. Formally, oversampling can be represented as follows. Objects generated from the dataset S are denoted as E, with disjoint subsets Emin and Emaj representing the generated minority and majority objects, respectively. Random oversampling is implemented by adding a set E drawn from the minority class: for a set of randomly selected minority examples in Smin, the original set S is extended by replicating the selected examples and adding them to S. Thus, the number of examples in Smin increases by |E|, and the class distribution of S is rebalanced accordingly. This provides a mechanism for adjusting the degree of class balance to any desired level. The result of this step is a dataset S = {(xi, yi)} consisting of extracted and generated features, where xi is a vector of extracted or generated features of patch mi, and yi is the class label associated with xi.
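Random oversampling as described above can be sketched as follows. This is a minimal stdlib version; the feature vectors and labels are toy data.

```python
import random
from collections import Counter

def random_oversample(samples, seed=0):
    """Replicate randomly chosen minority-class examples until every
    class matches the size of the largest class (|E| added per class)."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    balanced = list(samples)
    for label, count in counts.items():
        pool = [s for s in samples if s[1] == label]
        balanced.extend(rng.choice(pool) for _ in range(target - count))
    return balanced

# 5 majority vs 2 minority examples -> 5 and 5 after oversampling.
data = [([0.1], 0)] * 5 + [([0.9], 1)] * 2
counts = Counter(label for _, label in random_oversample(data))
print(counts)  # -> Counter({0: 5, 1: 5})
```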
The extracted and generated features are used to train a CNN model to classify pathology severity. Model evaluation is carried out on validation data.
For this study, the open ACNE04 dataset was used. ACNE04 includes 1457 face images with expert annotations according to the Japanese rating scale. The dataset has the following acne severity annotations: level 0 – Mild, level 1 – Moderate, level 2 – Severe, level 3 – Very severe. All images were taken at an angle of approximately 70 degrees from the front of the patient and were manually annotated by experts. A study by Microsoft was used as a benchmark for the experiment. Steps 1 and 2 were implemented using the source code developed for the collaborative project between Microsoft and Nestle Skin Health. Our modification of Step 3 is presented in the Steps 3.ipynb code. Step 1 utilizes one of two pre-trained models: the shape_predictor_68_face_landmarks model or the One Eye model. Sliding translation was used as the augmentation technique. Further, feature extraction from each patch is carried out using the ResNet-152 model. Data oversampling was conducted with the Synthetic Minority Oversampling Technique (SMOTE). The data generated at the oversampling stage are used to train a CNN model.
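The SMOTE idea, interpolating between a minority example and one of its nearest minority neighbours, can be sketched as follows. This is a simplified stdlib illustration of the technique, not the implementation used in the experiment; the toy minority points and the neighbour count k are assumptions.

```python
import random

def smote_sample(minority, k=2, seed=0):
    """Create one synthetic minority example by interpolating between a
    random minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    base = rng.choice(minority)
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    neighbours = sorted((p for p in minority if p is not base),
                        key=lambda p: dist(p, base))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]

minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [2.0, 2.0]]
# The synthetic point lies on the segment between two minority examples,
# so each coordinate stays within the range of the minority class.
print(smote_sample(minority))
```

Unlike random oversampling, SMOTE generates new feature vectors rather than duplicating existing ones, which reduces the risk of overfitting to exact copies of minority examples.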
If you find this work helpful, please cite it as "Biloborodova, T., Skarga-Bandurova, I., Koverha, M., Skarha-Bandurov, I. and Yevsieieva, Y., 2021. A Learning Framework for Medical Image-Based Intelligent Diagnosis from Imbalanced Datasets. In Applying the FAIR Principles to Accelerate Health Research in Europe in the Post COVID-19 Era (pp. 13-17). IOS Press."