pterhoer/FaceImageQuality

The model works ONLY on MTCNN aligned faces

taosean opened this issue · 2 comments

Hi, thanks for your work!

I found an interesting thing about the SER-FIQ model.
I used data/test_img.jpeg for testing.

First, I cropped the output of this line and saved it as a JPG file named 1.jpg.
[Image: test_img_3]

Second, I ran the ffmpeg command ffmpeg -i 1.jpg -q:v 10 2.jpg to decrease the quality of the image and saved the result as 2.jpg.
[Image: test_img_4]

Third, I used my own face detection model (UltraFace) and landmark detection model (FAN model) to detect and align the face and saved the face image as 3.jpg.
[Image: test_img_face]

Fourth, I tested the three images with the SER-FIQ model using the following code:

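    import glob
    import os

    import cv2
    import numpy as np

    # Assumes the repository's face_image_quality module is importable
    from face_image_quality import SER_FIQ

    ser_fiq = SER_FIQ(gpu=1)

    test_img_folder = './data'
    # cv2.imwrite returns False without raising if the target folder is missing
    os.makedirs('outputs', exist_ok=True)

    face_imgs = glob.glob(os.path.join(test_img_folder, '*.jpg'))
    for face_img in face_imgs:
        test_img_ori = cv2.imread(face_img)
        # OpenCV loads BGR; convert to RGB, then reorder HWC to CHW
        test_img = cv2.cvtColor(test_img_ori, cv2.COLOR_BGR2RGB)
        aligned_img = np.transpose(test_img, (2, 0, 1))

        score = ser_fiq.get_score(aligned_img, T=100)
        # Prefix the output file name with the score for easy comparison
        new_path = os.path.join('outputs', str(score) + '_' + os.path.basename(face_img))

        cv2.imwrite(new_path, test_img_ori)

        print("SER-FIQ quality score of image is", score)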

And the results are:

1.jpg: 0.8465793745298666
2.jpg: 0.8412792795675421
3.jpg: 0.05755140244918808

As you can see, the SER-FIQ model is robust to a decrease in image quality (or image size). However, for images aligned by other models (not MTCNN), the score drops dramatically.

Have you encountered this problem, and do you know why this happens?

Thanks!

Hi @taosean
thanks for your questions. The SER-FIQ method essentially indicates how well a network can handle an input, which depends on how often the network saw that type of input during training. If a particular alignment method was used for training, the network can only produce reliable embeddings for inputs aligned the same way. Other alignment methods, like the one used in Figure 3 (3.jpg), where the face takes up much more of the image, therefore score worse than the original MTCNN alignment, because such face images did not appear in the training data. In general, FR (face recognition) models are very sensitive to the preprocessing used: different alignment parameters or landmarks can reduce performance significantly.
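
To make the point about alignment concrete, here is a minimal sketch of warping landmarks from any detector onto a fixed five-point layout with a least-squares similarity transform (Umeyama's method). The template values are the commonly used 112x112 ArcFace/InsightFace reference points; whether they exactly match what this repository's MTCNN alignment produces is an assumption to check against its preprocessing code. The "detected" landmarks below are synthetic, and the helper names are hypothetical.

```python
import numpy as np

# Five-point layout (left eye, right eye, nose tip, mouth corners) for a 112x112 crop.
# Assumption: the widely used ArcFace/InsightFace template; verify against the
# preprocessing actually used when the network was trained.
REFERENCE_5PTS = np.array([
    [38.2946, 51.6963],
    [73.5318, 51.5014],
    [56.0252, 71.7366],
    [41.5493, 92.3655],
    [70.7299, 92.2041],
], dtype=np.float64)


def similarity_transform(src, dst):
    """Least-squares similarity transform (Umeyama) mapping src points onto dst.

    Returns a 2x3 matrix M with dst ~= src @ M[:, :2].T + M[:, 2].
    """
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean

    # Cross-covariance between the centered point sets
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so the result is a rotation, not a reflection
    d = np.ones(2)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        d[-1] = -1.0

    R = U @ np.diag(d) @ Vt
    scale = (S * d).sum() / ((src_c ** 2).sum() / len(src))
    t = dst_mean - scale * R @ src_mean
    return np.hstack([scale * R, t[:, None]])


def apply_transform(M, pts):
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    return np.asarray(pts, dtype=np.float64) @ M[:, :2].T + M[:, 2]


# Synthetic landmarks: the template scaled, rotated by 10 degrees, and shifted,
# standing in for the output of some other landmark detector.
theta = np.deg2rad(10.0)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
detected = 2.5 * REFERENCE_5PTS @ rot.T + np.array([120.0, 80.0])

M = similarity_transform(detected, REFERENCE_5PTS)
aligned_pts = apply_transform(M, detected)
print(np.abs(aligned_pts - REFERENCE_5PTS).max())
```

The resulting matrix M has the 2x3 shape that cv2.warpAffine(img, M, (112, 112)) accepts, so the same transform could be used to produce the aligned crop. If the crop from another pipeline frames the face differently from the training data, a step like this is one way to bring it closer to the expected layout.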
The ability to handle different image quality depends on the training process and the dataset used.
Also, SER-FIQ is not a model in the sense of a neural network; it is a methodology that can be applied to any network trained with dropout.
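
As a rough illustration of the methodology, the sketch below shows only the scoring step: given T embeddings of the same image obtained from forward passes with different dropout masks, the score is twice the sigmoid of the negative mean pairwise distance between the normalized embeddings. The function name is hypothetical, the exact scaling in this repository may differ, and the embeddings here are synthetic noise around a base vector rather than real dropout passes.

```python
import numpy as np


def ser_fiq_style_score(embeddings):
    """Map T stochastic embeddings of one image to a score in (0, 1].

    Tightly clustered embeddings (the network is consistent under dropout)
    give a score near 1; widely spread embeddings give a lower score.
    """
    X = np.asarray(embeddings, dtype=np.float64)
    # L2-normalize each embedding so distances are comparable across inputs
    X = X / np.linalg.norm(X, axis=1, keepdims=True)

    # Euclidean distance between every pair of embeddings
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)

    # Keep each unordered pair once (upper triangle without the diagonal)
    iu = np.triu_indices(len(X), k=1)
    mean_dist = dists[iu].mean()

    # 2 * sigmoid(-mean_dist)
    return 2.0 / (1.0 + np.exp(mean_dist))


# Synthetic stand-ins for T=100 dropout passes (assumption: not real network output)
rng = np.random.default_rng(0)
base = rng.normal(size=128)
stable = base + 0.01 * rng.normal(size=(100, 128))    # input the network handles well
unstable = base + 1.0 * rng.normal(size=(100, 128))   # input the network handles poorly

stable_score = ser_fiq_style_score(stable)
unstable_score = ser_fiq_style_score(unstable)
print("stable:", stable_score, "unstable:", unstable_score)
```

Because the score depends only on how much the embeddings vary under dropout, an input type the network rarely saw during training, such as a differently aligned face, can score low even when the image itself is sharp.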
I hope this answers your questions.

Kind regards,
Jan

Hi @jankolf, thank you for your reply.

I understand your point.

Thanks.