The model works ONLY on MTCNN aligned faces
Closed this issue · 2 comments
Hi, Thanks for your work!
I found an interesting thing about the SER-FIQ model.
I used data/test_img.jpeg for testing.
First, I cropped the output of this line and saved it as a JPG file named 1.jpg.
Second, I used the ffmpeg command ffmpeg -i 1.jpg -q:v 10 2.jpg to reduce the image quality and saved the result as 2.jpg.
Third, I used my own face detection model (UltraFace) and landmark detection model (FAN) to detect and align the face, and saved the aligned face image as 3.jpg.
Fourth, I tested the three images with the SER-FIQ model using the following code:
import glob
import os

import cv2
import numpy as np

from face_image_quality import SER_FIQ  # SER_FIQ class from this repository

ser_fiq = SER_FIQ(gpu=1)

test_img_folder = './data'
face_imgs = glob.glob(os.path.join(test_img_folder, '*.jpg'))

for face_img in face_imgs:
    test_img_ori = cv2.imread(face_img)
    # OpenCV loads BGR; convert to RGB and to channel-first layout before scoring.
    test_img = cv2.cvtColor(test_img_ori, cv2.COLOR_BGR2RGB)
    aligned_img = np.transpose(test_img, (2, 0, 1))
    score = ser_fiq.get_score(aligned_img, T=100)
    new_path = os.path.join('outputs', str(score) + '_' + os.path.basename(face_img))
    cv2.imwrite(new_path, test_img_ori)
    print("SER-FIQ quality score of image is", score)
And the results are:
1.jpg: 0.8465793745298666
2.jpg: 0.8412792795675421
3.jpg: 0.05755140244918808
As you can see, the SER-FIQ model is robust to the decrease in image quality (or image size). However, when given images aligned by other models (not MTCNN), the score drops dramatically.
Have you encountered this problem, and do you know why it happens?
Thanks!
Hi @taosean
thanks for your questions. The SER-FIQ method basically indicates how well a network can handle an input, which includes how often the network has seen that type of input during training. If a certain alignment method was used for training, the network can only produce reliable embeddings for that alignment. Other alignment methods, like the one used for your third image (3.jpg), where the face takes up much more space in the image, therefore work worse than the original MTCNN alignment, because these kinds of face images did not appear in the training data. In general, FR models are very sensitive to the preprocessing used; using different parameters for the alignment or landmarks can reduce the performance significantly.
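For comparison, a minimal sketch of scoring a raw image through the same MTCNN alignment the model was trained with might look like this. It assumes the apply_mtcnn helper and the import path shown in this repository's demo script; the exact interface may differ in your checkout.

import cv2
from face_image_quality import SER_FIQ  # assumed import path from the repo's demo

ser_fiq = SER_FIQ(gpu=0)  # adjust the GPU id to your setup

# Align with the repository's own MTCNN pipeline instead of an external aligner.
img = cv2.imread("./data/test_img.jpeg")
aligned = ser_fiq.apply_mtcnn(img)  # assumed helper; returns the MTCNN-aligned crop or None

if aligned is not None:
    score = ser_fiq.get_score(aligned, T=100)
    print("SER-FIQ score with MTCNN alignment:", score)

Scoring 3.jpg this way (i.e. re-detecting and re-aligning it with MTCNN) should bring its score much closer to the other two images.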
The ability to handle varying image quality depends on the training process and the dataset used.
Also, SER-FIQ is not a model in the sense of a neural network; rather, the methodology can be applied to any network that was trained with dropout.
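To illustrate the methodology independent of any specific network: the score is derived from how much T stochastic embeddings of the same image (forward passes with dropout kept active) agree with each other. The following is a rough sketch of that idea, not the repository's exact implementation; stochastic_embed is a hypothetical stand-in for one dropout-enabled forward pass, and the exact normalization and scaling may differ.

import numpy as np

def ser_fiq_style_score(stochastic_embed, image, T=100):
    # stochastic_embed(image) must run ONE forward pass with dropout enabled
    # and return an embedding vector (hypothetical function, supplied by you).
    embs = np.stack([stochastic_embed(image) for _ in range(T)])
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)  # L2-normalize

    # Mean pairwise Euclidean distance between the stochastic embeddings:
    # low variation -> the network handles the input robustly -> high quality.
    dists = [np.linalg.norm(embs[i] - embs[j])
             for i in range(T) for j in range(i + 1, T)]
    mean_dist = np.mean(dists)

    # Map the distance to a (0, 1) quality score; small distances give scores near 1.
    return 2.0 / (1.0 + np.exp(mean_dist))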
I hope this answers your questions.
Kind regards,
Jan