D-X-Y/landmark-detection

Regarding the face bounding box

miracleyoo opened this issue · 6 comments

Hello, I'm a Ph.D. student working on gamer-related CV research. I'm trying to use SAN and integrate it into my project, but I find it doesn't work well with the bounding boxes generated by RetinaFace, a new detector published in 2019 that ranked 1st on the WIDER FACE dataset.
The bounding boxes it generates are rectangular rather than square, so I'm wondering whether SAN only works well with the face boxes from certain datasets like WFLW. Can it work with face bounding boxes generated by other models?

If it's convenient for you, I hope you can reply soon. I will definitely cite your paper if the project is finished using SAN. Thanks!

D-X-Y commented

Thanks for this good question. Did you re-train SAN on your dataset with the new bounding boxes? If not, SAN is very likely to perform poorly. If you are using a pre-trained SAN, you should use the same face detector; otherwise, SAN will perform poorly because the training bounding boxes and the evaluation bounding boxes do not match.
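As a rough illustration (this is not taken from the SAN code itself), one way to make a rectangular detector box look more like a roughly square training box is to expand it around its center before feeding it to the pre-trained model; the padding ratio below is an arbitrary choice:

```python
def squarify_box(box, pad_ratio=0.1, img_w=None, img_h=None):
    """Expand a rectangular (x1, y1, x2, y2) detector box into a square box.

    The square is centered on the original box and slightly padded, so it
    more closely resembles the square-ish boxes a pre-trained model saw.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    side = max(x2 - x1, y2 - y1) * (1.0 + pad_ratio)
    nx1, ny1 = cx - side / 2.0, cy - side / 2.0
    nx2, ny2 = cx + side / 2.0, cy + side / 2.0
    # Clamp to the image boundaries if the image size is known.
    if img_w is not None and img_h is not None:
        nx1, ny1 = max(nx1, 0), max(ny1, 0)
        nx2, ny2 = min(nx2, img_w - 1), min(ny2, img_h - 1)
    return [nx1, ny1, nx2, ny2]

# Example: a tall RetinaFace-style rectangle becomes a padded square box.
print(squarify_box([120, 80, 220, 260], pad_ratio=0.1, img_w=640, img_h=480))
```

This only reduces the mismatch; re-training with the new boxes is still the more reliable fix.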

I found that when I use the rectangular bounding boxes, SAN performs even worse than the baseline dlib landmarks, as the image below shows:
[image: landmark comparison; the former is RetinaFace+SAN, the latter is the dlib pipeline]

D-X-Y commented

@miracleyoo I see, are you using a pre-trained SAN?

Thanks a lot for the quick response! Yes, I'm using the pre-trained SAN; I believe that is the problem. I will try to retrain the SAN network on my own dataset with RetinaFace. But another question: do I need to manually create a 68-point dataset from my videos, or can I just use RetinaFace to generate boxes and replace the original face bounding boxes?

D-X-Y commented

It would be better to manually annotate the 68 points on your video frames, but I think using RetinaFace to replace the original face bounding boxes is also fine.
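A rough sketch of that second option: loop over the existing 68-point annotations, keep the landmarks, and only swap in the RetinaFace box. The JSON annotation layout and the `detect_largest_face` helper below are placeholders, not the actual SAN data format or RetinaFace API.

```python
import json
from pathlib import Path

def detect_largest_face(image_path):
    """Placeholder for a RetinaFace call: return (x1, y1, x2, y2) of the
    largest detected face, or None if no face is found."""
    raise NotImplementedError("plug in your RetinaFace detector here")

def replace_boxes(annotation_dir, image_dir, output_dir):
    """For every annotation file, keep the 68 landmark points but replace
    the stored face box with the one produced by RetinaFace."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for ann_path in Path(annotation_dir).glob("*.json"):
        ann = json.loads(ann_path.read_text())
        image_path = Path(image_dir) / ann["image"]
        box = detect_largest_face(image_path)
        if box is None:
            continue  # skip frames where the detector finds no face
        ann["box"] = [float(v) for v in box]
        (output_dir / ann_path.name).write_text(json.dumps(ann))
```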

Thanks a lot for your help! I will try the second method first.