1adrianb/face-alignment-training

How are bounding boxes and center points computed in the training data?

davidparks21 opened this issue · 2 comments

Hi, I'm trying to reproduce your results and ran into two questions I couldn't answer from the paper or the code.

First, you compute a bounding box around the face in the training data. As I understand it, this is done using the ground truth landmarks, and the bounding box dimensions are used in the paper to compute the NME (normalized mean error). But I can't see how you actually compute the bounding box dimensions. Section 3.3 of the paper alludes to using the landmarks to compute the bounding box ("in particular we used the bounding box calculated from the 2D landmarks"), but I can't find the relevant code.

I could compute the bounding box as a tight fit to the predicted/ground truth landmarks, but I wasn't sure whether that was the method applied in the paper, or whether the bounding box was some scaling of that tight fit. The latter would make sense, because a tight crop of the landmarks would cut off a lot of the face.
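
For reference, this is roughly what I mean by a tight fit (a hypothetical snippet, assuming pts is an Nx2 Torch tensor with (x, y) columns):

-- Hypothetical tight-fit bounding box around the landmarks in `pts`;
-- only the min/max value tensors are kept, the argmin/argmax are discarded
local mins, maxs = pts:min(1), pts:max(1)
local x1, y1 = mins[1][1], mins[1][2]
local x2, y2 = maxs[1][1], maxs[1][2]
-- a padded version would scale this box outward around its center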

Second, it isn't clear how the images were cropped and resized. The training images vary in scale and are not all centered, so how are the image center points computed? Your code seems to assume a fixed image resolution when computing the center, which we can't explain:

function DatasetImages:generateSampleFace(idx)
    -- Load the ground-truth landmarks for this sample from its .t7 file
    local main_pts = torch.load(self.opt.data..'landmarks/'..self.annot[idx]:split('_')[1]..'/'..string.sub(self.annot[idx],1,#self.annot[idx]-4)..'.t7')
    local pts = main_pts[1] -- index 1: 2D landmarks; index 2: 3D
    -- Hard-coded center (50px below the center of a 450x450 image) and scale
    local c = torch.Tensor{450/2,450/2+50}
    local s = 1.8

Thanks very much for any help you can provide.

Hi @davidparks21 ,

  1. We obtain a bounding box (preferably using a tight ground truth bbox).
  2. The center is simply the center of that bounding box.
  3. The scale is computed from the bounding box such that, on a 256x256 image, the face is roughly 180-200px in height. As you rightly mention, this is done to avoid cropping out parts of the face (see the sketch below).
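
For illustration, here is a minimal sketch of steps 2-3, assuming the bounding box is given as corners (x1, y1, x2, y2); the function name and the divisor of 195 are assumptions made for the sketch, not necessarily the exact values from the evaluation code:

-- Minimal sketch (not the exact repo code): derive center and scale
-- from a face bounding box given as (x1, y1, x2, y2)
local function centerScaleFromBBox(x1, y1, x2, y2)
    local center = torch.Tensor{(x1 + x2) / 2, (y1 + y2) / 2}
    -- the divisor is picked so the face ends up roughly 180-200px tall
    -- in the 256x256 crop; treat 195 as an assumed constant
    local scale = ((x2 - x1) + (y2 - y1)) / 195
    return center, scale
end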

For more details you can have a quick look at the evaluation code here: https://github.com/1adrianb/2D-and-3D-face-alignment where you can see how we compute the center and the scale.

Regarding the fixed resolution: the dataset used, 300W-LP, was already preprocessed by its original authors, so all the faces are already scaled and centered. This way all of them share the same center and scale, hence the hard-coded c = torch.Tensor{450/2,450/2+50} and s = 1.8 in your snippet, which match the 450x450 images of 300W-LP.
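
To make that concrete, a hypothetical sketch of how such a fixed center and scale would be consumed, assuming a pose-hg-style crop(img, center, scale, rotation, res) helper (the exact helper in the training code may differ):

-- Hypothetical usage: with 300W-LP every sample shares the same center
-- and scale, so the same crop parameters work for every image
local c = torch.Tensor{450/2, 450/2 + 50}
local s = 1.8
local inp = crop(img, c, s, 0, 256) -- 256x256 network input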

Thanks for the fast response, that's very helpful and clears up the points we were unsure about. Great job on the paper and code; it has attracted a lot of interest in both academic and commercial circles!