ishay2b/VanillaCNN

68 landmark process


Is there a convenient way to process landmark predictions for 68 points? Looking at your code, the 5 landmarks seem to be represented as left eye, right eye, left mouth, right mouth and middle. Since I am not very familiar with Python, it seems a bit hard to write a loop over these named fields.

Caffe actually returns a NumPy array of 10 floats. I used named fields for convenience, but this is not a must: your network should simply return 136 floats (68 x,y pairs) without any need for naming.
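As a minimal sketch of what "no naming" means in practice: assuming the network emits coordinates interleaved as x0, y0, x1, y1, ... (an assumption; check how your labels were written), a flat output of any length can be viewed as an (N, 2) array of points and flattened back. `to_points` and `to_flat` are hypothetical helper names, not functions from the repository.

```python
import numpy as np


def to_points(flat):
    """View a flat output [x0, y0, x1, y1, ...] as an (N, 2) array of points.

    Assumes interleaved x,y ordering (hypothetical helper, not repo API).
    """
    flat = np.asarray(flat, dtype=np.float32)
    if flat.ndim != 1 or flat.size % 2:
        raise ValueError("expected a 1-D array with an even number of values")
    return flat.reshape(-1, 2)


def to_flat(points):
    """Inverse of to_points: (N, 2) -> (2N,)."""
    return np.asarray(points, dtype=np.float32).reshape(-1)
```

With 10 outputs this yields a (5, 2) array and with 136 outputs a (68, 2) array; nothing else changes. If the labels store all x values first and then all y values, `flat.reshape(2, -1).T` gives the equivalent (N, 2) view.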

Notice, for example, testErrorMini in mainLoop.py: I just rescale the NumPy array as is.


    for i, dataRow in enumerate(dataRowsTrainValid):
        dataRow40 = dataRow.copyCroppedByBBox(dataRow.fbbox).copyMirrored()
        image, lm_0_5 = predictor.preprocess(dataRow40.image, dataRow40.landmarks())
        prediction = predictor.predict(image) # This is a numpy array
        dataRow40.prediction = (prediction+0.5)*40.  # Scale -0.5..+0.5 to 0..40
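The rescaling on the last line is elementwise, so it works unchanged for 136 values. A sketch of the scaling in both directions, assuming landmarks are normalized by dividing by the crop size and subtracting 0.5 (inferred as the inverse of that line; not checked against what `predictor.preprocess` actually does). The helper names are hypothetical.

```python
import numpy as np

# Crop size taken from the snippet above (dataRow40 / "*40.").
IMAGE_SIZE = 40.0


def normalize_landmarks(points, size=IMAGE_SIZE):
    """Map pixel coordinates in 0..size to the -0.5..+0.5 range.

    Assumed to mirror the training-side normalization (hypothetical helper).
    """
    return np.asarray(points, dtype=np.float32) / size - 0.5


def denormalize_landmarks(values, size=IMAGE_SIZE):
    """Map -0.5..+0.5 network outputs back to pixel coordinates in 0..size."""
    return (np.asarray(values, dtype=np.float32) + 0.5) * size
```

Both functions accept a flat vector or an (N, 2) array, so the same code serves 5 and 68 landmarks.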

Ishay, thanks for your quick response. For testing there should be no problem. But for training on 68 landmarks, the DataRow class defines lefteye, righteye, leftmouth, rightmouth and middle, which correspond to the 5 landmarks. If I have 68 landmarks, what would be a convenient way to handle them? I saw another implementation where the author defined 15 names to process 30 key points. Surely there is a better way?
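For reference, the per-point arithmetic in a crop step such as `copyCroppedByBBox` does not depend on point names: every landmark is shifted by the box origin and scaled to the output size. A sketch on an (N, 2) array, where the `(left, top, right, bottom)` box format, the 40-pixel output size and the name `crop_landmarks` are assumptions, not the repository's actual API:

```python
import numpy as np


def crop_landmarks(points, bbox, out_size=40):
    """Express (N, 2) landmarks in the coordinates of a bbox crop resized to out_size.

    Hypothetical helper; bbox is ASSUMED to be (left, top, right, bottom)
    in the original image's pixel coordinates.
    """
    pts = np.asarray(points, dtype=np.float32)
    left, top, right, bottom = bbox
    # Separate x and y scale factors, since the box need not be square.
    scale = np.array([out_size / (right - left), out_size / (bottom - top)],
                     dtype=np.float32)
    origin = np.array([left, top], dtype=np.float32)
    return (pts - origin) * scale
```

The same call handles 5 or 68 points because NumPy broadcasting applies the shift and scale to every row.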

I suggest you throw away the naming structure altogether and replace it with NumPy array operations; the names are not needed. I will be happy to accept such a PR, since this is the right way to go and the way to scale this project.
The catch is that you will have to manage the indexes yourself to map the points out.
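One place where the index handling shows up is mirroring (compare `copyMirrored` in the snippet above): after reflecting x, left and right points must swap slots, and a single permutation array expresses that for any number of points. A minimal sketch, assuming continuous coordinates where the reflection is x' = width - x, and a hypothetical 5-point order (left eye, right eye, middle, left mouth, right mouth). For 68 points, `flip_idx` has to be built from the annotation scheme's own left/right correspondences.

```python
import numpy as np


def mirror_landmarks(points, image_width, flip_idx):
    """Horizontally mirror (N, 2) landmarks (hypothetical helper, not repo API).

    x is reflected as image_width - x (ASSUMED coordinate convention), then
    rows are reordered with flip_idx so each slot again holds the right point.
    """
    pts = np.asarray(points, dtype=np.float32).copy()  # never modify the caller's array
    pts[:, 0] = image_width - pts[:, 0]
    return pts[np.asarray(flip_idx)]


# HYPOTHETICAL 5-point order: left eye, right eye, middle, left mouth, right mouth.
FLIP_5 = [1, 0, 2, 4, 3]
```

Since `flip_idx` is its own inverse for a left/right swap, mirroring twice returns the original points, which makes a cheap sanity check for a hand-built 68-entry permutation.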

Actually, I answered only the technical programming issue you raised, but Tal Hassner pointed out to me that the real problem is that Vanilla CNN does not perform well with 68 points, because much less training data (images with 68-point labels) is publicly available for this task. Instead, use its predictions to initialize the CLNF detector, for which code is available. See the paper for more details: http://www.openu.ac.il/home/hassner/projects/tcnn_landmarks/

Indeed, 68-landmark face alignment will be a problem if the available dataset is insufficient. As the paper states, this is also a first-stage landmark detector; do you have any plans to release the code for the later stage?

@fishman2008 Have you finished the 68-landmark processing?