CSAILVision/GazeCapture

Relative change in the results does not follow relative change in eye movement

kairikibear opened this issue · 5 comments

Hello! Thanks for your hard work!

Recently I've been trying to use the Caffe model, but since I want to run this fully on CPU anyway and don't want to install Caffe as a first option, I opted to load the Caffe model through OpenCV's DNN module instead.

I can't see how the Caffe model is used in the repo, so I tried to implement the same pipeline as the PyTorch one. Unfortunately, I've been getting strange results.

I stared straight at the camera, but the output is way off (6 cm horizontal and 2 cm vertical). I tried looking left and right, but that seemingly has no effect: the values do not vary with the general direction of my eyes (i.e., no relative change), so I want to ask whether my pipeline is correct:

  1. Detect the face
  2. Detect the eyes
  3. Crop the face and eyes and create the grid (the grid is all 1's where the face falls within it)
  4. Resize the face and eye crops (so they are warped)
  5. Divide the face and eye images by 255
  6. Load the means and divide them by 255
  7. Subtract the means from their respective images
  8. Resize them and feed them into the model
  9. Read the results
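Steps 4-8 above can be sketched roughly as follows. This is a minimal sketch, assuming an RGB NumPy crop and a hypothetical `mean_rgb` image at the network's input size; the nearest-neighbor helper stands in for `cv2.resize` just to keep the snippet self-contained:

```python
import numpy as np

def nn_resize(img, size=224):
    """Nearest-neighbor resize (stand-in for cv2.resize); warps non-square crops."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]

def preprocess(crop_rgb, mean_rgb):
    """Resize the crop, scale to [0, 1], and subtract the (also scaled) mean."""
    x = nn_resize(crop_rgb).astype(np.float32) / 255.0
    return x - mean_rgb.astype(np.float32) / 255.0
```

Each of the face and two eye crops would go through `preprocess` with its own mean image before being fed to the network.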

I also used some assumptions I observed and confirmed from the PyTorch code:

  1. The input images are RGB
  2. The "right eye" is the eye detected on the left side of the image, not the right

As background: I'm running this on my laptop, but I noticed this repo was built for use on mobile devices, so what I'm asking is, what is the expected face distance from the camera?

Thanks!

Hi, I do not remember all the details anymore, but your algorithm makes sense to me.

  • You should check that both the images and the means are in 8-bit range (0-255) before you divide by 255, just to be sure.
  • I think you also need to provide the 25x25 mask for the spatial location of the face inside the original image. If you do not have it, you can probably just fake a centered rectangle of reasonable size.
  • Yes, it is RGB.
  • I am not sure about the eye order, just try both.
  • The expected distance is the distance at which people typically hold a phone or a tablet. Like 40 cm maybe?

I do not have experience with the OpenCV DNN framework, but if possible I would look at the intermediate activations and check whether the numbers look reasonable (no huge values, not essentially all zeros). Bad values would mean the model is not loaded correctly or the inputs have a bad range - though this should be easy to check in other ways.
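The 25x25 face mask mentioned above can be built like this. A sketch, assuming the GazeCapture convention of marking grid cells covered by the face bbox with 1's; the exact rounding used in the original code may differ:

```python
import numpy as np

def face_grid(frame_w, frame_h, x, y, w, h, grid=25):
    """Return a grid x grid mask with 1's in the cells covered by the face bbox."""
    g = np.zeros((grid, grid), np.float32)
    # Map bbox corners from pixel coordinates to grid cells.
    x0 = int(x * grid / frame_w)
    y0 = int(y * grid / frame_h)
    x1 = int(np.ceil((x + w) * grid / frame_w))
    y1 = int(np.ceil((y + h) * grid / frame_h))
    g[max(y0, 0):min(y1, grid), max(x0, 0):min(x1, grid)] = 1.0
    return g
```

The mask is then flattened and fed to the network alongside the three image inputs.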

Hello petr, thanks for the reply. In the end, the major culprit was the quality of my webcam. It worked nicely on a video I took with my phone camera.

In the end, these are the observations that helped:

  1. Using a webcam of proper quality + maintaining the proper usage distance
  2. Extending the eye and face bboxes to squares alleviates some errors
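The square-bbox trick in point 2 can be sketched with a hypothetical helper like this (it grows the shorter side around the bbox center and does not clamp to the image bounds, so the caller should handle edges):

```python
def to_square(x, y, w, h):
    """Expand a bbox to a square around its center by growing the shorter side."""
    side = max(w, h)
    cx, cy = x + w / 2.0, y + h / 2.0
    return int(cx - side / 2.0), int(cy - side / 2.0), side, side
```

Cropping with a square box means the subsequent resize to the network's square input does not distort the face or eyes.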

Thank you so much for the work and reply! I'll be closing this issue since it's resolved.

I am glad you figured it out and thank you for sharing your insights for others. You are right, all our training data come from iPhone/iPads with reasonably good cameras. And, as you say, the crops should be square (as in pixel units, therefore not distorting the image during resizing).

Hello @kairikibear, I wanted to ask: how did you crop the face and eyes? Did you use face landmarks?

Yes, using face landmarks you can crop the face and eyes. FYI: you can use dlib or mediapipe to find the face landmarks and crop the parts you need.
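As a sketch of landmark-based eye cropping, assuming dlib's 68-point convention (indices 36-41 cover the subject's right eye, which appears on the left side of the image, and 42-47 the left eye) and a hypothetical padding margin:

```python
import numpy as np

# Index ranges follow dlib's 68-point landmark scheme.
RIGHT_EYE = slice(36, 42)  # appears on the left side of the image
LEFT_EYE = slice(42, 48)

def eye_bbox(landmarks, eye=RIGHT_EYE, margin=0.5):
    """Square bbox around the eye landmarks, padded by `margin` on each side."""
    pts = np.asarray(landmarks)[eye]
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    side = int(max(x1 - x0, y1 - y0) * (1 + 2 * margin))
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    return int(cx - side / 2.0), int(cy - side / 2.0), side, side
```

The `landmarks` array would come from a detector such as dlib's `shape_predictor`; the returned square box can then be used to crop the eye before preprocessing.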