CSAILVision/GazeCapture

Unable to reproduce the result

ankitw497 opened this issue · 5 comments

I am using the provided pretrained Caffe model, but I get a Euclidean loss much higher than the one reported in the paper. Please look at my code and tell me where I am making a mistake.
I load the Caffe model and do a forward pass to get the output.
CaffeModel.zip

Here is the code:

import math  # needed by Loss() below

import caffe
import numpy as np
import csv
import cv2
from PIL import Image
from scipy.io import loadmat

def Predict(tmp_dict):  # tmp_dict maps each sample to its image, facegrid and dotInfo paths

    caffe.set_mode_cpu()

    MODEL_FILE = "/mnt/disks/d/mitgaze/code/Model/itracker_deploy.prototxt"
    PRETRAINED = "/mnt/disks/d/mitgaze/code/Model/itracker_iter_92000.caffemodel"

    # Load the mean images
    image_mean_face  = loadmat("/mnt/disks/d/mitgaze/code/Model/mean_face_224.mat", squeeze_me=True, struct_as_record=False)['image_mean']
    image_mean_left  = loadmat("/mnt/disks/d/mitgaze/code/Model/mean_left_224.mat", squeeze_me=True, struct_as_record=False)['image_mean']
    image_mean_right = loadmat("/mnt/disks/d/mitgaze/code/Model/mean_right_224.mat", squeeze_me=True, struct_as_record=False)['image_mean']

    net = caffe.Classifier(MODEL_FILE, PRETRAINED)
    with open("/mnt/disks/d/mitgaze/code/Result/loss_withMeanSubs.txt", "w") as fout:
        for sample in tmp_dict:

            facegridpath = list(tmp_dict[sample]['facegrid'].items())
            FaceGrid = np.loadtxt(facegridpath[0][1])

            dotInfopath = list(tmp_dict[sample]['dotInfo'].items())
            dotInfo = np.loadtxt(dotInfopath[0][1])

            # One row per frame of this sample
            EucledianLoss = np.zeros((len(dotInfo), 1))
            Xpre = np.zeros((len(dotInfo), 1))
            Ypre = np.zeros((len(dotInfo), 1))
            Xtrue = np.zeros((len(dotInfo), 1))
            Ytrue = np.zeros((len(dotInfo), 1))

            for image in tmp_dict[sample]:

                # Every key other than 'facegrid' and 'dotInfo' is a frame index
                if 'facegrid' != image and 'dotInfo' != image:

                    appleFace, appleLeftEye, appleRightEye = tmp_dict[sample][image].items()

                    IMAGE_FILE1 = appleLeftEye[1]
                    IMAGE_FILE2 = appleRightEye[1]
                    IMAGE_FILE3 = appleFace[1]

                    image_left  = Substract_mean(IMAGE_FILE1, image_mean_left)
                    image_right = Substract_mean(IMAGE_FILE2, image_mean_right)
                    image_face  = Substract_mean(IMAGE_FILE3, image_mean_face)
                    facegrid    = np.reshape(FaceGrid[int(image)], (1, 625, 1, 1))

                    net.blobs["image_left"].data[...] = image_left
                    net.blobs["image_right"].data[...] = image_right
                    net.blobs["image_face"].data[...] = image_face
                    net.blobs["facegrid"].data[...] = facegrid

                    pred = net.forward()
                    idx = int(image)
                    Xpre[idx][0] = pred['fc3'][0][0]
                    Ypre[idx][0] = pred['fc3'][0][1]

                    Xtrue[idx][0] = dotInfo[idx][0]
                    Ytrue[idx][0] = dotInfo[idx][1]

                    EucledianLoss[idx][0] = Loss(Xtrue[idx][0], Xpre[idx][0], Ytrue[idx][0], Ypre[idx][0])
                    fout.write(sample + "," + image + "," + str(Xtrue[idx][0]) + "," + str(Xpre[idx][0]) + "," + str(Ytrue[idx][0]) + "," + str(Ypre[idx][0]) + "," + str(EucledianLoss[idx][0]) + "\n")

    # Average over the frames of the last processed sample:
    # its entry count minus the 'facegrid' and 'dotInfo' keys
    Averageerorr = Average(EucledianLoss, len(tmp_dict[sample]) - 2)
    return EucledianLoss


# Accuracy

def Loss(X_t, X_p, Y_t, Y_p):
    # Euclidean distance between the true and the predicted point
    return math.sqrt((X_t - X_p)**2 + (Y_t - Y_p)**2)

# Average Error

def Average(EucledianLoss, NumofImages):
    return np.sum(EucledianLoss) / NumofImages

# Mean subtraction

def load_image(filename):
    img = Image.open(filename)
    img.load()
    data = np.asarray(img, dtype="float32")  # shape (H, W, C)
    return data

def Substract_mean(filename, mean_image_array):
    # Assumes mean_image_array has the same (H, W, C) layout as the image
    img = load_image(filename)
    Substract_img = (img - mean_image_array) / 255
    # Reorder (H, W, C) -> (C, H, W) first; a plain reshape would scramble the pixels
    Substract_img = np.transpose(Substract_img, (2, 0, 1))
    Substract_img = np.reshape(Substract_img, (1, 3, 224, 224))
    return Substract_img

Hi, I do not really use Caffe anymore, so I cannot test your code. You can either try the PyTorch version or debug your solution by checking that all data have the correct range (0-255 -> 0-1) when subtracting the mean, etc. Also, your error formula may be wrong if you are averaging the squared distances and taking the root at the end, which is not the same as the mean of the per-sample errors. Try taking the square root of each value before accumulating. This should make a difference if the variance is large.
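To illustrate the point about averaging, here is a minimal NumPy sketch with made-up numbers (the arrays `true_xy` and `pred_xy` are hypothetical and not taken from the dataset). It compares the mean of the per-sample Euclidean distances with the root of the mean squared distance; the two agree only when all per-sample errors are equal.

```python
import numpy as np

# Hypothetical ground-truth and predicted gaze points (x, y), e.g. in cm.
true_xy = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
pred_xy = np.array([[3.0, 4.0], [0.0, 0.0], [0.0, 1.0]])

# Squared distance per sample.
sq_dist = np.sum((pred_xy - true_xy) ** 2, axis=1)

# Root each sample first, then average: mean Euclidean distance.
mean_of_roots = np.mean(np.sqrt(sq_dist))

# Average the squared distances, then take a single root at the end.
root_of_mean = np.sqrt(np.mean(sq_dist))

print(mean_of_roots, root_of_mean)
```

With these numbers the second value is noticeably larger than the first, because one large error dominates the squared average.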

Hi Petr,
I have the same question: what is the error reported in Table 2 of the paper? For instance, if the real label is (x = 0, y = 0) and the prediction is (x = 1, y = 1), what is the error? Is it error = MSELoss(real, pred) = 1, or error = sqrt(2) = 1.414?
I thought the error should be 1.414, but the PyTorch code seems to count the error as 1.
I rewrote a function to calculate the evaluation metric (Euclidean distance on screen) and then re-evaluated the provided checkpoint. The result is about 2.4 cm on-screen error, which is slightly worse than the reported result. I guess my understanding is right, but the PyTorch code only evaluates the L2 error.

Thanks!

Hi, check that the error you compute is square-rooted before being averaged.

Also, if the error vector is [1,1], then the correct error is 1.41. Please note that there is a difference between the training loss (which is just L2 without the square root) and the accuracy reported in the paper, which is square-rooted (= Euclidean distance).
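The numbers in this exchange can be checked with a short NumPy sketch. It assumes the training loss is a mean-squared error averaged over the x and y coordinates (as `torch.nn.MSELoss` does with its default `reduction='mean'`); the variable names are illustrative only.

```python
import numpy as np

real = np.array([0.0, 0.0])  # true gaze point (x, y)
pred = np.array([1.0, 1.0])  # predicted gaze point (x, y)

# Squared error averaged over the two coordinates (no square root),
# which is what an element-wise mean-squared-error loss computes.
mse = np.mean((pred - real) ** 2)

# Euclidean distance between the two points (square root of the summed squares).
euclidean = np.sqrt(np.sum((pred - real) ** 2))

print(mse)        # 1.0
print(euclidean)  # 1.4142135623730951
```

So a value of 1 corresponds to the training loss, while 1.41 corresponds to the Euclidean distance used as the accuracy metric.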