jiashunwang/Long-term-Motion-in-3D-Scenes

Evaluation

Opened this issue · 12 comments

Hi

I have a question about evaluation.
I implemented the evaluation code for the evaluation split of the PROX dataset.
In Table 1 of the paper (Ours w/o opt), the translation, orientation, and pose errors are reported as 6.91, 9.71, and 41.17, respectively.
The results of the code I implemented are 59.02, 60.23, and 1459.95, respectively.
What's wrong with my implementation?

import os
import random
import torch
import numpy as np

from route_data import ROUTEDATAEVAL
from route_data import ROUTEDATA
from route import ROUTENET
from pose_after_route import POSEAFTERROUTE
from utils import GeometryTransformer
from utils import AverageMeter
from progress.bar import Bar

def l1_error(prediction, target):
    return torch.abs(prediction - target).sum(dim=1).mean().item()


SEED_VALUE = 0
print(f'Seed value for the experiment {SEED_VALUE}')
os.environ['PYTHONHASHSEED'] = str(SEED_VALUE)
random.seed(SEED_VALUE)
torch.manual_seed(SEED_VALUE)
np.random.seed(SEED_VALUE)


batch_size = 16

dataset = ROUTEDATAEVAL()
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)


model_route = ROUTENET(input_dim=9, hid_dim=64)
model_route = model_route.cuda()

print('use pretrained routenet')
model_route.load_state_dict(torch.load('saved_model/route.model'))
model_route.eval()

model_pose = POSEAFTERROUTE(input_dim=65 - 9, hid_dim=256)
model_pose = model_pose.cuda()

print('use pretrained posenet')
model_pose.load_state_dict(torch.load('saved_model/pose_after_route.model'))
model_pose.eval()

error_translation_ = AverageMeter()
error_orientation_ = AverageMeter()
error_pose_ = AverageMeter()

num_iter = len(dataloader)
bar = Bar('==>', max=num_iter)

for j, data in enumerate(dataloader, 0):

    input_list, middle_list, frame_name, scene_name, sdf, scene_points, cam_extrinsic, s_grid_min, s_grid_max = data

    # keep only the first and last frames as the start/end conditions
    input_list = input_list[:, [0, -1], :]
    body = middle_list[:, 0:1, 6:16].cuda()
    # drop dims 6:16 (kept separately as body above) so each frame is 62-D
    input_list = torch.cat([input_list[:, :, :6], input_list[:, :, 16:]], dim=2)
    middle_list = torch.cat([middle_list[:, :, :6], middle_list[:, :, 16:]], dim=2)

    scene_points = scene_points.cuda()

    # convert the axis-angle orientation to a 6-D rotation representation (62-D -> 65-D per frame)
    input_list = input_list.view(-1, 62)
    six_d_input_list = GeometryTransformer.convert_to_6D_rot(input_list)
    six_d_input_list = six_d_input_list.view(-1, 2, 65)
    x = six_d_input_list.cuda()
    x1 = six_d_input_list[:, :, :9].cuda()  # translation (3) + 6-D orientation

    middle_list = middle_list.view(-1, 62)
    six_d_middle_list = GeometryTransformer.convert_to_6D_rot(middle_list)
    six_d_middle_list = six_d_middle_list.view(-1, 60, 65)  # 60 frames = 2 s at 30 fps

    y = six_d_middle_list[:, :, :9].cuda()

    # predict the route (translation + orientation per frame) from the start/end frames and the scene
    out_route = model_route(x1, scene_points.transpose(1, 2))

    pred_trans = out_route[:, :, :3]
    pred_6d = out_route[:, :, 3:]

    gt_trans = y[:, :, :3]
    gt_6d = y[:, :, 3:]

    route_prediction = out_route.detach().view(x1.shape[0], -1)

    # predict the pose conditioned on the predicted route (the first 32 output dims are the VPoser body latent)
    out_pose = model_pose(x[:, :, 9:], scene_points.transpose(1, 2), route_prediction)

    y = six_d_middle_list[:, :, 9:].cuda()

    pred_body_pose = out_pose[:, :, :32]
    gt_body_pose = y[:, :, :32]

    error_translation = l1_error(pred_trans.reshape(-1, 3).detach(), gt_trans.reshape(-1, 3)) * 100.
    error_orientation = l1_error(pred_6d.reshape(-1, 6).detach(), gt_6d.reshape(-1, 6)) * 100.
    error_pose = l1_error(pred_body_pose.reshape(-1, 32).detach(), gt_body_pose.reshape(-1, 32).detach()) * 100.


    error_translation_.update(error_translation, pred_trans.shape[0] * pred_trans.shape[1])
    error_orientation_.update(error_orientation, pred_trans.shape[0] * pred_trans.shape[1])
    error_pose_.update(error_pose, pred_trans.shape[0] * pred_trans.shape[1])



    summary_string = ' EVAL :[{0}/{1}]'.format(j, num_iter)
    summary_string += ' | ET {error_translation.avg:.4f} | EO {error_orientation.avg:.4f} | EP {error_pose.avg:.4f}'.format(error_translation=error_translation_, error_orientation=error_orientation_, error_pose=error_pose_)

    Bar.suffix = summary_string
    bar.next()

bar.finish()

Thanks, Seonghyun

Hi, Seonghyun
For the visualization, I think maybe you can try this https://github.com/yz-cnsdqz/PSI-release

For the evaluation, I think this gap is quite large, so perhaps you can visualize the results first to check whether the implementation is correct (visually right or wrong). I am also not sure whether the model code and the pretrained model are matched in your implementation, since I have updated them once.

Another thing: in PROX, several sequences have completely wrong pseudo ground truth (which leads to extremely large errors), and I did not use that data. Maybe you can also check this.

Sorry, I have just noticed that you are using
torch.abs(prediction - target).sum(dim=1).mean().item()

I think I am using
torch.abs(prediction - target).mean() directly.
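
So the metric should look roughly like this (a sketch of the change based on the reply above, not the exact evaluation script):

def l1_error(prediction, target):
    # mean over every element, rather than summing over the feature dimension first
    return torch.abs(prediction - target).mean().item()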

Hi,

If torch.abs(prediction - target).mean() is used, the results are similar to those reported in the paper.

For visualization, I will refer to the linked repository.

Thanks for the sincere reply.

Seonghyun

Another thing: in PROX, several sequences have completely wrong pseudo ground truth (which leads to extremely large errors), and I did not use that data.

I am wondering which data you don't use. Can you provide more information?

@Silverster98 have you found which data they use/don't use for the evaluation?

@seonghyunkim1212 what values did you get for each error?
@jiashunwang what's your criterion for discarding PROX eval samples?

Hi @nicolasugrinovic ,

I remember that when we observed the distribution of the errors, we found some were extremely large. We then found that the PROX pseudo ground truth was actually wrong for those samples, so we ignored them.

@jiashunwang thanks for replying.

Ok got it. So do you have a list stored somewhere of the exact frames you ignored? Or maybe you used a certain error threshold to ignore these frames with bad pseudo-gt?

@nicolasugrinovic
Sorry, it has been quite a long while and I couldn't find/remember that. I only remember that we ignored the samples with very large errors.
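
For anyone trying to reproduce the numbers, one hypothetical way to approximate this (not the authors' recorded criterion, and the threshold below is only a placeholder) is to drop evaluation samples whose per-sample error is an extreme outlier:

# Hypothetical filtering sketch: ignore samples whose per-sample error is an extreme
# outlier, since those correspond to broken PROX pseudo ground truth.
per_sample_error = torch.abs(pred_trans - gt_trans).mean(dim=(1, 2))  # one value per sample
keep = per_sample_error < 1.0  # placeholder threshold; choose by inspecting the error distribution
error_translation = torch.abs(pred_trans[keep] - gt_trans[keep]).mean().item() * 100.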

@jiashunwang
No worries, thanks for the answer.

In the code above, I see that the evaluation of the poses is computed in the VPoser space, over the 32-D vector. Are the numbers reported in the paper computed like that, or do you use the SMPL parameters to get the L1 error instead?

@nicolasugrinovic
We compare it in the 32-D VPoser space, since it is a continuous representation.
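
For completeness, the two options differ roughly as sketched below. This assumes the VPoser v1 API from human_body_prior (load_vposer / decode); the path and usage are illustrative only, and per the reply above only the first option matches what the paper reports.

# Option 1 (reported in the paper): L1 directly on the 32-D VPoser latent vectors.
error_pose_latent = torch.abs(pred_body_pose - gt_body_pose).mean().item() * 100.

# Option 2 (not used for Table 1): decode to axis-angle body pose first, then compare.
from human_body_prior.tools.model_loader import load_vposer
vposer, _ = load_vposer('path/to/vposer', vp_model='snapshot')  # illustrative path
vposer = vposer.to(pred_body_pose.device)
pred_aa = vposer.decode(pred_body_pose.reshape(-1, 32), output_type='aa').reshape(-1, 63)
gt_aa = vposer.decode(gt_body_pose.reshape(-1, 32), output_type='aa').reshape(-1, 63)
error_pose_aa = torch.abs(pred_aa - gt_aa).mean().item() * 100.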