DirtyHarryLYL/HAKE-Action-Torch

Evaluate Error

Shunli-Wang opened this issue · 11 comments

Sorry to bother you again. Thanks for releasing your wonderful work!

When I run the evaluation code, I found some issues:

  1. There is no gt_hoi_py2/hoi_%d.pkl file, so I copied all the hoi_ files from the HAKE_Instance project, and it works.
  2. I found that the sizes of bboxes and keys are like [1, ?, ?]. We need to add a 0 index before indexing the 1st and 2nd axes, e.g. bbox = bboxes[0][i][select, :] instead of:
    bbox = bboxes[i][select, :]
    key = keys[i][select]
  3. In this file, key is a list and gt_bbox is a dict, so I can't figure out this check:
    if key in gt_bbox:
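For reference, a minimal sketch of the per-class indexing with the extra leading `[0]` described in point 2 (shapes and names here are stand-ins for the repo's actual detection outputs, not its confirmed format):

```python
import numpy as np

# Stand-ins: the outputs appear wrapped in an extra outer list, so a
# leading [0] is needed before the per-class index i.
bboxes = [[np.random.rand(5, 4) for _ in range(80)]]  # [1][80][n, 4]
keys = [[np.arange(5) for _ in range(80)]]
scores = [[np.random.rand(5) for _ in range(80)]]

i = 3                          # object-class index
select = scores[0][i] > 0.5    # boolean mask over class-i detections
bbox = bboxes[0][i][select, :]
key = keys[0][i][select]
```

Each selected box row still pairs with its image key, since the same mask is applied to both.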

BTW, when training AE and IDN, it seems that I must set the ratio of the semi_hard loss to 0.1 to prevent the loss from exploding. After setting the ratio to 0.1, the loss curves are as follows. However, if the ratio is 1.0, the loss explodes after 1 or 2 epochs. Why does this happen?
image

Looking forward to your reply. Thank you very much!

It seems that there are some mistakes in eval.py and get_map.py. The results can't be evaluated due to a lot of errors. Looking forward to your correction!

There is no variable named map_ko_3:

f.write('total ap: %.4f rec: %.4f \n' % (float(np.mean(map_ko_3)), float(np.mean(mrec_ko_3))))
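A plausible fix, assuming the KO-setting arrays computed earlier in get_map.py are actually named `map_ko` and `mrec_ko` and the `_3` suffix is a leftover (these names are my assumption, not confirmed by the repo):

```python
import numpy as np

# stand-ins for the KO-setting per-class AP and recall arrays
map_ko = np.array([0.25, 0.35])
mrec_ko = np.array([0.50, 0.60])

# same format string as the original line, with the undefined *_3 names replaced
line = 'total ap: %.4f rec: %.4f \n' % (float(np.mean(map_ko)), float(np.mean(mrec_ko)))
```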

The whole project is based on COCO detection results, and the NIS results are matched to the COCO detections. So I have two questions:

  1. If I want to try a different detector, such as the DRG results, do I need to run the TIN model on all these results to get the nis.pkl file? If yes, could you please provide the code for generating this file?
  2. Do we need to train the TIN model and IDN jointly?

(I feel I can't express myself clearly in written English, so let me use Chinese:
1. To reproduce the experimental results with different detection results, e.g. the DRG results, a corresponding NIS.pkl file is needed. How is the NIS file generated? Could you provide the code?
2. Do IDN and TIN need to be trained jointly?)
I am working through your inspiring work step by step. Looking forward to your reply!

After solving all the errors in the EVAL step, I got the final result with the eval_hico_coco provided in the repo. The final result is as follows. It doesn't match the data reported in the paper, so there must be something wrong in the original code or in my modifications, but I can't figure out what. Looking forward to your new version of the EVAL code!

image

Thanks for your support!
Q1: gt_hoi_py2/hoi_%d.pkl: That's right. We will upload it ASAP.

Q2: The sizes: bboxes and keys are not arrays. They should be lists of length 80, where each item is an array of a different size. So maybe there is something else wrong? Could you please offer some more information?

Q3: The key is the image_id, and gt_bbox holds all the bounding boxes of the ground-truth pairs with their corresponding HOIs.

Q4: Semi-hard loss. The loss tends to explode when the loss scale is too large, just like training a model with too large a learning rate. Because of this, careful loss-weight tuning is required, and we take 0.1 as the recommended value. BTW, if the weight tuning seems cumbersome, removing the semi-hard loss could be an option; it won't bring a noticeable performance drop.
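In code form, the weighting described here is just a scalar multiplier on the semi-hard term before summation (the loss names below are stand-ins, not the repo's exact attributes):

```python
# Scale the semi-hard term down so it cannot dominate the total loss;
# 0.1 is the value recommended above.
SEMI_HARD_WEIGHT = 0.1

cls_loss = 1.2        # stand-in for the main classification loss value
semi_hard_loss = 4.0  # stand-in; this term can grow large and destabilize training

total_loss = cls_loss + SEMI_HARD_WEIGHT * semi_hard_loss
```

With the weight at 1.0, a large semi-hard term feeds large gradients straight back, which matches the exploding-loss behaviour reported above.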

Q5: Thanks for pointing out the bug! We have fixed it.

Q6: Our reported results on the DRG and VCL detections do not use NIS, but with NIS, higher performance might be achieved. The NIS.pkl contains the output acquired from this line:

output['p_binary'] = p_binary
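One plausible way to build such a file yourself is to collect `p_binary` per image and pickle it. The dict layout and key names below are assumptions for illustration, not the repo's confirmed NIS.pkl format:

```python
import pickle

# stand-in for per-image network outputs; in practice output['p_binary']
# would come from the line quoted above, during a forward pass
results = {
    1000: {'p_binary': [0.9, 0.1]},
    9999: {'p_binary': [0.7]},
}

# collect the binary (interactiveness) scores keyed by image id
nis = {image_id: out['p_binary'] for image_id, out in results.items()}
with open('NIS.pkl', 'wb') as f:
    pickle.dump(nis, f)
```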

Q7: We do not involve TIN in this work.

Q8: Thanks for pointing this out! We have found the reason and updated our code: an exponential function should be used to project the outputs of the integration and decomposition modules, as stated in our paper, instead of a sigmoid function, which was from an early version. If any other unexpected results occur, feel free to contact us in this issue.

Thank you very much for your patient reply! To express myself clearly, please forgive me for switching to Chinese....

I downloaded your source code again and debugged EVAL from scratch. Along the way I found a few small problems that I hope you can correct; if I have misunderstood anything, please point it out!

1. When loading epoch_30.pth, the error below appears. Since this loss weight does not matter in EVAL mode, I suggest adding "strict=False" when loading the model to ignore it.
RuntimeError: Error(s) in loading state_dict for IDN: Unexpected key(s) in state_dict: "semihard_loss.pos_weight".

self.load_state_dict(state)
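The effect of `strict=False` can also be emulated by filtering the checkpoint dict before loading: `load_state_dict(state, strict=False)` simply ignores unexpected keys. A minimal torch-free sketch of that idea (the helper and the dummy values are hypothetical; the stray key name is taken from the error above):

```python
# Drop checkpoint entries the model does not declare, mirroring what
# load_state_dict(state, strict=False) does for unexpected keys.
def filter_state(state, model_keys):
    return {k: v for k, v in state.items() if k in model_keys}

state = {'fc.weight': [1.0], 'semihard_loss.pos_weight': [2.0]}
model_keys = {'fc.weight'}
filtered = filter_state(state, model_keys)
```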

2. After the net produces its results, the data conversion in these two lines is redundant; otherwise the subsequent torch.sigmoid and torch.exp calls raise errors. Simply commenting them out works:

output['s_AE'] = output['s_AE'].detach().cpu().numpy()

output['s_rev'] = output['s_rev'].detach().cpu().numpy()

output['s_AE'] = torch.sigmoid(output['s_AE']).detach().cpu().numpy()
TypeError: sigmoid(): argument 'input' (position 1) must be Tensor, not numpy.ndarray
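A sketch of the fix: apply the activation once, while the value is still a tensor, and convert to numpy only afterwards (a numpy sigmoid stands in for torch.sigmoid here so the snippet is self-contained):

```python
import numpy as np

def sigmoid(x):
    # numpy stand-in for torch.sigmoid
    return 1.0 / (1.0 + np.exp(-x))

s_ae_raw = np.array([0.0, 2.0])  # stand-in for the raw output['s_AE'] values
s_ae = sigmoid(s_ae_raw)         # activation applied exactly once, then kept as numpy
```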

3. When the NIS setting is off, an error occurs if sel is not initialized with None values; a for loop is needed to assign 600 None values first:

for i in range(80):

4. The obj_range index in the NIS branch is wrong: cls should be kept consistent with the loop variable i so that obj_range yields the correct range and each sel entry gets the proper value. The same applies to line 183.

HAKE-Action-Torch/eval.py

Lines 171 to 172 in 3ad15e9

for i in range(80):
x, y = obj_range[cls][0] - 1, obj_range[cls][1]
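The corrected loop, as I understand it, indexes obj_range with the loop variable i rather than a stale cls. The obj_range contents and the sel assignment below are stand-ins to make the snippet runnable, assuming 1-based (start, end) HOI ranges per object class:

```python
# stand-in: 80 object classes, each covering a 1-based (start, end) HOI range
obj_range = [(k * 2 + 1, (k + 1) * 2) for k in range(80)]

sel = [None] * 160
for i in range(80):
    # use i, not a stale `cls`, so every class reads its own range
    x, y = obj_range[i][0] - 1, obj_range[i][1]
    for j in range(x, y):
        sel[j] = i
```

With the stale index, every iteration would read the same range and most sel entries would never be assigned.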

5. The 'exp' here should be 'exp/'; otherwise the resulting path is wrong:

result_file = 'exp' + model + '/result.pkl'
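An alternative to appending the slash by hand is os.path.join, which inserts the separator itself and avoids this class of bug (the model name below is a stand-in):

```python
import os.path as osp

model = 'IDN'  # stand-in for the model name used in the original script
result_file = osp.join('exp', model, 'result.pkl')
```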

6. When training IDN with IPT, the replacement data for H and O must be loaded, but a few lines in dataset.py are wrong: obj and sub are mixed together:

else:
sub_ipt = info['pool'][sub_id][np.random.randint(0, info['pool'][sub_id].shape[0])]
with h5py.File(osp.join(self.data_dir, 'feature', self.split, str(self.db[sub_ipt[0]]) + '.h5'), 'r') as f:
sub_vec.append(f['FH'][info['H_mapping'][sub_ipt[1]], :])
obj_ipt = info['pool'][sub_id][np.random.randint(0, info['pool'][obj_id].shape[0])]
with h5py.File(osp.join(self.data_dir, 'feature', self.split, str(self.db[obj_ipt[0]]) + '.h5'), 'r') as f:
obj_vec.append(f['FH'][info['H_mapping'][obj_ipt[1]], :])

image
I tried modifying the code so that a new ['H_mapping'] index is taken from the new info, and it then runs correctly:
image

These are the details I have found so far; please correct me where I am mistaken. Thank you for taking time out of your busy schedule to look at this, and I look forward to your reply!

Thanks very much for pointing out the bugs! We will fix them ASAP.
As for the evaluation performance, it is really weird, since the result on our local server seems fine.
image
We will find the problem in this GitHub version of the code as soon as possible.
Also, how is the result after the NIS fix?

Thanks for your reply!!! It turns out that some of my modifications were wrong.... I'm so sorry. After fixing all the mistakes, I got the same results as reported in the paper with the eval_hico_coco provided in this repo.

Then I re-trained the whole model from AE to IDN, and from IDN to IPT (with a ratio of 0.1 on the semi-hard loss in AE, and a ratio of 0.01 on the semi-hard loss from IDN to IPT; without these ratios the loss explodes quickly), but the results turned bad:
image

I don't know where the problem comes from. I will download your new version of the code and try again. BTW, once the semi-hard loss is taken into account, the total loss becomes unstable; I just wonder how you managed to train such a model.

Now I'm trying to re-train the whole model without the semi-hard loss to see what happens.

Thank you very much!

Sorry to bother you again. I think this is still not right.
We want to replace the original person A with a new person B right here. Suppose that A comes from img_id=1000 and B comes from img_id=9999; we then use h5py to load the features of img_id=9999, but info['H_mapping'] still only contains the information of A, not B.

I think maybe we need to create a new info_9999 to store the information from img_id=9999, and we also need a new H_mapping_9999 in order to get the right feature of person B in img_id=9999.

with h5py.File(osp.join(self.data_dir, 'feature', self.split, str(self.db[sub_ipt[0]]) + '.h5'), 'r') as f:
sub_vec.append(f['FH'][info['H_mapping'][sub_ipt[1]], :])
obj_ipt = info['pool'][obj_id][np.random.randint(0, info['pool'][obj_id].shape[0])]

I don't know if my understanding is correct; please correct me if not. Thanks a lot.
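A minimal sketch of this proposal (all names and shapes are stand-ins, not the repo's exact data layout): look up the replacement image's own info, and use its H_mapping to index person B's feature rather than person A's:

```python
import numpy as np

# Stand-in database: per-image info carrying its own H_mapping and features.
db = {
    9999: {
        'H_mapping': {2: 0},               # pair index -> row in that image's FH
        'FH': np.arange(12).reshape(2, 6),  # stand-in human features of img 9999
    },
}

obj_ipt = (9999, 2)        # (source img_id, pair index) drawn from the pool
info_rep = db[obj_ipt[0]]  # info of the image the replacement person comes from
feat = info_rep['FH'][info_rep['H_mapping'][obj_ipt[1]], :]
```

Indexing through `info_rep['H_mapping']` instead of the original image's `info['H_mapping']` is the key change: the row index now refers to img_id=9999's feature table.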

Yes, it's correct now. Thanks a lot for pointing this out!

Thank you very much for your patience.


Hi, may I ask how fast training was for you? I found that with this code, GPU utilization is often 0 during both training and testing. How long does one training/testing run take for you?