How to interpret the results generated by the "inference on custom images" code?
wlcosta opened this issue · 4 comments
Hey everyone,
I followed the steps from the README file to run inference on custom images.
When inference finishes, I read the resulting pickle file by running the following:
import pickle

# Load the detection results saved by the inference script.
with open('custom_imgs.pickle', 'rb') as f:
    labels = pickle.load(f)
Then I have access to the dictionary that is generated in RLIP/inference_on_custom_imgs_hico.py, lines 338 to 342 (commit f369a8a).
The keys in this dictionary are the image paths. However, once I access the entry for a specific image, it is unclear to me how to "translate" it back into a readable format (as shown in the paper), since each entry contains only tensors with the scores for the labels and verbs.
>>> labels['custom_imgs/0000005.png'].keys()
dict_keys(['labels', 'boxes', 'verb_scores', 'sub_ids', 'obj_ids'])
>>> labels['custom_imgs/0000005.png']['labels']
tensor([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 73, 24, 56, 24, 24, 66, 66, 73,
56, 24, 66, 73, 73, 24, 66, 24, 73, 24, 73, 24, 56, 66, 24, 66, 24, 56,
24, 73, 24, 73, 73, 73, 66, 24, 24, 66, 24, 41, 66, 24, 24, 56, 24, 24,
41, 73, 24, 66, 56, 66, 24, 24, 66, 27, 73, 73, 73, 24, 24, 56, 73, 24,
24, 73])
>>> labels['custom_imgs/0000005.png']['verb_scores']
tensor([[2.4966e-05, 1.3955e-05, 1.6503e-05, ..., 9.9974e-05, 8.8934e-05,
1.8285e-05],
[3.4093e-05, 1.4082e-05, 1.0232e-05, ..., 2.6095e-04, 4.8016e-05,
1.2272e-05],
[3.8148e-06, 1.7999e-06, 1.8853e-06, ..., 3.6490e-05, 5.5943e-06,
2.3014e-06],
...,
[1.3244e-05, 5.0881e-06, 4.0611e-06, ..., 1.0128e-04, 1.8682e-05,
4.9957e-06],
[3.6079e-05, 1.5216e-05, 1.1946e-05, ..., 3.2203e-04, 5.4361e-05,
1.4641e-05],
[2.5316e-05, 1.1472e-05, 1.5237e-05, ..., 1.2542e-04, 1.2565e-04,
1.6679e-05]])
Any help is appreciated. Thanks!
@wlcosta
In terms of the saved detection results, they are post-processed by results = postprocessors['hoi'](outputs, orig_target_sizes). Thus, you can refer to class PostProcessHOI(nn.Module) in models/hoi.py (line 3701). Five values are stored in the dictionary: labels (subject and object labels), boxes (subject and object boxes), sub_ids, obj_ids, and verb_scores. The verb_scores are generated by multiplying the original verb scores by the object scores via vs = vs * os.unsqueeze(1).
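For intuition, here is a minimal sketch of that scoring step (the variable names and shapes are illustrative rather than the exact code in models/hoi.py; it assumes the QPIC-style head with 117 HICO verb classes and 80 object classes plus a "no object" slot):

import torch

num_queries = 100
obj_logits = torch.randn(num_queries, 81)    # 80 HICO objects + 1 "no object"
verb_logits = torch.randn(num_queries, 117)  # 117 HICO verb classes

# Object confidence: softmax over classes, drop the "no object" slot,
# then keep the best remaining class per query.
obj_scores = obj_logits.softmax(-1)[..., :-1].max(-1)[0]

# Verb confidence: independent sigmoids (multi-label prediction).
vs = verb_logits.sigmoid()

# Weight each query's verb scores by its object confidence,
# mirroring vs = vs * os.unsqueeze(1) in PostProcessHOI.
vs = vs * obj_scores.unsqueeze(1)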
Since I am using the HICO fine-tuned RLIP-ParSe, the labels correspond to the order of the outputs of load_hico_verb_txt() and load_hico_object_txt(). I hope the above information helps.
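As a rough, untested decoding sketch: it assumes load_hico_verb_txt() and load_hico_object_txt() return name lists indexed by class id (adjust the import to wherever those helpers live in the repo), and that row k of verb_scores corresponds to the pair (sub_ids[k], obj_ids[k]) as in PostProcessHOI.

import pickle

# Hypothetical import path; point it at the module that defines the helpers.
# from datasets import load_hico_verb_txt, load_hico_object_txt

verbs = load_hico_verb_txt()      # assumed: list of 117 verb names
objects = load_hico_object_txt()  # assumed: list of 80 object names

with open('custom_imgs.pickle', 'rb') as f:
    results = pickle.load(f)

det = results['custom_imgs/0000005.png']
labels = det['labels']
verb_scores = det['verb_scores']
sub_ids, obj_ids = det['sub_ids'], det['obj_ids']

# One row of verb_scores per (subject, object) pair; print the top verb.
for k, (s, o) in enumerate(zip(sub_ids, obj_ids)):
    subj = objects[labels[s].item()]  # subjects are persons in HICO
    obj = objects[labels[o].item()]
    score, verb_id = verb_scores[k].max(-1)
    print(f'{subj} {verbs[verb_id.item()]} {obj} ({score.item():.4f})')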
@wlcosta
Btw, I updated the link to the pre-trained model in Inference on Custom Images. The updated one is fine-tuned on HICO.
Thanks for the info, @JacobYuan7! I will take a look at the code you mentioned and try to extract the readable labels.
Also, thanks for the heads-up regarding the new model!
@wlcosta Pleasure. I am closing this issue as completed. Feel free to re-open it.