vimalabs/VIMA

[Test] The test result is not consistent with that reported in the paper.

aopolin-lv opened this issue · 11 comments

Hi, I tried to reproduce the evaluation via your given command `python3 scripts/example.py --ckpt={ckpt_path} --device={device} --partition={eval_level} --task={task}`. For more detail: in my experiment, I used the 200M.ckpt checkpoint.

Specifically,

  1. I executed the command mentioned above, using 100 instances per task as the test sample (see the sketch after this list).
  2. For each episode, success is read from obs, _, done, info = env.step(...).
  3. I obtained the success rate by averaging the results across the L1-L4 partitions.
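
A minimal sketch of steps 1-2, where `env` and `policy` are hypothetical stand-ins for the env and model that scripts/example.py constructs:

```python
# Sketch of the per-task evaluation loop described above. `env` and
# `policy` are hypothetical wrappers; only the success bookkeeping is shown.
def eval_task(env, policy, num_episodes: int = 100) -> float:
    successes = 0
    for _ in range(num_episodes):
        obs = env.reset()
        done, info = False, {}
        while not done:
            action = policy.act(obs)               # hypothetical policy API
            obs, _, done, info = env.step(action)  # step 2: read done/info
        successes += int(info.get("success", False))  # assumption: info flags success
    return successes / num_episodes  # per-task success rate
```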

However, I found that the results I obtained are far from those in your paper. The following table shows my experimental results, and the success rate is much lower than yours.
By the way, the results for L1 and L2 are suspiciously similar. Is there a bug in my evaluation?

| Task | L1 (succ/fail) | L2 (succ/fail) | L3 (succ/fail) | L4 (succ/fail) |
| --- | --- | --- | --- | --- |
| Simple Object Manipulation: visual_manipulation | 99 / 1 | 94 / 6 | 100 / 0 | — |
| Simple Object Manipulation: scene_understanding | 100 / 0 | 98 / 2 | 96 / 4 | — |
| Simple Object Manipulation: rotate | 100 / 0 | 100 / 0 | 100 / 0 | — |
| Visual Goal Reaching: rearrange | 49 / 51 | 49 / 51 | 49 / 51 | — |
| Visual Goal Reaching: rearrange_then_restore | 10 / 90 | 12 / 88 | 11 / 89 | — |
| Novel Concept Grounding: novel_adj | 99 / 1 | 100 / 0 | 99 / 1 | — |
| Novel Concept Grounding: novel_noun | 97 / 3 | 97 / 3 | 99 / 1 | — |
| Novel Concept Grounding: novel_adj_and_noun | — | — | — | 98 / 2 |
| Novel Concept Grounding: twist | 1 / 99 | 4 / 96 | 0 / 100 | — |
| One-shot Video Imitation: follow_motion | — | — | — | 0 / 100 |
| One-shot Video Imitation: follow_order | 44 / 56 | 45 / 55 | 47 / 53 | — |
| Visual Constraint Satisfaction: sweep_without_exceeding | 67 / 33 | 67 / 33 | — | — |
| Visual Constraint Satisfaction: sweep_without_touching | — | — | — | 0 / 100 |
| Visual Reasoning: same_texture | — | — | — | 50 / 50 |
| Visual Reasoning: same_shape | 50 / 50 | 50 / 50 | 50 / 50 | — |
| Visual Reasoning: manipulate_old_neighbor | 47 / 53 | 47 / 53 | 37 / 63 | — |
| Visual Reasoning: pick_in_order_then_restore | 11 / 89 | 10 / 90 | 13 / 87 | — |
| total | 774 / 526 | 773 / 527 | 701 / 499 | 148 / 252 |
| success rate (%) | 59.54 | 59.46 | 58.42 | 37.00 |
  • Empty cells (—) denote task/partition combinations that example.py does not support.
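
The aggregate rate in the last row is just summed successes over total episodes; as a sanity check, recomputing L1 from the per-task counts above:

```python
# Recompute the L1 aggregate from the per-task counts in the table:
# 13 supported tasks x 100 episodes = 1300 episodes.
l1_succ = [99, 100, 100, 49, 10, 99, 97, 1, 44, 67, 50, 47, 11]
l1_fail = [1, 0, 0, 51, 90, 1, 3, 99, 56, 33, 50, 53, 89]

total = sum(l1_succ) + sum(l1_fail)   # 1300
print(f"{sum(l1_succ) / total:.2%}")  # -> 59.54%
```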

At the same time, I can't find any usage of Mask R-CNN. The bounding boxes are not predicted by any model but are given by the env (if I'm not missing anything). Could you provide more details about this?

Hi there, thank you for trying it out. Could you provide more details (e.g., a code snippet) so I can take a look?

@yunfanjiang Maybe I've missed it, but where is the Mask R-CNN model used during online evaluation?

I solved it just now. And I have the same question as amitkparekh.

@aopolin-lv Can you share some lessons with us, e.g., what to watch out for and handle carefully?

@aopolin-lv

When you ran it, did you literally just run a for loop in bash for each task and partition and dump the metrics to files? The big question here is how much of the code you changed, if at all.

Yes, I just executed 100 instances of each task in a for loop, almost without changing the original code.
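
For reference, a minimal driver along those lines might look like the following (the partition names are my assumption of how VIMA-Bench labels L1-L4, and the task list is truncated):

```python
import subprocess

# Hypothetical driver: one scripts/example.py run per (partition, task).
PARTITIONS = [
    "placement_generalization",      # L1 (assumed name)
    "combinatorial_generalization",  # L2 (assumed name)
    "novel_object_generalization",   # L3 (assumed name)
    "novel_task_generalization",     # L4 (assumed name)
]
TASKS = ["visual_manipulation", "scene_understanding", "rotate"]  # subset only

for partition in PARTITIONS:
    for task in TASKS:
        subprocess.run(
            ["python3", "scripts/example.py", "--ckpt=200M.ckpt",
             "--device=cuda:0", f"--partition={partition}", f"--task={task}"],
            check=True,
        )
```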

Hi yunfan, I reimplemented the training of VIMA using the vima baselines. However, I found it difficult for the model to fit the pose1_rotation attribute. Did you meet this problem, and could you give me any suggestions?

Hi @aopolin-lv, thanks for the follow-up. We directly read off segmentation masks from the sim in this script for demo purposes.
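
In other words, no detector runs at demo time. A sketch of that shortcut, assuming `segm` is the (H, W) object-id map in the env observation (hypothetical helper, not the repo's code):

```python
import numpy as np

# Derive per-object bounding boxes from a simulator segmentation mask
# instead of a trained Mask R-CNN. `segm` holds integer object ids
# (np.unique also returns the background id, if any).
def bboxes_from_segm(segm: np.ndarray) -> dict:
    boxes = {}
    for obj_id in np.unique(segm):
        ys, xs = np.nonzero(segm == obj_id)
        boxes[int(obj_id)] = (int(xs.min()), int(ys.min()),
                              int(xs.max()), int(ys.max()))  # x0, y0, x1, y1
    return boxes
```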

During training, we masked the rotation loss contributed by tasks other than Rotation and Twist. Since object orientation only matters in these two tasks, optimizing the rotation action head would otherwise be dominated by the other tasks.
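
A masking scheme along those lines might look like this (a sketch with hypothetical names, not the actual training code):

```python
import torch

# Zero out the rotation head's loss for batch items whose task does not
# involve orientation, so those tasks can't dominate its gradient.
ROTATION_TASKS = {"rotate", "twist"}

def masked_rotation_loss(rot_loss: torch.Tensor, task_names: list) -> torch.Tensor:
    """rot_loss: per-sample rotation loss, shape (B,)."""
    mask = torch.tensor(
        [name in ROTATION_TASKS for name in task_names],
        dtype=rot_loss.dtype, device=rot_loss.device,
    )
    # Average only over the samples that keep their rotation loss.
    return (rot_loss * mask).sum() / mask.sum().clamp(min=1.0)
```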

I'll close this issue for now. Feel free to let me know if you have further questions.

@yunfanjiang I don't think this issue should be closed as completed, as it has not been solved?

Thank you. With your advice, I have trained the model successfully. However, its performance is poor on multi-step tasks such as rearrange_then_restore, pick_in_order_then_restore, follow_order, manipulate_old_neighbor, and rearrange. Compared with the original results reported in the paper, performance on these tasks is 20%-60%+ lower, while the other tasks are normal. What can I do about this?

@aopolin-lv Hi! Can you share the training code with me? I want to reproduce these baselines (VIMA-GPT, VIMA-Flamingo). Thanks a lot.