vimalabs/VIMABench

[Test] The test result is not consistent with that reported in the paper.

Closed this issue · 2 comments

Hi, I tried to replicate the evaluation via your given command `python3 scripts/example.py --ckpt={ckpt_path} --device={device} --partition={eval_level} --task={task}`. For more details: in my experiment, I used the 200M.ckpt checkpoint.

Specifically,

  1. I executed the command mentioned above, using 100 instances per task as the test sample.
  2. The success signal was obtained from `obs, _, done, info = env.step(...)`.
  3. I computed the success ratio by averaging the results across L1-L4.
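Step 3 can be sketched as the following aggregation. The (succ, fail) counts below are copied from the L1 column of my table; the per-episode bookkeeping via `env.step` is omitted here:

```python
# Per-task (succ, fail) counts out of 100 episodes each (L1 column of the table).
l1_counts = [
    (99, 1), (100, 0), (100, 0),   # visual_manipulation, scene_understanding, rotate
    (49, 51), (10, 90),            # rearrange, rearrange_then_restore
    (99, 1), (97, 3), (1, 99),     # novel_adj, novel_noun, twist
    (44, 56), (67, 33),            # follow_order, sweep_without_exceeding
    (50, 50), (47, 53), (11, 89),  # same_shape, manipulate_old_neighbor, pick_in_order_then_restore
]

succ = sum(s for s, _ in l1_counts)   # total successes
fail = sum(f for _, f in l1_counts)   # total failures
ratio = 100 * succ / (succ + fail)    # success ratio over all episodes
print(f"{succ} / {succ + fail} -> {ratio:.2f}%")  # 774 / 1300 -> 59.54%
```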

However, the results I obtained are far from those in your paper. The following table shows my experimental results; the success ratio is much lower than yours.
By the way, the L1 and L2 results are suspiciously similar. Is there a bug in my test procedure?

| Task | L1 succ | L1 fail | L2 succ | L2 fail | L3 succ | L3 fail | L4 succ | L4 fail |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Simple Object Manipulation:visual_manipulation | 99 | 1 | 94 | 6 | 100 | 0 | | |
| Simple Object Manipulation:scene_understanding | 100 | 0 | 98 | 2 | 96 | 4 | | |
| Simple Object Manipulation:rotate | 100 | 0 | 100 | 0 | 100 | 0 | | |
| Visual Goal Reaching:rearrange | 49 | 51 | 49 | 51 | 49 | 51 | | |
| Visual Goal Reaching:rearrange_then_restore | 10 | 90 | 12 | 88 | 11 | 89 | | |
| Novel Concept Grounding:novel_adj | 99 | 1 | 100 | 0 | 99 | 1 | | |
| Novel Concept Grounding:novel_noun | 97 | 3 | 97 | 3 | 99 | 1 | | |
| Novel Concept Grounding:novel_adj_and_noun | | | | | | | 98 | 2 |
| Novel Concept Grounding:twist | 1 | 99 | 4 | 96 | 0 | 100 | | |
| One-shot Video Imitation:follow_motion | | | | | | | 0 | 100 |
| One-shot Video Imitation:follow_order | 44 | 56 | 45 | 55 | 47 | 53 | | |
| Visual Constraint Satisfaction:sweep_without_exceeding | 67 | 33 | 67 | 33 | | | | |
| Visual Constraint Satisfaction:sweep_without_touching | | | | | | | 0 | 100 |
| Visual Reasoning:same_texture | | | | | | | 50 | 50 |
| Visual Reasoning:same_shape | 50 | 50 | 50 | 50 | 50 | 50 | | |
| Visual Reasoning:manipulate_old_neighbor | 47 | 53 | 47 | 53 | 37 | 63 | | |
| Visual Reasoning:pick_in_order_then_restore | 11 | 89 | 10 | 90 | 13 | 87 | | |
| num | 774 | 526 | 773 | 527 | 701 | 499 | 148 | 252 |
| success ratio | 59.54 | | 59.46 | | 58.4 | | 0.37 | |

  • Empty cells denote tasks that example.py does not support at that level.

At the same time, I don't find any usage of Mask R-CNN. The bounding boxes are not recognized by any model but are given by the env (unless I missed something). Could you provide more details about this?
In addition, the dimension of the action pos0_position in the given training data is different from that in the test environment: the former is 3 while the latter is 2. I'm curious how I can convert the training action space to the test action space.
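For concreteness, here is a minimal sketch of the kind of conversion I mean, under the assumption that the extra training dimension is a height (z) component and the test env only consumes the tabletop (x, y) components. The function name is hypothetical, not part of the VIMA API:

```python
def train_to_env_position(pos0_position):
    """Hypothetical conversion: drop the trailing (z) component of a 3-D
    training-data position, keeping (x, y) for the env's 2-D action space."""
    return pos0_position[:2]

# Example: a 3-D training position mapped into a 2-D env position.
print(train_to_env_position([0.25, -0.10, 0.05]))  # [0.25, -0.1]
```

Is this the intended mapping, or does the conversion require something else (e.g. discretization)?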

Closed as duplicate here.