Can the details of the training and evaluation process be shared? How can I get a success rate?
shure-dev opened this issue · 2 comments
First of all, thank you for sharing, and congratulations on your remarkable work.
I would like to know further details about your research. I have two questions.
- Your paper used a "success rate" to evaluate performance, right? What exactly is it? How did you define success and failure? Would you mind sharing your code related to this point? Is it implemented in the VimaBench package?
In your paper, you describe this point for each task, e.g., for the follow motion task:
Success Criteria: In each step, the pose of the target object matches the pose in the corresponding video frame. Incorrect manipulation sequences are considered failures.
I want to know how precisely an object must be placed for it to count as being in the correct place. As a first step, I want to reproduce your experiment and quantitatively confirm the results reported in your paper, so I need to recreate the same setup. How can I get a success rate?
- I'm now trying to implement the training process from scratch; however, I expect it will be difficult without instructions. Is it possible to implement that process using only the information in your paper? As you already said in other GitHub issues, I know you cannot share the details of the training process.
Thank you in advance!
Hi there, thanks for your interest in the project. To answer your questions:
- For all tasks, successes and failures are automatically checked and can be queried from the `info` dict returned by `env.step()`. See here for more details. Success rates are estimated by rolling out multiple episodes and calculating the average; see the sketch after this list.
- Yes, it's possible to implement the training since the data and model code are public and the training hyperparameters are included in the paper appendix. But please note that certain hyperparameters may need to be tuned subject to different implementations.
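To make the rollout-and-average procedure concrete, here is a minimal sketch. It assumes a gym-style loop where `vima_bench.make(task_name=...)` builds the environment, `env.step()` returns `(obs, reward, done, info)`, and `info` carries a boolean `success` entry; `my_policy` and the task name are placeholders, so please check the VimaBench package for the exact constructor and key names.

```python
# Minimal sketch of estimating a success rate by rolling out multiple episodes.
# Assumptions (verify against the actual VimaBench API): vima_bench.make(task_name=...)
# creates the environment, env.step() follows the gym (obs, reward, done, info) convention,
# and info contains a boolean "success" entry. my_policy() is a placeholder for your model.
import vima_bench

NUM_EPISODES = 100  # more episodes give a tighter estimate of the success rate


def my_policy(obs, prompt, prompt_assets):
    """Placeholder: replace with your trained model's action prediction."""
    raise NotImplementedError


def estimate_success_rate(task_name: str) -> float:
    env = vima_bench.make(task_name=task_name)  # hypothetical constructor call
    successes = 0
    for _ in range(NUM_EPISODES):
        obs = env.reset()
        prompt, prompt_assets = env.prompt, env.prompt_assets  # multimodal task prompt
        done = False
        while not done:
            action = my_policy(obs, prompt, prompt_assets)
            obs, reward, done, info = env.step(action)
        # Success/failure is checked automatically by the benchmark and exposed in `info`.
        if info.get("success", False):
            successes += 1
    return successes / NUM_EPISODES


if __name__ == "__main__":
    rate = estimate_success_rate("follow_motion")  # task name is illustrative
    print(f"Success rate over {NUM_EPISODES} episodes: {rate:.2%}")
```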
Let me know if you have further questions.
Hi, have you been able to reproduce the evaluation results? I tried the 20M.ckpt and 200M.ckpt checkpoints and the results are far from those in the paper. If you have any suggestions, I would be very grateful. Details can be seen in Issue #34.