Extremely poor validation results on self-trained checkpoints

Question

Jessy-Huang opened this issue a year ago · 9 comments

Dear Tony,
Thank you for your excellent work on ALOHA. I have tried to reproduce your results in the MuJoCo simulation environment. With your open-source data, the success rate should be around 90% for transfer cube and around 50% for insertion.
I trained and evaluated on your open-source dataset and got around 54% for transfer cube and around 14% for insertion. These results are far below the expected numbers, even though I have not changed any of the parameter settings. What could cause such a gap, and is there anything I need to tune? The exact data can be viewed in the table linked below.

Here are my training parameter settings

python3 imitate_episodes.py \
--task_name sim_transfer_cube_scripted \
--ckpt_dir ~/data/aloha/act/sim_transfer_cube_scripted/ckpt/ \
--policy_class ACT --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 8 --dim_feedforward 3200 \
--num_epochs 2000  --lr 1e-5 \
--seed 0

Here are my eval parameter settings

python3 imitate_episodes.py \
--task_name sim_transfer_cube_scripted \
--ckpt_dir ~/data/aloha/act/sim_transfer_cube_scripted/ckpt/ \
--policy_class ACT --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 8 --dim_feedforward 3200 \
--num_epochs 2000  --lr 1e-5 \
--seed 0 \
--eval

The data obtained from training is available at the following link (a Lark document; you need to register, and please leave a note if you request permission to view it):
https://iklxo6z9yv.feishu.cn/sheets/LRwosV4jnh7xokt8F8UcMT7snzh?from=from_copylink

Answer 1 · 2024-01-29T19:13:28.000Z

I suspect the mujoco version is messing things up. I updated the requirements yesterday: 742c753

Could you try reinstalling these packages and evaluate the same checkpoints again? You would not need to retrain the policy.

Answer 2 · 2024-01-30T06:31:06.000Z

I have changed the versions of mujoco and dm-control according to your instructions. Their version information is as follows:

(aloha) ➜  act-main conda list | grep dm-control
dm-control                1.0.14                   pypi_0    pypi
(aloha) ➜  act-main conda list | grep mujoco    
mujoco                    2.3.7                    pypi_0    pypi
(aloha) ➜  act-main
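
The same check can be scripted so it runs before every evaluation. The following is a minimal sketch, assuming the versions listed above (mujoco 2.3.7, dm-control 1.0.14) are the ones the updated requirements ask for; the helper names `installed_version` and `check_versions` are hypothetical and not part of the act repository.

```python
from importlib import metadata

# Versions reported in this thread after reinstalling. Treating them as the
# target versions is an assumption, not something confirmed by the repo.
EXPECTED = {"mujoco": "2.3.7", "dm-control": "1.0.14"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Map each mismatching package to an (installed, expected) pair."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (found, wanted)
    return mismatches


if __name__ == "__main__":
    for package, (found, wanted) in check_versions(EXPECTED).items():
        print(f"{package}: installed {found}, expected {wanted}")
```

An empty result from `check_versions(EXPECTED)` means both packages match; anything else lists what is installed next to what was expected.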

The parameters used for model inference are as follows

python3 imitate_episodes.py \
--task_name sim_transfer_cube_scripted \
--ckpt_dir ~/data/aloha/act/sim_transfer_cube_scripted/ckpt/ \
--policy_class ACT --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 8 --dim_feedforward 3200 \
--num_epochs 2000  --lr 1e-5 \
--seed 0 \
--eval

The result of this inference run is as follows

Success rate: 0.54
Average return: 318.9

Reward >= 0: 50/50 = 100.0%
Reward >= 1: 40/50 = 80.0%
Reward >= 2: 35/50 = 70.0%
Reward >= 3: 27/50 = 54.0%
Reward >= 4: 27/50 = 54.0%

policy_best.ckpt: success_rate=0.54 avg_return=318.9
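
The cumulative "Reward >= r" lines can be turned into a per-level breakdown to see where rollouts stall. This is a small sketch that uses only the counts printed above; it assumes a rollout counts as a success when it reaches the top reward level (4), which is what the matching 27/50 = 0.54 suggests. The function names are illustrative.

```python
# Cumulative counts copied from the evaluation output above (50 rollouts).
AT_LEAST = {0: 50, 1: 40, 2: 35, 3: 27, 4: 27}


def exact_counts(at_least):
    """Convert 'reward >= r' counts into 'highest reward == r' counts."""
    levels = sorted(at_least)
    counts = {}
    for level, next_level in zip(levels, levels[1:]):
        counts[level] = at_least[level] - at_least[next_level]
    # The top level has nothing above it, so its count is already exact.
    counts[levels[-1]] = at_least[levels[-1]]
    return counts


def success_rate(at_least):
    """Fraction of rollouts that reached the top reward level."""
    levels = sorted(at_least)
    return at_least[levels[-1]] / at_least[levels[0]]


if __name__ == "__main__":
    print(exact_counts(AT_LEAST))
    print(success_rate(AT_LEAST))
```

On these numbers the 23 failed rollouts split into 10 that never got past reward 0, 5 that stopped at 1, and 8 that stopped at 2; none stopped at 3.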

Does the policy need to be retrained, or could something else have led to the poor result?

In addition, after watching all the failed rollout videos, I found that every failure was caused by the right robot arm failing to grasp the cube, which in turn made the final hand-over of the cube fail. Is it necessary to add an object detection model during grasping so that the arm can grasp the object accurately? Some of the failed grasp videos are attached below.

video5.mp4

video7.mp4

Answer 3 · 2024-01-30T07:27:04.000Z

Dear Tony,
I followed your tips and updated mujoco and dm-control, but the results did not improve. See the GitHub issue for details.
Best wishes,
Jessy

Answer 4 · 2024-02-14T00:18:06.000Z

I have the same problem. Have you solved it? Thanks.

Answer 5 · 2024-03-07T21:33:14.000Z

@Jessy-Huang @z-yf17 Have you solved the problem?

Answer 6 · 2024-03-07T22:39:12.000Z

yes

Answer 7 · 2024-03-08T07:04:30.000Z

I am experiencing a similar problem where the success rates are too low. How did you solve the problem? Any advice would be appreciated. Thank you!

Answer 8 · 2024-03-08T07:45:54.000Z

You can try a larger num_epochs first.

Answer 9 · 2024-03-15T15:14:25.000Z

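I am also running into a similar problem where the success rate is too low. How did you solve it? Any advice would be greatly appreciated. Thank you!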

You can try the following parameters. In my case the success rate is 100%.
python imitate_episodes.py --task_name sim_transfer_cube_scripted --ckpt_dir ckpt_dir_batchsize_16_epoch_4000_TG --policy_class ACT --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 16 --dim_feedforward 3200 --num_epochs 4000 --lr 2e-5 --seed 0 --eval
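
To see exactly what differs between this command and the evaluation command from the original question, the two can be diffed flag by flag. This is a minimal sketch using only the standard library; both command strings are copied from this thread, and `parse_flags` and `diff_flags` are hypothetical helpers, not part of the act repository.

```python
import shlex

# Evaluation command from the original question.
BASELINE = (
    "python3 imitate_episodes.py --task_name sim_transfer_cube_scripted "
    "--ckpt_dir ~/data/aloha/act/sim_transfer_cube_scripted/ckpt/ "
    "--policy_class ACT --kl_weight 10 --chunk_size 100 --hidden_dim 512 "
    "--batch_size 8 --dim_feedforward 3200 --num_epochs 2000 --lr 1e-5 "
    "--seed 0 --eval"
)
# Command suggested in the last reply.
SUGGESTED = (
    "python imitate_episodes.py --task_name sim_transfer_cube_scripted "
    "--ckpt_dir ckpt_dir_batchsize_16_epoch_4000_TG --policy_class ACT "
    "--kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 16 "
    "--dim_feedforward 3200 --num_epochs 4000 --lr 2e-5 --seed 0 --eval"
)


def parse_flags(command):
    """Collect --flag value pairs; a flag with no value maps to True."""
    tokens = shlex.split(command)
    flags = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("--"):
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
            flags[token[2:]] = tokens[i + 1] if has_value else True
            i += 2 if has_value else 1
        else:
            i += 1  # skip the interpreter and script name
    return flags


def diff_flags(old, new):
    """Return {flag: (old_value, new_value)} for flags whose values differ."""
    a, b = parse_flags(old), parse_flags(new)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    for flag, (old, new) in diff_flags(BASELINE, SUGGESTED).items():
        print(f"--{flag}: {old} -> {new}")
```

Apart from the checkpoint directory, the suggested command doubles batch_size (8 to 16), num_epochs (2000 to 4000), and lr (1e-5 to 2e-5). The command as posted includes --eval; presumably training used the same flags without it, as in the original question, but the reply does not say so.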