Unable to reproduce the performance of the pretrained checkpoint for Calvin-Sim
Ping-C opened this issue · 11 comments
Hello Kevin,
I am back again. Thank you for looking at this issue!
So I attempted to reproduce the performance of the pretrained goal-conditioned policy in Calvin-sim, but was unable to match your pretrained checkpoint, and I was wondering whether you could potentially shed some light on what I may be missing. The answer can be short and doesn't have to be complete; some simple pointers would likely suffice.
First, I downloaded the calvin-sim data and preprocessed it with experiments/susie/calvin/data_conversion_scripts/goal_conditioned.py on the ABC training + D validation dataset. To get it working, I had to modify raw_dataset_path and tfrecord_dataset_path, and then comment out the following section of code:
if start_idx <= scene_info["calvin_scene_D"][1]:
    ctr = D_ctr
    D_ctr += 1
    letter = "D"
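Presumably that block assumes a "calvin_scene_D" entry in scene_info, which the ABC-only training split doesn't have. For reference, a guarded variant (my own sketch, not the repo's code) that skips the D labeling instead of deleting it:

# Hypothetical guard: only label scene D when the split actually contains it.
if "calvin_scene_D" in scene_info and start_idx <= scene_info["calvin_scene_D"][1]:
    ctr = D_ctr
    D_ctr += 1
    letter = "D"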
Second, I then trained the goal-conditioned policy on calvin-sim with the following command:
python experiments/susie/calvin/calvin_gcbc.py \
--config experiments/susie/calvin/configs/gcbc_train_config.py:gc_ddpm_bc \
--calvin_dataset_config experiments/susie/calvin/configs/gcbc_data_config.py:all
after updating data_path and save_dir in bridge_data_v2/experiments/susie/calvin/configs/gcbc_train_config.py.
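Concretely, the edits just point two fields at local paths; the snippet below uses placeholder paths, and the exact ConfigDict layout is my assumption:

# In gcbc_train_config.py -- placeholder paths, config structure assumed:
config.save_dir = "/path/to/gcbc_checkpoints"   # where training checkpoints are written
config.data_path = "/path/to/calvin_tfrecords"  # the tfrecord_dataset_path output of goal_conditioned.py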
I trained the model for 2 million steps as specified in the config, and the loss went from ~2.5 to roughly 0.65 by the end of training (see the attached plot). Note that I did have to resume training from checkpoints multiple times along the way. I then evaluated multiple checkpoints from across training, coupled with the pretrained diffusion model, and these are roughly the success rates I got for each number of instructions chained:
1: 57.0%
2: 21.0%
3: 7.0%
4: 2.0%
5: 1.0%
which is much worse than your pretrained GC policy + your pretrained diffusion model:
1: 81.0%
2: 65.0%
3: 46.0%
4: 30.0%
5: 21.0%
If you could potentially give me some pointers on what I may be doing incorrectly, it would be greatly appreciated! :)
And qualitatively, the model also seems to move more erratically compared to the pretrained model.
(Video attachment: combined_test.mov)
Hi Ping-C,
Can you un-comment line 198 of calvin-sim/calvin_models/calvin_agent/evaluation/diffusion_gc_policy.py and re-evaluate? I suspect the issue is that the policy was trained with normalized actions, but is being evaluated without the assumption that actions are normalized.
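For anyone else hitting this: the commented-out line restores a de-normalization step. As a minimal sketch (illustrative names only, not the actual code at line 198):

import numpy as np

# A policy trained on normalized actions outputs values in normalized space;
# they must be mapped back to the environment's action scale using the
# training-set statistics before being executed.
def unnormalize_action(action: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    return action * std + mean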
It worked! Kevin, you are amazing!
Hello Ping-C,
I downloaded the diffusion model and goal-conditioned policy checkpoints from https://huggingface.co/patreya/susie-calvin-checkpoints and set the values of the environment variables in eval_susie.sh, but the results are not good:
Average successful sequence length: 0.4666666666666667
Success rates for i instructions in a row:
1: 33.3%
2: 13.3%
3: 0.0%
4: 0.0%
5: 0.0%
turn_on_led: 2 / 2 | SR: 100.0%
open_drawer: 4 / 4 | SR: 100.0%
turn_on_lightbulb: 1 / 1 | SR: 100.0%
push_blue_block_right: 0 / 1 | SR: 0.0%
rotate_blue_block_right: 0 / 1 | SR: 0.0%
lift_blue_block_slider: 0 / 1 | SR: 0.0%
lift_blue_block_table: 0 / 1 | SR: 0.0%
push_pink_block_left: 0 / 2 | SR: 0.0%
move_slider_left: 0 / 3 | SR: 0.0%
push_blue_block_left: 0 / 2 | SR: 0.0%
lift_red_block_slider: 0 / 1 | SR: 0.0%
push_red_block_left: 0 / 1 | SR: 0.0%
rotate_red_block_left: 0 / 1 | SR: 0.0%
lift_red_block_table: 0 / 1 | SR: 0.0%
I noticed that you get a high success rate when evaluating the pre-trained models, so I wanted to ask: are there any other steps you perform during evaluation besides downloading the models and modifying the paths?
The actions of the robotic arm seem strange in some tasks, and I suspect it may be an issue with the GCBC policy. The robotic arm has even moved outside the field of view.
I would greatly appreciate it if you could tell me how to properly evaluate pre-trained models. :)
Hi @Ping-C and @pranavatreya ,
Can you please help me? I followed the same steps. I am trying to train the model on the CALVIN ABC dataset. When I run:
python experiments/susie/calvin/calvin_gcbc.py --config experiments/susie/calvin/configs/gcbc_train_config.py:gc_ddpm_bc --calvin_dataset_config experiments/susie/calvin/configs/gcbc_data_config.py:all
I got this error:

Traceback (most recent call last):
  File "experiments/susie/calvin/calvin_gcbc.py", line 186, in <module>
    app.run(main)
  File "/home/gaurav/miniconda3/envs/susie-calvin/lib/python3.8/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/gaurav/miniconda3/envs/susie-calvin/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "experiments/susie/calvin/calvin_gcbc.py", line 77, in main
    task_paths = [
  File "experiments/susie/calvin/calvin_gcbc.py", line 78, in <listcomp>
    glob_to_path_list(
  File "/media/local/gaurav/Music/calvin-sim/bridge_data_v2/jaxrl_m/data/calvin_dataset.py", line 27, in glob_to_path_list
    assert len(filtered_paths) > 0, f"{glob_str} came up empty"
AssertionError: training/A/?/? came up empty
I ensured that the dataset is located at the path expected by the script. The glob pattern training/A/?/? suggests it's looking for directories or files within training/A/ where each subdirectory of A has a single-character name. So what should I do?
I will appreciate your help. Thanks in advance!
Have you converted the dataset into tfrecord format?
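As described at the top of this thread, that means running the conversion script (after setting raw_dataset_path and tfrecord_dataset_path inside it):

python experiments/susie/calvin/data_conversion_scripts/goal_conditioned.py

The training config's data_path then has to point at the tfrecord output directory, since the glob training/A/?/? is presumably resolved relative to it.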
@houyaokun Hi, have you solved the problem? I am struggling with the same issue. 😭
Yeah, os.environ.pop("DISPLAY") may work.
@houyaokun Thank you for your quick reply. 😄 Actually, I am having problems with reproduction. My results are very bad with the provided pre-trained models. How did you handle this? Thank you so much for your reply. I have been struggling with this for about a week.
@Ping-C Hi, I am struggling with the performance. I did everything exactly as instructed, but the performance is bad with the provided pre-trained models. How did you reproduce the results? Is there anything else I need to do besides the default settings?
For me, I simply added os.environ.pop("DISPLAY"). By doing this, you will be able to use EGL normally. Otherwise, you will have a domain gap between the images in the dataset (rendered with EGL) and the images rendered during evaluation.
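Concretely, something like this before the environment is created (the placement and the None default are my additions):

import os

# Dropping DISPLAY forces headless EGL rendering, matching how the dataset
# images were rendered; the None default avoids a KeyError if DISPLAY is
# already unset.
os.environ.pop("DISPLAY", None)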
So the problem is a domain gap with the dataset rendering. I will take a closer look at the rendering part.
Thank you so much for your reply. Hope you have a wonderful day. 😃