chengxuxin/extreme-parkour

Some questions about distillation

Opened this issue · 8 comments

Hello, xuxin, thank you for your open-source work; it has been a great help to me. However, during step 2, the distillation process, I set up 4096 environments and trained for 5000 iterations. After training completed, I tested the policy and found the results were poor. Looking carefully at the source code, I noticed that the depth_encoder does not seem to be trained, because lines 303 and 311 in on_policy_runner.py are commented out. May I ask why that is? Should I uncomment the lines mentioned above?
Thanks :)

I have the same question: in on_policy_runner.py, `self.alg.update_depth_encoder` was commented out.

Hi, we use action supervision instead of latent supervision for depth distillation. Please check this line:

```python
depth_actor_loss, yaw_loss = self.alg.update_depth_actor(actions_student_buffer, actions_teacher_buffer, yaw_buffer_student, yaw_buffer_teacher)
```
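For readers unfamiliar with action supervision, here is a minimal sketch of what such a distillation loss might look like. This is an illustration only, not the repo's actual implementation: the function name `distillation_loss` is hypothetical, and the buffers are assumed to be plain tensors of student/teacher actions and yaw predictions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(actions_student, actions_teacher, yaw_student, yaw_teacher):
    """Action-supervision distillation (hypothetical sketch): regress the
    student's actions, computed from depth, onto the frozen teacher's
    actions. The teacher tensors are detached so gradients only flow
    into the student networks."""
    depth_actor_loss = F.mse_loss(actions_student, actions_teacher.detach())
    yaw_loss = F.mse_loss(yaw_student, yaw_teacher.detach())
    return depth_actor_loss, yaw_loss

# Random tensors standing in for the rollout buffers.
a_s = torch.randn(64, 12, requires_grad=True)  # student actions
a_t = torch.randn(64, 12)                      # teacher actions
y_s = torch.randn(64, 1, requires_grad=True)   # student yaw
y_t = torch.randn(64, 1)                       # teacher yaw

actor_loss, yaw_loss = distillation_loss(a_s, a_t, y_s, y_t)
(actor_loss + yaw_loss).backward()  # gradients reach only the student tensors
```

The key point for this thread: the depth encoder is trained through this action loss (gradients flow from the action error back through the depth actor and encoder), so the separate latent-supervision update being commented out is intentional.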


I used the loss function you referred to on line 309 during training. As detailed in the issue above, "I set up 4096 environments and trained for 5000 iterations. After training completed, I tested the policy and found the results were poor." The resulting robot was incapable of fulfilling its task, invariably losing balance and toppling over after just a few steps. I then increased the training duration to 15,000 iterations, hoping for an improvement, but unfortunately this did not change the performance. Could you provide some insight or guidance on how to fix this?

I am not sure what you have changed, so it is hard to tell why it did not work as expected. Is your base policy also not performing well? Please try to follow the same command in the readme with the original repo.


My base policy performs well. Following your recommendation, I used the original repo and the same command to train the policy during the distillation phase, but the resulting performance was still subpar.

output.compress-video-online.com.mp4

I cannot see your video. But to debug you can try without direction distillation first.

Hello, have you managed to build your simulation environment successfully? Could you share a screenshot of the graphical interface after running the simulation? Thank you.


Hello, is your graphical interface displayed in VS Code? My computer doesn't have a GPU; can I run this directly on the CPU? And if running on the CPU, can I still visualize the simulation, like in your video? Thank you.