AI standing still
3DJakob opened this issue · 6 comments
The test model and test map work great. Now I am experimenting with training my own model on my own map. I have recorded a reward function using the command, and when running python -m tmrl --check-environment
I can see that the reward increases when I drive faster along the path of my map, so I think that is working.
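(For reference, a minimal sketch of the kind of manual reward check I mean. It assumes the tmrl package exposes get_environment() with a gymnasium-style reset/step API and a [gas, brake, steer] action layout; both may differ slightly depending on the installed version.)

```python
# Minimal sketch: drive straight and print the reward at each step.
# Assumptions (may differ between tmrl versions):
#   - tmrl exposes get_environment()
#   - reset()/step() follow the gymnasium API (5-tuple from step)
#   - the action layout is [gas, brake, steer]
from time import sleep

import numpy as np
from tmrl import get_environment

env = get_environment()
sleep(1.0)  # time to focus the TrackMania window after launching the script

obs, info = env.reset()
for _ in range(200):
    act = np.array([1.0, 0.0, 0.0], dtype=np.float32)  # full throttle, no brake, no steering
    obs, rew, terminated, truncated, info = env.step(act)
    print(f"reward: {rew:.4f}")
    if terminated or truncated:
        obs, info = env.reset()
```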
However, when I start the server, worker and trainer, it always seems to prefer staying still and not giving any inputs. It might try driving a bit forward on the first run, but then it just stays put. The virtual controller is working, since if I give it a push by pressing forward it will steer.
Do you have any clue what I might be missing?
I tried renaming the SAC_4_LIDAR_pretrained model and weights, in case it was using some old data and I had accidentally trained it to stand still, but I got the same result.
Thank you so much for your dedication to this amazing project 🙏
Hmm, this might be because the new default hyperparameters are not adapted to the LIDAR environment. Do you see the critic and actor losses explode?
The reward function has been divided by 100 in version 0.3.0, and I have adapted the default hyperparameters to something I thought would work with the LIDAR, but I haven't tested this.
Usually when I had this behaviour of the AI learning to stand still, it was because of exploding losses. I thought this was because the reward function was set too high, which is why I divided it by 100, and this seems to work in the full environment (which is what I am focusing on right now). When I had this issue with exploding losses in the LIDAR environment, reducing the actor learning rate (actor_lr) in config.json would usually fix it.
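(A small sketch of how such a config change could be applied; the config path and the exact key names are assumptions here and may differ depending on your tmrl version, so check your own config.json first.)

```python
# Sketch: lower the actor learning rate in the tmrl config file.
# Assumptions (verify against your own installation):
#   - the config lives at ~/TmrlData/config/config.json
#   - the actor learning rate key sits under ALG -> LR_ACTOR
import json
from pathlib import Path

config_path = Path.home() / "TmrlData" / "config" / "config.json"
config = json.loads(config_path.read_text())

config["ALG"]["LR_ACTOR"] = 1e-5  # e.g. reduce the current value by a factor of 10
config_path.write_text(json.dumps(config, indent=2))
print("new actor learning rate:", config["ALG"]["LR_ACTOR"])
```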
(BTW this is if the AI learns to stay still and the losses explode. If the losses do not explode, it is normal for the AI to do random stuff at the beginning of training, because its neural network is initialized randomly.)
(Also, if you are using the LIDAR on your own map, be mindful of Markovness. The map has to have only black borders and be completely flat, like the tmrl-train map. This is because the LIDAR reduction is a pretty hardcore reduction: it needs the black borders to be computed, and it becomes non-Markov in case of weirdnesses such as slopes, because it cannot really "see" those. The full environment, on the other hand, should work with any map as long as the map has no checkpoints, but of course it is much longer and more difficult to train than the LIDAR thing.)
Thank you so much for your help. I think I did manage to mess something up at some point, because when I cleared my model this morning and tried again, the AI quickly started to progress on the map.
The map I've created is completely flat and uses the standard road, so hopefully it should work great.
I've noticed that you can see some statistics of the actor losses when ending the training session. Is there any way to get these numbers so I could make an Excel graph at the end to summarize all my training?
Better than that: your statistics should get automatically logged to wandb.ai.
By default they are logged to the public tmrl project at https://wandb.ai/tmrl/tmrl, but you can create an account on their website and put your API key / project name in config.json.
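(If you want the raw numbers for an Excel graph, a small sketch using the wandb public API could look like the following; the run path is a placeholder, and the exact metric names depend on what the trainer logs.)

```python
# Sketch: export the logged metrics of a wandb run to a CSV file that
# can be opened in Excel. Replace the placeholder run path with your
# own entity/project/run_id.
import wandb

api = wandb.Api()
run = api.run("your-entity/your-project/your-run-id")  # placeholder path

history = run.history()          # pandas DataFrame, one row per logged step
history.to_csv("training_run.csv", index=False)
print(history.columns.tolist())  # inspect which metrics (e.g. actor loss) were logged
```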
Amazing, thank you!