decisionforce/CoPO

Visualizing when training

jhih-ching-yeh opened this issue · 7 comments

Thanks for your code and contribution! It's so great!
I have some questions about visualization and local mode.

First, is it possible to visualize the training process?
Although it might cost a lot of memory and slow training down, I would prefer to watch my scenes while training the model.

Secondly, I'm not sure I understand what you mean by local mode. Does it mean it can solve the first question? Would you mind explaining it in more detail?

Sorry to bother you.
Thanks a lot!!

First, is it possible to visualize the training process?
Although it might cost a lot of memory and slow training down, I would prefer to watch my scenes while training the model.

Of course. You only need to set the config "use_render": MetaDriveEnv({"use_render": True})

This can be done by wrapping the environment class with a customized env maker.
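
For example, a rough sketch of such an env maker could look like this (get_rendering_env and the base class argument are placeholders; adapt them to whatever environment class your training script builds):

def get_rendering_env(base_env_cls):
    # Return a subclass of the given env class with onscreen rendering enabled.
    class RenderingEnv(base_env_cls):
        def __init__(self, config=None):
            config = dict(config or {})
            config["use_render"] = True  # force rendering on for every instance
            super().__init__(config)
    return RenderingEnv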

You can also wrap the environment by adding a call to env.render within env.step (forgive me if I use the wrong names):

class Wrapper(MetaDriveRoundaboutEnv):
    def step(self, action):
        ret = super().step(action)
        # Render the top-down view after every environment step
        self.render(mode="topdown")
        return ret

Secondly, I'm not sure I understand what you mean by local mode. Does it mean it can solve the first question? Would you mind explaining it in more detail?

If not in local mode, RLlib will create multiple processes and run one MetaDrive env in each process. Therefore, if you set a breakpoint in the code within the MetaDrive environment, you won't see your program stop at that point, since the MetaDrive code is actually running in another process. The same applies to the code within the RLlib Trainer class: the training code also runs in a separate process, so it is very hard to debug because you can't easily set a breakpoint.

In contrast, in local mode with the number of parallel environments set to 1, the training code and the MetaDrive code run within the same process, that is, the process in which you launched the training script. Therefore, in local mode, you can set a breakpoint in the environment code or the Trainer code and your program will stop there.

Local mode has nothing to do with rendering. It only helps if you want to debug your training code or environment code.
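
For reference, a minimal sketch of what local-mode debugging usually looks like with Ray/RLlib (the exact config keys can differ between versions):

import ray

# Everything (trainer + MetaDrive envs) runs in the current process,
# so breakpoints inside the env or trainer code are actually hit.
ray.init(local_mode=True)

config = {
    "num_workers": 0,        # no separate rollout worker processes
    "num_envs_per_worker": 1,
    # ... the rest of your training config ...
}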


Overall, my suggestion is that you don't set up online rendering, because it costs too many resources.

My suggestion would be to save your model's checkpoints to local disk frequently. You can then run another program that periodically checks the folder where you store the checkpoints. Whenever a new checkpoint shows up, that program automatically loads the weights stored in the checkpoint and visualizes the behavior of the agents.

I think this is the best practice, since the visualization process is completely isolated from the training process(es) and thus you can stop the vis process at any time.
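
A rough sketch of such a watcher (load_and_visualize is just a placeholder for whatever checkpoint-loading and rendering code you use, e.g. something based on eval.py):

import os
import time

CKPT_DIR = "/path/to/checkpoints"   # folder where the training run saves checkpoints
seen = set()

while True:
    for name in sorted(os.listdir(CKPT_DIR)):
        if name not in seen:
            seen.add(name)
            # Placeholder: load the checkpoint's weights and roll out /
            # render the agents with them.
            load_and_visualize(os.path.join(CKPT_DIR, name))
    time.sleep(60)  # poll once a minute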

Your reply is very helpful to me, thank you very much!

By the way, I still have other questions about MetaDrive and this project.
According to the paper, different numbers of initial agents are compared, and after reading the MetaDrive documentation I assume I have to set num_agents in eval.py. Is this correct? Or do you also set it in the training code? I couldn't find the related code in either file, which really confused me. I really need your help.

Moreover, if I want to adjust vehicle_config, such as enable_reverse or spawn_lane_index, in the environment, can I just follow the way env.config is adjusted in train_all_copo_dist.py (see the following picture)?
https://imgur.com/2SKut98

Sorry to bother you again.
Your project is so great and impressive!! Thanks for your help!!!

By the way, I still have other questions about MetaDrive and this project.
According to the paper, different numbers of initial agents are compared, and after reading the MetaDrive documentation I assume I have to set num_agents in eval.py. Is this correct? Or do you also set it in the training code? I couldn't find the related code in either file, which really confused me. I really need your help.

Correct. The "initial agents" means the number of agents at the first step of each episode. I name it because some agents terminate during the episode so the total number of agents are changing.

Moreover, if I want to adjust vehicle_config, such as enable_reverse or spawn_lane_index, in the environment, can I just follow the way env.config is adjusted in train_all_copo_dist.py (see the following picture)?
https://imgur.com/2SKut98

Correct. Check base_env.py to see at which level of the config enable_reverse lives. Remember to use a nested dict like:

xxxEnv({"vehicle_config": {"enable_reverse": True}})

Thanks!

I'm so glad that you replied to me so quickly!
In detail, for the former question, I want to adjust the number of "initial agents", but I can't find where num_agents is set in this project.

  1. Training
    Did you set num_agents when you trained the model?
    Taking the intersection as an example, if so, where in the code did you set it?
    Or did you just use the default number in marl_intersection.py for training?
    Then, if I want to adjust the number of initial agents, is changing line 19 in this file the only thing I have to do?
    marl_intersection.py

  2. Evaluation
    I found that when executing eval.py, the program calls the code in recoder.py.
    Did you set the number of initial agents in recoder.py (the following picture) or in marl_intersection.py (the above picture)?
    recoder.py

I appreciate your help very much.
Thank you for your help!!

Did you set num_agents when you trained the model?

Yes! Changing "xxxEnv({"num_agents": INITIALAGENTS})" is enough! That's how we conduct the "generalization" experiments, where we test the population with different numbers of agents in the scene.

I found that when executing eval.py, the program calls the code in recoder.py.
Did you set the number of initial agents in recoder.py (the following picture) or in marl_intersection.py (the above picture)?

Whoever calls xxEnv(config) with config = dict(num_agents=INITIALAGENTS) sets the number of agents. If the key num_agents does not show up in the user's custom config, then we use the default value of num_agents defined in marl_intersection.py.
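
In other words, something like this (just a sketch; the actual default value is the one defined in marl_intersection.py):

# An explicit value in the user config overrides the default:
env = MultiAgentIntersectionEnv({"num_agents": 8})
print(env.config["num_agents"])   # 8
env.close()

# Leave the key out and the default from marl_intersection.py is used instead:
env = MultiAgentIntersectionEnv({})
print(env.config["num_agents"])   # the default value
env.close()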

I feel deeply sorry to ask you so many questions.

But my main question is: where should I set this "xxxEnv({"num_agents": INITIALAGENTS})" or MetaDriveEnv({"use_render": True})? In which file?
I read lots of materials online and tried more than ten ways to set it up, but I think none of them is correct.
Actually, if I want to set use_render for all algorithms, I'm not sure whether I should set it in each algorithm file, such as train_all_copo_dist.py, or just in train.py. I'm so sorry for my ignorance... I hope you can give me detailed hints or examples. Thanks a lot.

Possible methods I tried:

  1. The only way that showed something on screen (line 37 in train_all_copo_dist.py).
    However, when I only executed the CoPO program, there were more many ten vender vedio and the vehicles didn't move, so I think something might be wrong.

Method 1 (screenshot)

  2. This method is closest to the material online, but it makes get_ccenv fail. (line 20 in train_all_copo_dist.py)

Method 2 (screenshot)

Error message (screenshot)

Sorry and thanks a lot!

But my main question is: where should I set this "xxxEnv({"num_agents": INITIALAGENTS})" or MetaDriveEnv({"use_render": True})? In which file?

We use a dict to control all configs of a MetaDrive environment. We pass the dict like this:

MultiAgentIntersectionEnv({
    "start_seed": 1,
    "use_render": True
})

You can see some examples here:

https://github.com/metadriverse/metadrive/blob/main/metadrive/examples/drive_in_multi_agent_env.py#L49

https://github.com/metadriverse/metadrive/blob/main/metadrive/examples/drive_in_single_agent_env.py#L34


I read lots of materials online and tried more than ten ways to set it up, but I think none of them is correct.
Actually, if I want to set use_render for all algorithms, I'm not sure whether I should set it in each algorithm file, such as train_all_copo_dist.py, or just in train.py. I'm so sorry for my ignorance... I hope you can give me detailed hints or examples. Thanks a lot.

I think it would be a good idea to just set the config in each specific training script, instead of in train.py, which provides the general training framework.


The only way that showed something on screen (line 37 in train_all_copo_dist.py).
However, when I only executed the CoPO program, there were more many ten vender vedio and the vehicles didn't move, so I think something might be wrong.

I think this is correct.

I can't quite understand what "there were more many ten vender vedio" means. Could you provide a detailed script so that I can reproduce it on my side?


This method is closest to the material online, but it makes get_ccenv fail. (line 20 in train_all_copo_dist.py)

In the RLlib training config, you set

env=ENVIRONMENT_CLASS,
env_config=CONFIG_DICT

and it will automatically run env = ENVIRONMENT_CLASS(CONFIG_DICT). Therefore, you can't put an environment instance directly in the training config; instead, you need to provide an environment class and let RLlib create the environment instances it needs.
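
So, as a sketch (assuming the usual RLlib/tune-style config dict; exactly where these keys are set in train_all_copo_dist.py may differ):

# Wrong: putting an already-constructed environment instance in the config.
# config = {"env": MultiAgentIntersectionEnv({"use_render": True})}

# Right: pass the class (or a registered env name) plus a separate env_config.
# RLlib will call MultiAgentIntersectionEnv(env_config) itself in each worker.
config = {
    "env": MultiAgentIntersectionEnv,
    "env_config": {
        "use_render": True,
        "num_agents": 8,
    },
}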