AI4Finance-Foundation/ElegantRL

:children_crossing: How to save and load policy network for testing.


After training an agent, many people are not sure how to save the trained policy network, load it back, and see how the agent actually performs in a simulation environment.

Here is example code (using the Pendulum env as an example) to:

  • train the agent and save the policy network
  • load the policy network and use it to map states to actions

train the agent and save the policy network

def train_ppo_a2c_for_pendulum():
    from elegantrl.envs.CustomGymEnv import PendulumEnv
    agent_class = [AgentPPO, AgentA2C][DRL_ID]  # pick the DRL algorithm (DRL_ID selects PPO or A2C)
    env_class = PendulumEnv  # run a custom env: PendulumEnv, which is based on the OpenAI Pendulum env
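For context, the rest of this training function in ElegantRL's demo scripts usually continues along the lines of the sketch below. The env_args values are the standard Pendulum dimensions; Config and train_agent are assumed to live in elegantrl.train.config and elegantrl.train.run (their location can differ between ElegantRL versions), so treat this as a sketch rather than the exact demo code:

    env_args = {
        'env_name': 'Pendulum',   # name of the custom Pendulum environment
        'state_dim': 3,           # x-y coordinates of the pendulum's free end plus its angular velocity
        'action_dim': 1,          # torque applied to the free end of the pendulum
        'if_discrete': False,     # continuous action space
    }

    from elegantrl.train.config import Config    # assumed import path
    from elegantrl.train.run import train_agent  # assumed import path

    args = Config(agent_class, env_class, env_args)  # holds the training hyperparameters
    args.cwd = "./Pendulum_PPO_0"  # working directory where the actor (act.pt) is saved
    train_agent(args)              # run the training loop; it saves the actor to cwd as it goes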

agent.save_or_load_agent(cwd, if_save=True)  # if_save=True saves the agent's networks (including the actor) under cwd; if_save=False loads them

During training, the process keeps saving the policy network (actor) to cwd="./Pendulum_PPO_0/act.pt" (cwd = current working directory).
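If you want to save the actor yourself, outside of ElegantRL's training loop, the save step amounts to plain PyTorch. A minimal sketch, assuming agent.act is an ordinary torch.nn.Module (save_actor is just a hypothetical helper name):

import os
import torch

def save_actor(actor, cwd="./Pendulum_PPO_0"):
    # Persist the whole actor module so it can be restored later with torch.load().
    os.makedirs(cwd, exist_ok=True)
    torch.save(actor, os.path.join(cwd, "act.pt"))

# usage after training:
# save_actor(agent.act)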

evaluator.evaluate_and_save(actor=agent.act, steps=horizon_len, exp_r=exp_r, logging_tuple=logging_tuple)  # periodically evaluate the current actor and record the results

load the policy network and use it to map the state to an action

def demo_load_pendulum_and_render():

The following code loads the policy network (actor) from disk:

'''init'''
import torch
from elegantrl.train.config import build_env

env = build_env(env_class=env_class, env_args=env_args)  # env_class and env_args are set as in the training function above
act = torch.load("./Pendulum_PPO_0/act.pt", map_location=device)  # device is e.g. torch.device("cpu")

The following code maps the state to an action using the policy network (actor):

state = env.reset()
returns = 0.0
for steps in range(12345):
    s_tensor = torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
    a_tensor = act(s_tensor).argmax(dim=1) if if_discrete else act(s_tensor)
    action = a_tensor.detach().cpu().numpy()[0]  # detach() is not needed if this loop runs under torch.no_grad()
    state, reward, done, _ = env.step(action)
    returns += reward
    env.render()
    if done:
        state = env.reset()
        returns = 0.0
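Putting the pieces together, a standalone test script could look roughly like the sketch below. It is not ElegantRL's exact demo code: the env_args keys are the ones the Pendulum demos usually use, and the rollout loop is wrapped in torch.no_grad(), which is why the detach() call above is optional.

import torch
from elegantrl.envs.CustomGymEnv import PendulumEnv
from elegantrl.train.config import build_env

def demo_load_pendulum_and_render():
    device = torch.device("cpu")  # CPU inference is enough for a small actor network
    env_args = {'env_name': 'Pendulum', 'state_dim': 3, 'action_dim': 1, 'if_discrete': False}
    env = build_env(env_class=PendulumEnv, env_args=env_args)
    act = torch.load("./Pendulum_PPO_0/act.pt", map_location=device)

    state = env.reset()
    returns = 0.0
    with torch.no_grad():  # no gradients are needed at test time
        for _ in range(12345):
            s_tensor = torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
            a_tensor = act(s_tensor)  # Pendulum has a continuous action space (if_discrete=False)
            action = a_tensor.cpu().numpy()[0]
            state, reward, done, _ = env.step(action)
            returns += reward
            env.render()
            if done:
                print(f"episode return: {returns:.2f}")
                state = env.reset()
                returns = 0.0

demo_load_pendulum_and_render()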