AI4Finance-Foundation/ElegantRL

:children_crossing: How to save and load policy network for testing.


After training an agent, many people are not sure how to save the trained policy network, load it back, and see how the agent actually performs in a simulation environment.

Here is example code (using the Pendulum env as an example) to:

  • train the agent and save the policy network
  • load the policy network and use it to map states to actions

train the agent and save the policy network

def train_ppo_a2c_for_pendulum():
    from elegantrl.envs.CustomGymEnv import PendulumEnv
    agent_class = [AgentPPO, AgentA2C][DRL_ID]  # pick the DRL algorithm (DRL_ID selects PPO or A2C)
    env_class = PendulumEnv  # run a custom env: PendulumEnv, which is based on the OpenAI Pendulum env
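For context, the rest of this training function in ElegantRL's demo scripts usually continues along the lines of the sketch below. The env_args values are the standard Pendulum dimensions; Config and train_agent are assumed to live in elegantrl.train.config and elegantrl.train.run (their location can differ between ElegantRL versions), so treat this as a sketch rather than the exact demo code:

    env_args = {
        'env_name': 'Pendulum',   # name of the custom Pendulum environment
        'state_dim': 3,           # x-y coordinates of the pendulum's free end plus its angular velocity
        'action_dim': 1,          # torque applied to the free end of the pendulum
        'if_discrete': False,     # continuous action space
    }

    from elegantrl.train.config import Config    # assumed import path
    from elegantrl.train.run import train_agent  # assumed import path

    args = Config(agent_class, env_class, env_args)  # holds the training hyperparameters
    args.cwd = "./Pendulum_PPO_0"  # working directory where the actor (act.pt) is saved
    train_agent(args)              # run the training loop; it saves the actor to cwd as it goes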

agent.save_or_load_agent(cwd, if_save=True)  # if_save=True saves the agent's networks (including the actor) under cwd; if_save=False loads them

During training, the process keeps saving the policy network (actor) to cwd="./Pendulum_PPO_0/act.pt" (cwd = current working directory).
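If you want to save the actor yourself, outside of ElegantRL's training loop, the save step amounts to plain PyTorch. A minimal sketch, assuming agent.act is an ordinary torch.nn.Module (save_actor is just a hypothetical helper name):

import os
import torch

def save_actor(actor, cwd="./Pendulum_PPO_0"):
    # Persist the whole actor module so it can be restored later with torch.load().
    os.makedirs(cwd, exist_ok=True)
    torch.save(actor, os.path.join(cwd, "act.pt"))

# usage after training:
# save_actor(agent.act)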

evaluator.evaluate_and_save(actor=agent.act, steps=horizon_len, exp_r=exp_r, logging_tuple=logging_tuple)  # periodically evaluate the current actor and record the results

load the policy network and use it to map the state to an action

def demo_load_pendulum_and_render():

The following code loads the policy network (actor) from disk:

'''init'''
import torch
from elegantrl.train.config import build_env

env = build_env(env_class=env_class, env_args=env_args)  # env_class and env_args are set as in the training function above
act = torch.load("./Pendulum_PPO_0/act.pt", map_location=device)  # device is e.g. torch.device("cpu")

The following code maps the state to an action using the policy network (actor):

state = env.reset()
returns = 0.0
for steps in range(12345):
    s_tensor = torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
    a_tensor = act(s_tensor).argmax(dim=1) if if_discrete else act(s_tensor)
    action = a_tensor.detach().cpu().numpy()[0]  # detach() is not needed if this loop runs under torch.no_grad()
    state, reward, done, _ = env.step(action)
    returns += reward
    env.render()
    if done:
        state = env.reset()
        returns = 0.0
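Putting the pieces together, a standalone test script could look roughly like the sketch below. It is not ElegantRL's exact demo code: the env_args keys are the ones the Pendulum demos usually use, and the rollout loop is wrapped in torch.no_grad(), which is why the detach() call above is optional.

import torch
from elegantrl.envs.CustomGymEnv import PendulumEnv
from elegantrl.train.config import build_env

def demo_load_pendulum_and_render():
    device = torch.device("cpu")  # CPU inference is enough for a small actor network
    env_args = {'env_name': 'Pendulum', 'state_dim': 3, 'action_dim': 1, 'if_discrete': False}
    env = build_env(env_class=PendulumEnv, env_args=env_args)
    act = torch.load("./Pendulum_PPO_0/act.pt", map_location=device)

    state = env.reset()
    returns = 0.0
    with torch.no_grad():  # no gradients are needed at test time
        for _ in range(12345):
            s_tensor = torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
            a_tensor = act(s_tensor)  # Pendulum has a continuous action space (if_discrete=False)
            action = a_tensor.cpu().numpy()[0]
            state, reward, done, _ = env.step(action)
            returns += reward
            env.render()
            if done:
                print(f"episode return: {returns:.2f}")
                state = env.reset()
                returns = 0.0

demo_load_pendulum_and_render()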