I found it works well for discrete action spaces when I ran the project, but how can I make use of it in a continuous action space?
liu-yuntao opened this issue · 0 comments
liu-yuntao commented
I want to train the agent with this project after customizing the environment to use gym's continuous (Box) spaces. The state and action spaces, defined inside my custom environment's __init__, are as follows:
import numpy as np
from gym import spaces

# Inside the __init__ of my custom gym.Env subclass:

# Action bounds: 5 continuous actions, each in [-3, 3]
self.min_action = np.array([[-3, -3, -3, -3, -3]]).reshape(1, 5)
self.max_action = np.array([[3, 3, 3, 3, 3]]).reshape(1, 5)

# State bounds: 10 continuous state variables
self.low_state = np.array(
    [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=np.float32
).reshape(1, 10)
self.high_state = np.array(
    [[50, 50, 50, 50, 50, 300, 300, 300, 300, 300]], dtype=np.float32
).reshape(1, 10)

self.action_space = spaces.Box(
    low=self.min_action, high=self.max_action, shape=(1, 5), dtype=np.float32
)
self.observation_space = spaces.Box(
    low=self.low_state, high=self.high_state, shape=(1, 10), dtype=np.float32
)
Is it possible to implement this idea on top of the PPO + ICM code, i.e., can the discrete implementation be adapted to this continuous action space? Thanks!
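To clarify what I am asking: my understanding is that, for a continuous action space, the Categorical policy head used for discrete actions would have to be replaced by something like a diagonal Gaussian. Below is a rough PyTorch sketch of what I have in mind for the policy-head part; the names (e.g. ContinuousActor) are my own placeholders, not from this repo, and I only want to check whether this is the right direction:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal


class ContinuousActor(nn.Module):
    """Diagonal-Gaussian policy head for a continuous Box action space (placeholder sketch)."""

    def __init__(self, obs_dim=10, act_dim=5, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mu = nn.Linear(hidden, act_dim)               # per-dimension action mean
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent log std

    def forward(self, obs):
        h = self.body(obs)
        mu = self.mu(h)
        std = self.log_std.exp().expand_as(mu)
        return Normal(mu, std)  # one independent Gaussian per action dimension


# Sampling an action and its log-prob (summed over dimensions) for the PPO ratio;
# the sampled action is clipped to the Box bounds [-3, 3] before being sent to the env.
actor = ContinuousActor()
obs = torch.zeros(1, 10)                      # flattened observation, shape (1, 10)
dist = actor(obs)
action = dist.sample()                        # shape (1, 5)
log_prob = dist.log_prob(action).sum(dim=-1)  # one log-prob per sample
env_action = action.clamp(-3.0, 3.0).numpy().reshape(1, 5)
```

For the ICM side, I assume the forward model would take the raw 5-dimensional action vector instead of a one-hot encoding, and the inverse model's cross-entropy loss would become a regression (MSE) loss on the predicted action. Is that the right way to think about it?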