adik993/ppo-pytorch

I found it works well for discrete action spaces when I ran the project, but how can I use it with a continuous action space?

liu-yuntao opened this issue · 0 comments

I want to train an agent with this project after writing a custom environment using Gym's continuous (`Box`) spaces. The environment's states and actions are defined as follows:

    # 5-D continuous actions, each bounded in [-3, 3]
    self.min_action = np.array([[-3, -3, -3, -3, -3]], dtype=np.float32)  # shape (1, 5)
    self.max_action = np.array([[3, 3, 3, 3, 3]], dtype=np.float32)

    # 10-D observations with per-dimension bounds
    self.low_state = np.array(
        [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=np.float32
    )  # shape (1, 10)
    self.high_state = np.array(
        [[50, 50, 50, 50, 50, 300, 300, 300, 300, 300]], dtype=np.float32
    )

    self.action_space = spaces.Box(
        low=self.min_action, high=self.max_action, shape=(1, 5), dtype=np.float32
    )

    self.observation_space = spaces.Box(
        low=self.low_state, high=self.high_state, shape=(1, 10), dtype=np.float32
    )
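As an aside, since the observation bounds above are known, a common preprocessing step for PPO is to rescale observations into [-1, 1] before feeding them to the policy network. A minimal numpy sketch (the bound values are copied from the definition above, flattened to 1-D, which is the shape most PPO implementations expect; `normalize` is an illustrative helper name, not from this repo):

```python
import numpy as np

# Bounds copied from the environment definition above, flattened to 1-D
low_state = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=np.float32)
high_state = np.array([50, 50, 50, 50, 50, 300, 300, 300, 300, 300], dtype=np.float32)

def normalize(obs: np.ndarray) -> np.ndarray:
    """Rescale an observation from [low, high] into [-1, 1] per dimension."""
    return 2.0 * (obs - low_state) / (high_state - low_state) - 1.0

print(normalize(low_state))   # all -1.0 at the lower bound
print(normalize(high_state))  # all +1.0 at the upper bound
```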

Is it possible to implement this with the PPO + ICM code in this project? Thanks!
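In general, the standard way to adapt a discrete-action PPO to a continuous `Box` space is to replace the `Categorical` policy head with a diagonal Gaussian (`Normal`) head: the network outputs a mean per action dimension, a learnable log-std provides the spread, and the per-dimension log-probs are summed for the PPO ratio. A minimal sketch under those assumptions (`GaussianActor` and its layer sizes are illustrative, not taken from this repo):

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class GaussianActor(nn.Module):
    """Diagonal Gaussian policy head for continuous actions (illustrative)."""

    def __init__(self, obs_dim: int, act_dim: int, act_limit: float = 3.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, act_dim),  # outputs the action mean
        )
        # State-independent log-std, a common PPO parameterization
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        self.act_limit = act_limit  # the [-3, 3] Box bounds above

    def forward(self, obs: torch.Tensor) -> Normal:
        mu = self.net(obs)
        return Normal(mu, self.log_std.exp())

    def act(self, obs: torch.Tensor):
        dist = self(obs)
        action = dist.sample()
        # Sum log-probs over action dimensions for the PPO importance ratio
        logp = dist.log_prob(action).sum(dim=-1)
        # Clamp into the Box bounds; tanh-squashing is a common alternative
        return action.clamp(-self.act_limit, self.act_limit), logp

obs = torch.randn(1, 10)         # matches the (flattened) 10-D observation above
actor = GaussianActor(10, 5)
action, logp = actor.act(obs)
print(action.shape, logp.shape)  # torch.Size([1, 5]) torch.Size([1])
```

The ICM module itself is largely unaffected by this change, except that its inverse model must regress the continuous action (e.g. with an MSE loss) instead of classifying a discrete one.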