CUN-bjy/gym-ddpg-keras

cartpole learning (discrete)

CUN-bjy opened this issue · 3 comments

20.12.29

  • Poor performance even on cartpole..
  • Rolled back to the original implementation; something is wrong..

20.12.31
Finally, did it! -> https://github.com/CUN-bjy/gym-ddpg-keras/tree/cartpole-v1

changes that solved the problem

  • output scale calculation in the actor (rescale the bounded tanh output to the action range)
    scalar = self.act_range * np.ones(self.act_dim)  # per-dimension action bound
    out = Lambda(lambda i: i * scalar)(output)       # map tanh output [-1, 1] -> [-act_range, act_range]
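
CartPole-v1 itself has a discrete action space, while DDPG produces continuous actions, so the scaled actor output has to be mapped to a discrete action somewhere. A minimal sketch of one common mapping (thresholding; the names and the mapping are illustrative, and the cartpole-v1 branch may do this differently):

    import numpy as np

    def to_discrete(continuous_action):
        # CartPole actions: 0 = push left, 1 = push right.
        # Threshold the (scaled) tanh output at zero.
        return int(continuous_action[0] > 0.0)

    # e.g. an actor output of +0.37 pushes the cart to the right
    assert to_discrete(np.array([0.37])) == 1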

experiment specifications
hyperparameters:

  • buffer_size = 20000, batch_size = 64
  • prioritized buffer: False
  • learning_rate: 1e-3, 1e-2 for actor, critic
  • tau (target update rate): 1e-2, 1e-2 for actor, critic (see the soft-update sketch after this list)
  • network
    • actor:
        # assumes: from tensorflow.keras.layers import Input, Dense, BatchNormalization, Activation, Lambda
        #          from tensorflow.keras.initializers import GlorotNormal
        #          import numpy as np

        # input layer (observations)
        input_ = Input(shape=self.obs_dim)

        # hidden layer 1
        h1_ = Dense(24, kernel_initializer=GlorotNormal())(input_)
        h1_b = BatchNormalization()(h1_)
        h1 = Activation('relu')(h1_b)

        # hidden layer 2
        h2_ = Dense(16, kernel_initializer=GlorotNormal())(h1)
        h2_b = BatchNormalization()(h2_)
        h2 = Activation('relu')(h2_b)

        # output layer (actions): tanh bounds the output to [-1, 1],
        # then Lambda rescales it to [-act_range, act_range]
        output_ = Dense(self.act_dim, kernel_initializer=GlorotNormal())(h2)
        output_b = BatchNormalization()(output_)
        output = Activation('tanh')(output_b)
        scalar = self.act_range * np.ones(self.act_dim)
        out = Lambda(lambda i: i * scalar)(output)
    • critic:
        # assumes additionally: from tensorflow.keras.layers import Concatenate
        #                       from tensorflow.keras.regularizers import l2

        # input layer (observations and actions)
        input_obs = Input(shape=self.obs_dim)
        input_act = Input(shape=(self.act_dim,))
        inputs = [input_obs, input_act]
        concat = Concatenate(axis=-1)(inputs)

        # hidden layer 1
        h1_ = Dense(24, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(concat)
        h1_b = BatchNormalization()(h1_)
        h1 = Activation('relu')(h1_b)

        # hidden layer 2
        h2_ = Dense(16, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(h1)
        h2_b = BatchNormalization()(h2_)
        h2 = Activation('relu')(h2_b)

        # output layer (the Q-value estimate, a single scalar)
        output_ = Dense(1, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(h2)
        output_b = BatchNormalization()(output_)
        output = Activation('linear')(output_b)
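
For reference, tau from the hyperparameter list is the coefficient of DDPG's soft target update: after each training step the target networks are pulled slightly toward the learned networks. A minimal sketch using Keras get_weights/set_weights (model and variable names are illustrative, not the repo's exact code):

    def soft_update(model, target_model, tau):
        # target <- tau * online + (1 - tau) * target, weight tensor by weight tensor
        weights = model.get_weights()
        target_weights = target_model.get_weights()
        target_model.set_weights([tau * w + (1.0 - tau) * tw
                                  for w, tw in zip(weights, target_weights)])

    # called once per training step:
    # soft_update(actor, target_actor, tau=1e-2)
    # soft_update(critic, target_critic, tau=1e-2)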

cartpole

performance (total_reward)

[plot: total reward per episode, captured 2020-12-31]

critic_loss

[plot: critic loss over training, captured 2020-12-31]
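
The plotted critic_loss is the usual DDPG temporal-difference loss. A minimal sketch of the target computation (gamma and all names here are assumptions for illustration, not values from the repo):

    import numpy as np

    def critic_targets(rewards, dones, next_q, gamma=0.99):
        # Bellman targets y = r + gamma * Q'(s', mu'(s')),
        # with the bootstrap term zeroed on terminal transitions.
        # next_q is the target critic evaluated at the target actor's action.
        return rewards + gamma * (1.0 - dones) * next_q

    # critic_loss is then the mean squared TD error: mean((Q(s, a) - y)**2)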

Exploration while the agent is still a "baby" (i.e., early in training) is the most important thing.
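
In DDPG that early exploration usually comes from noise added to the actor's output; the original paper uses an Ornstein-Uhlenbeck process. A minimal sketch (this repo may use a different noise scheme; names are illustrative):

    import numpy as np

    class OUNoise:
        # Ornstein-Uhlenbeck process: temporally correlated noise,
        # mean-reverting toward mu, as used for exploration in the DDPG paper.
        def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2):
            self.mu, self.theta, self.sigma = mu, theta, sigma
            self.state = np.ones(dim) * mu

        def sample(self):
            dx = self.theta * (self.mu - self.state) \
                 + self.sigma * np.random.randn(*self.state.shape)
            self.state = self.state + dx
            return self.state

    # act = np.clip(actor(obs) + noise.sample(), -act_range, act_range)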

training time: about 1 hour on an Intel i7 CPU.