cartpole learning(discrete)
CUN-bjy opened this issue · 3 comments
CUN-bjy commented
20.12.29
- Poor performance even on CartPole.
- Going back to the original version; something is wrong.
CUN-bjy commented
20.12.31
Finally got it working! -> https://github.com/CUN-bjy/gym-ddpg-keras/tree/cartpole-v1
changes for solving the problem
- output scale(**) calculation in actor

      scalar = self.act_range * np.ones(self.act_dim)
      out = Lambda(lambda i: i * scalar)(output)

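To illustrate the scaling change above, here is a minimal sketch in plain Python (no Keras) of what the `Lambda` layer computes: each `tanh` output lies in [-1, 1] and is multiplied by `act_range`, so the actor can reach the full action range. The function name `scale_action` and the sample values are hypothetical and not taken from the repository.

```python
import math

def scale_action(pre_activation, act_range):
    """Squash each raw output with tanh, then stretch it to [-act_range, act_range]."""
    return [act_range * math.tanh(x) for x in pre_activation]

# Hypothetical raw actor outputs and a hypothetical action range of 2.0
actions = scale_action([-10.0, 0.0, 10.0], act_range=2.0)
print(actions)
```

Without the multiplication, the actor could only emit values in [-1, 1], regardless of the environment's action bounds.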
experiment specifications
hyperparameter:
- buffer_size = 20000, batch_size = 64
- prioritized buffer : False
- learning_rate: 1e-3 for the actor, 1e-2 for the critic
- tau (target update rate): 1e-2 for both the actor and the critic
- network
- actor:

      # input layer (observations)
      input_ = Input(shape=self.obs_dim)
      # hidden layer 1
      h1_ = Dense(24, kernel_initializer=GlorotNormal())(input_)
      h1_b = BatchNormalization()(h1_)
      h1 = Activation('relu')(h1_b)
      # hidden layer 2
      h2_ = Dense(16, kernel_initializer=GlorotNormal())(h1)
      h2_b = BatchNormalization()(h2_)
      h2 = Activation('relu')(h2_b)
      # output layer (actions)
      output_ = Dense(self.act_dim, kernel_initializer=GlorotNormal())(h2)
      output_b = BatchNormalization()(output_)
      output = Activation('tanh')(output_b)
      # scale the tanh output to the action range
      scalar = self.act_range * np.ones(self.act_dim)
      out = Lambda(lambda i: i * scalar)(output)

- critic

      # input layer (observations and actions)
      input_obs = Input(shape=self.obs_dim)
      input_act = Input(shape=(self.act_dim,))
      inputs = [input_obs, input_act]
      concat = Concatenate(axis=-1)(inputs)
      # hidden layer 1
      h1_ = Dense(24, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(concat)
      h1_b = BatchNormalization()(h1_)
      h1 = Activation('relu')(h1_b)
      # hidden layer 2
      h2_ = Dense(16, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(h1)
      h2_b = BatchNormalization()(h2_)
      h2 = Activation('relu')(h2_b)
      # output layer (Q-value)
      output_ = Dense(1, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(h2)
      output_b = BatchNormalization()(output_)
      output = Activation('linear')(output_b)

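The `tau` values listed above control how quickly the target networks track the learned networks. Here is a minimal sketch of that soft (Polyak) update, assuming the standard DDPG rule `target = tau * online + (1 - tau) * target`. Weights are plain lists of floats here, and `soft_update` is a hypothetical helper, not a function from the repository.

```python
def soft_update(online_weights, target_weights, tau):
    """Move each target weight a fraction tau of the way toward the online weight."""
    return [tau * w + (1.0 - tau) * t
            for w, t in zip(online_weights, target_weights)]

# Hypothetical weights: the target starts at zero and slowly tracks the online values
online = [1.0, -2.0, 0.5]
target = [0.0, 0.0, 0.0]
for _ in range(100):
    target = soft_update(online, target, tau=1e-2)
print(target)
```

With `tau = 1e-2`, the target closes about 63% of the gap to fixed online weights after 100 updates, which is why a small `tau` gives slowly moving, more stable targets.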
[figure: performance (total_reward)]
[figure: critic_loss]
CUN-bjy commented
Exploration early in training, while the agent is still a "baby", is the most important thing.
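The thread does not say which noise process was used for exploration. The sketch below assumes Ornstein-Uhlenbeck noise, a common choice for DDPG, with a scale that decays over episodes so that exploration is strongest early in training. The class `OUNoise`, the helper `noisy_action`, and all constants are hypothetical.

```python
import random

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise that reverts to mu."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.rng = random.Random(seed)
        self.x = mu

    def reset(self):
        """Restart the process at its mean, e.g. at the start of an episode."""
        self.x = self.mu

    def sample(self):
        """Advance the process by one step and return the new noise value."""
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x

def noisy_action(action, noise, episode, act_range, decay=0.99):
    """Add noise scaled by decay**episode, then clip to the valid action range."""
    scale = decay ** episode
    perturbed = action + scale * noise.sample()
    return max(-act_range, min(act_range, perturbed))

noise = OUNoise(seed=0)
early = noisy_action(0.0, noise, episode=0, act_range=1.0)    # full-strength noise
late = noisy_action(0.0, noise, episode=500, act_range=1.0)   # noise almost gone
print(early, late)
```

The decay rate is a tuning choice: decaying too fast can leave the agent stuck with a poor policy before it has seen enough of the state space.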
CUN-bjy commented
Training time: about 1 hour on an Intel i7 CPU.