Don't really understand the effect of `exp_op` in the generator loss
Opened this issue · 0 comments
winnechan commented
```python
# Keeps track of the "expected reward" at each timestep.
expected_reward = tf.Variable(tf.zeros((SEQUENCE_MAXLEN,)))
reward = d_preds - expected_reward[:tf.shape(d_preds)[1]]  # minus zeros??
mean_reward = tf.reduce_mean(reward)

# This variable is updated to know the "expected reward". This means
# that only results that do surprisingly well are "kept" and used
# to update the generator.
exp_reward_loss = tf.reduce_mean(tf.abs(reward))
exp_op = reward_opt.minimize(
    exp_reward_loss, var_list=[expected_reward])  # why update an irrelevant variable??
```
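For context, here is a minimal sketch (not the repo's code, and using a scalar baseline with plain NumPy instead of a per-timestep TensorFlow variable) of what training a baseline on `tf.abs(reward)` does: minimizing the absolute error pushes the baseline toward the central value of the observed rewards, so `reward = d_preds - expected_reward` becomes a centered advantage. The variable only looks "irrelevant" because it does not enter the generator's gradient; it is subtracted from zeros only at step 0, before `exp_op` has run. The fake reward distribution below is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
baseline = 0.0   # plays the role of expected_reward, initialized to zero
lr = 0.05        # step size for the baseline's own optimizer

for step in range(2000):
    reward = rng.normal(loc=3.0, scale=0.5)  # stand-in for d_preds
    advantage = reward - baseline            # what the generator would be trained on
    # Gradient of |reward - baseline| w.r.t. baseline is -sign(advantage),
    # so a descent step nudges the baseline toward the typical reward.
    baseline -= lr * (-np.sign(advantage))

print(baseline)  # settles near 3.0, the center of the reward distribution
```

Once the baseline tracks the typical reward, `advantage` is positive only for samples that do surprisingly well, which is the usual variance-reduction trick in policy-gradient training.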