Don't really understand the effect of `exp_op` in the generator loss
Opened this issue · 0 comments
winnechan commented
```python
# Keeps track of the "expected reward" at each timestep.
expected_reward = tf.Variable(tf.zeros((SEQUENCE_MAXLEN,)))
reward = d_preds - expected_reward[:tf.shape(d_preds)[1]]  # minus zeros??
mean_reward = tf.reduce_mean(reward)

# This variable is updated to know the "expected reward". This means
# that only results that do surprisingly well are "kept" and used
# to update the generator.
exp_reward_loss = tf.reduce_mean(tf.abs(reward))
exp_op = reward_opt.minimize(
    exp_reward_loss, var_list=[expected_reward])  # why update an irrelevant variable??
```
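For context, here is a minimal sketch (not the repo's code, and using a scalar baseline with plain NumPy instead of a per-timestep TensorFlow variable) of what training a baseline on `tf.abs(reward)` does: minimizing the absolute error pushes the baseline toward the central value of the observed rewards, so `reward = d_preds - expected_reward` becomes a centered advantage. The variable only looks "irrelevant" because it does not enter the generator's gradient; it is subtracted from zeros only at step 0, before `exp_op` has run. The fake reward distribution below is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
baseline = 0.0   # plays the role of expected_reward, initialized to zero
lr = 0.05        # step size for the baseline's own optimizer

for step in range(2000):
    reward = rng.normal(loc=3.0, scale=0.5)  # stand-in for d_preds
    advantage = reward - baseline            # what the generator would be trained on
    # Gradient of |reward - baseline| w.r.t. baseline is -sign(advantage),
    # so a descent step nudges the baseline toward the typical reward.
    baseline -= lr * (-np.sign(advantage))

print(baseline)  # settles near 3.0, the center of the reward distribution
```

Once the baseline tracks the typical reward, `advantage` is positive only for samples that do surprisingly well, which is the usual variance-reduction trick in policy-gradient training.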