Generator converging to wrong output
sebastianGehrmann opened this issue · 1 comment
Hey Tao,
I am trying to implement your rationale model in PyTorch right now, and I keep running into the problem that after a couple of iterations z becomes all ones. This obviously makes the encoder quite strong, but it does not do what I want.
The generator's loss function is cost(x, y, z) * log p(z|x). While the first term is large, log p(z|x) becomes all zeros (since the model learns to predict all ones with 100% probability, and log(1) = 0). Therefore, the overall loss (and its gradient) for the generator becomes zero, leading to this all-ones phenomenon.
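For concreteness, this is roughly how I compute that term at the moment (a simplified sketch with placeholder tensors rather than my actual model code):

```python
import torch

# Placeholder tensors standing in for my real model outputs:
#   probs -- generator's per-token selection probabilities p(z_t = 1 | x)
#   cost  -- encoder cost(x, y, z), treated as a constant for the generator
probs = torch.full((32, 50), 0.99, requires_grad=True)   # nearly all ones
cost = torch.rand(32)

z = torch.bernoulli(probs)                                # sampled mask z
logpz = (z * torch.log(probs + 1e-8) +
         (1 - z) * torch.log(1 - probs + 1e-8)).sum(dim=1)  # log p(z|x)
gen_loss = (cost * logpz).mean()                          # cost(x,y,z) * log p(z|x)

# As probs -> 1, z becomes all ones and logpz -> 0, so gen_loss and its
# gradient vanish -- exactly the degenerate behaviour described above.
gen_loss.backward()
print(probs.grad.abs().mean())
```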
How did you address this in your code?
The issue is probably due to one of the following reasons:
- The cost function includes a "sparsity" regularization that penalizes selecting too many ones. In my current implementation, the weight of this regularization (--sparsity) has to be carefully tuned so that the model selects neither all ones nor all zeros. I monitored the percentage of ones on the training and dev sets and found a reasonable value range for the beer review dataset; a rough sketch of this term follows the list.
- Since the learning procedure estimates the gradient by sampling via REINFORCE, the variance of the gradient is high and the model sometimes suddenly "jumps" to the bad optimum of selecting all ones or all zeros. I saw this more often for the dependent-selection version. To alleviate this, I used a larger batch size of 256 and a smaller initial learning rate. The code also monitors the cost value on the train and dev sets: if the cost jumps after an epoch, I revert the parameter changes of that epoch and halve the learning rate. See this. A second sketch below illustrates this schedule.
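To make the first point concrete, the sparsity term amounts to something like this (a minimal PyTorch sketch of the idea, not the actual implementation; the function and argument names are illustrative):

```python
import torch

def generator_cost(enc_loss, z, sparsity):
    """Encoder loss plus a penalty on the number of selected words.

    enc_loss -- (batch,) prediction loss of the encoder given rationale z
    z        -- (batch, seq_len) sampled 0/1 selection mask
    sparsity -- the regularization weight (the --sparsity flag): too small
                and the model drifts to all ones, too large and it drifts
                to all zeros.
    """
    return enc_loss + sparsity * z.sum(dim=1)

# Monitor the fraction of selected words per batch; it should settle in a
# reasonable range rather than saturating at 0 or 1.
z = torch.bernoulli(torch.full((8, 20), 0.5))
print("selected fraction:", z.mean().item())
```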
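And the learning-rate schedule from the second point is roughly the following (again a sketch with placeholder hooks, not the real training loop):

```python
import copy

def train_with_cost_check(model, optimizer, run_epoch, eval_cost, max_epochs=50):
    """Halve the learning rate and revert whenever the dev cost jumps.

    run_epoch(model, optimizer) runs one epoch of training and
    eval_cost(model) returns the dev-set cost; both are placeholders for
    whatever training / evaluation code you already have.
    """
    best_cost = float("inf")
    best_state = copy.deepcopy(model.state_dict())
    for _ in range(max_epochs):
        run_epoch(model, optimizer)
        cost = eval_cost(model)
        if cost > best_cost:
            # The cost jumped after this epoch: undo the epoch's parameter
            # changes and continue with half the learning rate.
            model.load_state_dict(best_state)
            for group in optimizer.param_groups:
                group["lr"] *= 0.5
        else:
            best_cost = cost
            best_state = copy.deepcopy(model.state_dict())
```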
For general REINFORCE and reinforcement learning, there are more principled ways of reducing the gradient variance. One is the "baseline trick"; see Jiwei's follow-up paper (page 8).
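In a few lines, the baseline trick looks roughly like this (a hedged sketch with placeholder tensors, not code from the paper):

```python
import torch

# REINFORCE with a simple baseline: subtracting a baseline b from the cost
# leaves the expected gradient unchanged (since E[grad log p(z|x)] = 0) but
# can reduce its variance. All tensors below are illustrative placeholders.
probs = torch.full((32, 50), 0.5, requires_grad=True)
cost = torch.rand(32)
baseline = torch.tensor(0.5)   # e.g. a running average of past costs

z = torch.bernoulli(probs)
logpz = (z * torch.log(probs + 1e-8) +
         (1 - z) * torch.log(1 - probs + 1e-8)).sum(dim=1)
gen_loss = ((cost - baseline) * logpz).mean()   # (cost - b) * log p(z|x)
gen_loss.backward()
```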
Hope this can help!