Repeated calculation

It seems that the same variable, `r_coef`, is computed twice here:

First time:

Line 403 in 2f3e516

    
           r_coef = (reward_loss_grad * b_step_dir).sum(0, keepdim=True)  # todo: compute r_coef: = g^T H^{-1} b

Second time:

Line 426 in 2f3e516

    
           r_coef = (reward_loss_grad * b_step_dir).sum(0, keepdim=True)  # todo: compute r_coef: = g^T H^{-1} b
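To illustrate why the second assignment looks redundant, here is a minimal sketch. It assumes that nothing between the two lines modifies `reward_loss_grad` or `b_step_dir`, which I have not verified against the rest of the function. The tensor shapes and the helper `compute_r_coef` are hypothetical and only serve the example.

```python
import torch

torch.manual_seed(0)

# Hypothetical shapes (8 parameters, batch of 4); not taken from the repository.
reward_loss_grad = torch.randn(8, 4)
b_step_dir = torch.randn(8, 4)


def compute_r_coef(grad, step_dir):
    # Same expression as in the two quoted lines: elementwise product
    # summed over dim 0, keeping that dimension so the result is (1, batch).
    return (grad * step_dir).sum(0, keepdim=True)


r_coef_first = compute_r_coef(reward_loss_grad, b_step_dir)
# ... intermediate code that is assumed not to touch the two inputs ...
r_coef_second = compute_r_coef(reward_loss_grad, b_step_dir)

# If the inputs are unchanged, both results are identical and the
# second computation can be dropped in favor of reusing the first.
duplicate_is_redundant = torch.equal(r_coef_first, r_coef_second)
print(duplicate_is_redundant)
```

If the inputs are in fact updated between line 403 and line 426, the second computation is needed and only the comment should change.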

Hi Benfen, thanks for the comment. We will clean up the code.