
Wrong implementation of AIRL

Ericonaldo opened this issue · 0 comments

I checked the code, and it looks like AIRL is implemented simply by using the discriminator logit as the reward function. Is that correct? This differs from the original paper, which uses a disentangled discriminator computed as f / (f + \pi), where f is an approximation of exp(r) and \pi is the policy.
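
To make the difference concrete, here is a minimal NumPy sketch of the discriminator form I mean. It is only an illustration, not this repository's code: the function names `airl_discriminator` and `airl_reward` and the argument `log_f` (the log of f above, i.e. the learned term that approximates r) are my own assumptions. The policy term enters through `log_pi`, the log-probability of the action under the current policy.

```python
import numpy as np


def airl_discriminator(log_f, log_pi):
    """Return D = f / (f + pi), with f = exp(log_f) and pi = exp(log_pi).

    Algebraically, D = sigmoid(log_f - log_pi), which avoids
    exponentiating large values directly.
    """
    logits = np.asarray(log_f, dtype=float) - np.asarray(log_pi, dtype=float)
    # sigmoid(x) written as 0.5 * (1 + tanh(x / 2)) for numerical stability
    return 0.5 * (1.0 + np.tanh(0.5 * logits))


def airl_reward(log_f, log_pi):
    """Return log D - log(1 - D), which simplifies to log_f - log_pi."""
    return np.asarray(log_f, dtype=float) - np.asarray(log_pi, dtype=float)


# Hypothetical values for two (state, action) pairs
log_f = np.array([0.3, -1.2])
log_pi = np.log(np.array([0.5, 0.1]))
print(airl_discriminator(log_f, log_pi))
print(airl_reward(log_f, log_pi))
```

With this structure the logit of D is `log_f - log_pi`, so the policy's log-probability is part of the discriminator itself. A discriminator that outputs an unconstrained logit and uses it directly as the reward does not have this decomposition.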