aviralkumar2907/CQL

About the derivation in paper

2019ChenGong opened this issue 3 years ago · 1 comments

2019ChenGong commented 3 years ago

Thanks for your excellent work!

We have a question about the proof of Theorem 3.2 in the paper "Conservative Q-Learning for Offline Reinforcement Learning". The proof contains the equation
$\hat{V}^{\pi}(\mathbf{s})=V^{\pi}(\mathbf{s})-\alpha \left[\underbrace{\left(I-\gamma P^{\pi}\right)^{-1}}_{\text {non-negative entries }} \underbrace{\mathbb{E}_{\pi}\left[\frac{\pi}{\pi_{\beta}}-1\right]}_{\geq 0} \right](\mathbf{s})$ .

How do we know that $\mathbb{E}_{\pi}\left[\frac{\pi}{\pi_{\beta}}-1\right] \geq 0$ ?
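
As a sanity check on the inequality itself, here is a minimal numerical sketch. It is not taken from the paper or the repository. It assumes discrete action distributions with $\pi_{\beta}(\mathbf{a}) > 0$ for every action, and the function names, the 5-action space, and the random Dirichlet policies are all hypothetical choices for illustration. It compares the expectation against the sum $\sum_{\mathbf{a}} (\pi - \pi_{\beta})^2 / \pi_{\beta}$, which is non-negative term by term and appears to coincide with it because both $\pi$ and $\pi_{\beta}$ sum to 1.

```python
import numpy as np


def expected_ratio_gap(pi, pi_beta):
    """E_{a ~ pi}[pi(a) / pi_beta(a) - 1] for discrete distributions."""
    return float(np.sum(pi * (pi / pi_beta - 1.0)))


def chi_square_form(pi, pi_beta):
    """sum_a (pi(a) - pi_beta(a))^2 / pi_beta(a), non-negative term by term."""
    return float(np.sum((pi - pi_beta) ** 2 / pi_beta))


# Assumption: random policies over a hypothetical 5-action space.
rng = np.random.default_rng(0)
gaps = []
for _ in range(1000):
    pi = rng.dirichlet(np.ones(5))
    pi_beta = rng.dirichlet(np.ones(5))
    gap = expected_ratio_gap(pi, pi_beta)
    # The two expressions should agree up to floating-point error.
    assert np.isclose(gap, chi_square_form(pi, pi_beta))
    gaps.append(gap)

# Allow a tiny negative tolerance for floating-point rounding.
print(min(gaps) >= -1e-12)
```

This only tests random samples and does not replace a proof, so a confirmation of the intended argument would still be appreciated.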

Thank you!