2019ChenGong opened this issue 3 years ago · 1 comments
Thanks for your excellent work!
We have a question about the proof of Theorem 3.2 in the paper "Conservative Q-Learning for Offline Reinforcement Learning". In the equation, .
How do we know that ?
Thank you!