PyTorch implementation of Batch-Constrained deep Q-learning (BCQ) from the paper "Off-Policy Deep Reinforcement Learning without Exploration"
Primary Language: Python