ZipingXu/Factored-MDP-Approximate-Solution

Code for the paper "Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting"

Python
