deep-daya/Batch-Reinforcement-Learning

This project finds the best policy for three different Markov decision processes (MDPs) from a fixed batch of sampled transitions. Each transition consists of a state, an action, a reward, and a next state, and no further exploration of the environment is performed.
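The description does not say which algorithm the project uses. The sketch below shows one common batch approach, offered only as an illustration: it builds an empirical, count-based model of each MDP from the fixed `(state, action, reward, next_state)` tuples and then solves that model with value iteration. The function name `batch_value_iteration`, the discount `gamma`, and the choice to treat states with no observed actions as terminal (value 0) are all hypothetical assumptions, not the project's method.

```python
from collections import defaultdict


def batch_value_iteration(transitions, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Hypothetical sketch: solve an empirical MDP built from fixed (s, a, r, s') samples.

    Returns (policy, Q), where policy maps each state with observed actions
    to the greedy action under the estimated Q-values.
    """
    # Count-based estimates of transition probabilities and mean rewards.
    next_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
    reward_sum = defaultdict(float)                      # (s, a) -> summed reward
    visits = defaultdict(int)                            # (s, a) -> sample count
    for s, a, r, s_next in transitions:
        next_counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
        visits[(s, a)] += 1

    # Actions observed from each state; no exploration, so nothing else is known.
    actions = defaultdict(list)
    for s, a in visits:
        actions[s].append(a)

    Q = {sa: 0.0 for sa in visits}

    def state_value(s):
        # Assumption: states never acted in are treated as terminal (value 0).
        if s not in actions:
            return 0.0
        return max(Q[(s, a)] for a in actions[s])

    for _ in range(max_iters):
        new_Q = {}
        delta = 0.0
        for sa, n in visits.items():
            # Bellman optimality backup using the empirical model.
            expected_next = sum(c / n * state_value(s2) for s2, c in next_counts[sa].items())
            new_Q[sa] = reward_sum[sa] / n + gamma * expected_next
            delta = max(delta, abs(new_Q[sa] - Q[sa]))
        Q = new_Q  # synchronous update: all backups above used the previous Q
        if delta < tol:
            break

    policy = {s: max(acts, key=lambda a: Q[(s, a)]) for s, acts in actions.items()}
    return policy, Q


# Usage on a tiny hypothetical two-state dataset.
transitions = [
    (0, "stay", 0.0, 0),
    (0, "go", 0.0, 1),
    (1, "stay", 1.0, 1),
    (1, "go", 0.0, 0),
]
policy, Q = batch_value_iteration(transitions, gamma=0.9)
print(policy)  # {0: 'go', 1: 'stay'}
```

Because the data are fixed, the learned policy can only choose among actions that actually appear in the batch; a model-free alternative such as fitted Q-iteration would make a similar restriction.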

Python
