RodneyShag/GridWorldMDP
Uses Markov decision processes (MDPs) and Temporal Difference (TD) Q-learning to maximize reward in a "grid world".
Java
No issues in this repository yet.
Uses Markov decision processes (MDPs) and Temporal Difference (TD) Q-learning to maximize reward in a "grid world".
Java
No issues in this repository yet.