daddabarba/NHRL

An adaptive algorithm intended to abstract temporally extended actions online, without requiring any background information beyond a Markovian description of the environment. Several reinforcement learning algorithms were embedded in a hierarchy of policies, including n-step Q-learning, Expected Sarsa, LSTM neural networks (for Q-value learning), DeepMind's Deep Q-learning architecture, and simultaneous off-policy training of all abstract actions.
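Two of the learners named above, Expected Sarsa and n-step Q-learning, differ mainly in how they build their bootstrap target. The following is a minimal sketch of the standard tabular forms, assuming a NumPy Q-table indexed as `Q[state, action]` and an epsilon-greedy policy. The function names are hypothetical and are not taken from this repository, whose hierarchical and neural (LSTM, Deep Q-learning) variants are not shown here.

```python
import numpy as np

def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy over one row of Q."""
    n = len(q_values)
    probs = np.full(n, epsilon / n)               # exploration mass spread uniformly
    probs[int(np.argmax(q_values))] += 1.0 - epsilon  # remaining mass on greedy action
    return probs

def expected_sarsa_update(Q, s, a, r, s_next, alpha, gamma, epsilon, done=False):
    """One tabular Expected Sarsa step (hypothetical helper).

    Bootstraps on the expected Q-value of s_next under the epsilon-greedy
    policy, instead of the sampled next action (Sarsa) or the max (Q-learning).
    """
    if done:
        target = r
    else:
        probs = epsilon_greedy_probs(Q[s_next], epsilon)
        target = r + gamma * float(np.dot(probs, Q[s_next]))
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def n_step_q_target(rewards, Q, s_n, gamma, done=False):
    """n-step Q-learning target (hypothetical helper).

    Sums the n observed discounted rewards, then bootstraps greedily from
    the state reached after n steps unless the episode has terminated.
    """
    G = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if not done:
        G += (gamma ** len(rewards)) * float(np.max(Q[s_n]))
    return G

if __name__ == "__main__":
    Q = np.zeros((2, 2))
    Q[1] = [1.0, 3.0]
    expected_sarsa_update(Q, s=0, a=0, r=1.0, s_next=1,
                          alpha=0.5, gamma=0.9, epsilon=0.2)
    print(Q[0, 0])
    print(n_step_q_target([1.0, 1.0], Q, s_n=1, gamma=0.5))
```

Expected Sarsa averages over the policy's action distribution, which lowers target variance. The n-step target trades some bias for variance by looking further ahead before bootstrapping.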

Python
