This repository contains projects from the Deep Learning Türkiye Reinforcement Learning Group. Open each folder to see that project's details.
A simple tic-tac-toe example. It currently learns via a value function; policy search is still TODO. Benefited from work by tansey.
Provides the underlying testbed for the bandit problem.
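As a rough illustration of what a bandit testbed involves, here is a minimal sketch of a k-armed bandit with an epsilon-greedy learner. The names `BanditTestbed` and `run_epsilon_greedy`, the Gaussian reward model, and all parameter values are assumptions made for this example; they are not taken from the project folder.

```python
import random


class BanditTestbed:
    """Hypothetical k-armed bandit: each arm pays a Gaussian reward around a hidden mean."""

    def __init__(self, k=10, seed=0):
        self.k = k
        self.rng = random.Random(seed)
        # Hidden true value of each arm (assumed N(0, 1) for this sketch).
        self.means = [self.rng.gauss(0.0, 1.0) for _ in range(k)]

    def pull(self, arm):
        # Noisy reward centred on the arm's true value.
        return self.rng.gauss(self.means[arm], 1.0)


def run_epsilon_greedy(bandit, steps=1000, epsilon=0.1, seed=1):
    """Play the bandit with an epsilon-greedy policy and sample-average estimates."""
    rng = random.Random(seed)
    estimates = [0.0] * bandit.k
    counts = [0] * bandit.k
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(bandit.k)  # explore
        else:
            arm = max(range(bandit.k), key=lambda a: estimates[a])  # exploit
        reward = bandit.pull(arm)
        counts[arm] += 1
        # Incremental sample-average update of the arm's estimated value.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, counts, total / steps


if __name__ == "__main__":
    estimates, counts, average_reward = run_epsilon_greedy(BanditTestbed())
    print("pull counts per arm:", counts)
    print("average reward:", average_reward)
```

Keeping the environment (`BanditTestbed`) separate from the learner makes it easy to compare different exploration strategies against the same set of arms.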
Uses OpenAI Gym. Learns via Q-learning.
Multiple approaches to the CartPole problem. Benefited from work by dennybritz.
You can find example usage below.
import gym
from lib import q_learning_agent, double_q_learning_agent, sarsa_learning_agent

env = gym.make("FrozenLake-v0")
env.reset()


def train(agent):
    # Run 1000 episodes, updating the agent after every environment step.
    for i_episode in range(1000):
        state = env.reset()
        while True:
            action = agent.select_action(state)
            # Classic Gym API: step() returns (observation, reward, done, info).
            next_state, reward, done, _ = env.step(action)
            agent.learn(action, reward, state, next_state)
            if done:
                break
            state = next_state


# All three agents share the same hyperparameters and the environment's action count.
qla = q_learning_agent(epsilon=0.3, discount_factor=0.9, alpha=0.5, action_space=env.action_space.n)
sla = sarsa_learning_agent(epsilon=0.3, discount_factor=0.9, alpha=0.5, action_space=env.action_space.n)
dqla = double_q_learning_agent(epsilon=0.3, discount_factor=0.9, alpha=0.5, action_space=env.action_space.n)

train(qla)
train(sla)
train(dqla)
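
The training loop above only relies on two agent methods, `select_action(state)` and `learn(action, reward, state, next_state)`. To show what an agent behind that interface might look like, here is a minimal tabular Q-learning sketch. `TabularQAgent` is a hypothetical stand-in written for this example, not the `q_learning_agent` shipped in `lib`, and the `seed` parameter is an addition for reproducibility.

```python
import random
from collections import defaultdict


class TabularQAgent:
    """Hypothetical agent exposing the select_action / learn interface used above."""

    def __init__(self, epsilon, discount_factor, alpha, action_space, seed=0):
        self.epsilon = epsilon                  # exploration probability
        self.discount_factor = discount_factor  # gamma
        self.alpha = alpha                      # learning rate
        self.action_space = action_space        # number of discrete actions
        self.rng = random.Random(seed)
        # Q-table: state -> list of action values, initialised to zero on first access.
        self.q = defaultdict(lambda: [0.0] * action_space)

    def select_action(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.action_space)
        values = self.q[state]
        best = max(values)
        # Break ties between equally valued actions at random.
        return self.rng.choice([a for a, v in enumerate(values) if v == best])

    def learn(self, action, reward, state, next_state):
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
        # Terminal states are never updated, so their values stay at zero and
        # the bootstrap term vanishes there without needing a `done` flag.
        target = reward + self.discount_factor * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

An instance built with the same keyword arguments, for example `TabularQAgent(epsilon=0.3, discount_factor=0.9, alpha=0.5, action_space=4)`, could be passed to `train` in place of the library agents. A SARSA variant would differ mainly in the target, which would use the value of the action actually taken in `next_state` instead of the maximum.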