Flappy Bird Gym with Reinforcement Learning

This project implements two reinforcement learning algorithms for the Flappy Bird Gym environment: Expected SARSA and Deep Q Learning.
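
For reference, Expected SARSA differs from Q-Learning in that it bootstraps on the expectation of the next state's action values under the current epsilon-greedy policy rather than on the greedy maximum. The tabular sketch below is illustrative only and is not this repository's implementation; it assumes the continuous observations have already been discretized into integer state indices, and the names q, epsilon, step_size, and discount are placeholders that mirror the hyperparameters listed later.

    import numpy as np

    def expected_sarsa_update(q, state, action, reward, next_state, done,
                              epsilon, step_size, discount):
        """Illustrative tabular Expected SARSA update on an action-value table q."""
        num_actions = q.shape[1]
        if done:
            target = reward
        else:
            # Action probabilities under the epsilon-greedy behaviour policy.
            probs = np.full(num_actions, epsilon / num_actions)
            probs[np.argmax(q[next_state])] += 1.0 - epsilon
            # Bootstrap on the expected action value of the next state.
            target = reward + discount * np.dot(probs, q[next_state])
        q[state, action] += step_size * (target - q[state, action])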

Getting Started

To get started, you'll need to have the following installed:

  • Python 3.x
  • OpenAI Gym
  • PyTorch

You can install the required packages using conda: conda env create -f environment.yml

Running the Code

Playing the Game

To run the code, simply clone the repository and navigate to the main directory. Then, run the following command: python3 main.py -agent dqn

The above command will run the Deep Q Learning algorithm on the Flappy Bird Gym environment; add the -render flag to render the game in your terminal. To run the Expected SARSA algorithm, replace dqn with expected_sarsa.
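
main.py is not reproduced here, but a minimal sketch of a command-line entry point exposing the -agent, -render, and -train flags used in this README might look like the following; everything apart from those flag names is an assumption for illustration.

    import argparse

    def parse_args():
        # Single-dash long options to match the commands shown in this README.
        parser = argparse.ArgumentParser(description="Flappy Bird RL runner")
        parser.add_argument("-agent", choices=["dqn", "expected_sarsa"], default="dqn",
                            help="which agent to run")
        parser.add_argument("-render", action="store_true",
                            help="render the game while the agent plays")
        parser.add_argument("-train", action="store_true",
                            help="train the agent instead of playing with a saved model")
        return parser.parse_args()

    if __name__ == "__main__":
        args = parse_args()
        print(f"agent={args.agent} render={args.render} train={args.train}")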

Training the Agent

To train the agent, simply add the -train flag to the command above. For example, to train the Deep Q Learning agent, run the following command: python3 main.py -agent dqn -train

The above command will train the Deep Q Learning agent for the number of episodes specified in the configuration file. To train the Expected SARSA agent, replace dqn with expected_sarsa.
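
Conceptually, training a DQN-style agent amounts to interacting with the environment episode by episode, storing transitions, learning from replayed minibatches, and decaying epsilon from EPSILON_START toward EPSILON_END. The loop below is only a schematic under those assumptions; the agent.act/agent.step and env.reset/env.step interfaces are hypothetical and not taken from this repository.

    def train(agent, env, config):
        """Schematic episode loop; config keys mirror the hyperparameters below."""
        epsilon = config["EPSILON_START"]
        for episode in range(config["NUM_EPISODES"]):
            state = env.reset()
            done = False
            while not done:
                action = agent.act(state, epsilon)                    # epsilon-greedy
                next_state, reward, done, _ = env.step(action)        # classic Gym API
                agent.step(state, action, reward, next_state, done)   # store + learn
                state = next_state
            # Decay exploration after each episode, but never below EPSILON_END.
            epsilon = max(config["EPSILON_END"], epsilon * config["EPSILON_DECAY"])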

Changing the Hyperparameters

To change the hyperparameters, edit the files in the config directory, depending on whether you want to simulate a single run or perform a full hyperparameter sweep; a sketch of a possible single-run configuration is shown after the list below. The hyperparameters are as follows:

  • Expected SARSA

    • NUM_ACTIONS: The number of actions the agent can take.
    • NUM_EPISODES: The number of episodes to train the agent for.
    • NUM_STEPS: The number of steps to run the agent for.
    • STEP_SIZE: The step size for the agent.
    • EPSILON: The epsilon value for the epsilon-greedy policy.
    • DISCOUNT: The discount factor for the agent.
  • Deep Q Learning

    • NUM_ACTIONS: The number of actions the agent can take.
    • STATE_SIZE: The size of the state vector.
    • NUM_EPISODES: The number of episodes to train the agent for.
    • EPSILON_START: The epsilon value for the epsilon-greedy policy at the start of training.
    • EPSILON_END: The epsilon value for the epsilon-greedy policy at the end of training.
    • EPSILON_DECAY: The decay rate for the epsilon value.
    • DISCOUNT: The discount factor for the agent.
    • BATCH_SIZE: The batch size for the Deep Q agent.
    • REPLAY_BUFFER_SIZE: The size of the replay memory for the Deep Q agent.
    • SEED: The seed for the random number generator.
    • TAU: The soft update parameter for the target network.
    • LR: The learning rate for the agent.
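
As an illustration only, a single-run configuration built from the Deep Q Learning hyperparameters above could look like the dictionary below; the values shown are placeholders and are not the settings used to produce the results in this repository.

    # Hypothetical single-run DQN configuration; every value is a placeholder.
    DQN_CONFIG = {
        "NUM_ACTIONS": 2,             # Flappy Bird: flap or do nothing
        "STATE_SIZE": 2,              # length of the observation vector
        "NUM_EPISODES": 1000,
        "EPSILON_START": 1.0,
        "EPSILON_END": 0.01,
        "EPSILON_DECAY": 0.995,
        "DISCOUNT": 0.99,
        "BATCH_SIZE": 64,
        "REPLAY_BUFFER_SIZE": 100000,
        "SEED": 0,
        "TAU": 1e-3,                  # soft-update rate for the target network
        "LR": 5e-4,
    }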

Results

You can find the results and the trained models for each agent in the results directory.