Cow Simulator

A cow simulation for Applied Deep Learning.

Usage

To run the application, install the dependencies and type one of the following commands in your terminal:

python3 deepcow/run.py play        # let the two best models trained by train_cow or train_wolf play against each other

python3 deepcow/run.py train_cow   # train the cow

python3 deepcow/run.py train_wolf  # train the wolf

python3 deepcow/run.py train_both  # train both at the same time

References

Scientific papers

Proximal Policy Optimization Algorithms

Overcoming catastrophic forgetting in neural networks

Articles and tutorials used in preparation

Control a cart

YouTube playlist that inspired this project

Multi-agent actor-critic for mixed cooperative-competitive environments

Emergent Tool Use From Multi-Agent Autocurricula

When Worlds Collide: Simulating Circle-Circle Collisions

Quick Tip: Use Quadtrees to Detect Likely Collisions in 2D Space

Topic

This project applies a reinforcement learning strategy in a dynamic, multi-agent environment.

Type

The type of this project is "Bring your own data" for reinforcement learning projects, because it provides a new environment for reinforcement learning strategies. Additionally, it includes a basic neural network and a learning algorithm for every actor.

Summary

Description

This project is a simulation of an environment that is partially observable, multi-agent, dynamic, continuous in space, discrete in time, and partly unknown (the agents have no knowledge of the laws of physics).

There are two actors that can interact consciously with the environment: a cow and a wolf. Additionally, there is a third entity, grass. Each entity has a certain energy level. The cow gains energy by touching grass, the wolf by touching cows. Each entity loses energy by touching its counterpart or by moving around. The goal of each actor is to obtain as much energy as possible. If the energy level of the cow or the grass drops below zero, the environment is reset. An actor perceives its environment by sending out rays with a limited reach. Each ray returns the color of the actor it intersects with, black if it intersects with the game border, or white if it intersects with nothing. The next figure shows the cow (brown), the wolf (blue), the grass (red) and a visualisation of the rays.

[Figure 1: the cow (brown), the wolf (blue), the grass (red) and their perception rays]

The little black circles represent the actors' heads. The actors' AI was implemented with deep Q-learning as described in the lecture; however, it does not achieve the desired results as of yet.
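As a rough illustration of the perception mechanism, the sketch below casts a fan of rays and reports the first color each ray hits. All names here (cast_rays, the entity attributes) are hypothetical and do not reflect the project's actual API.

import math

WHITE = (255, 255, 255)  # ray hit nothing within reach
BLACK = (0, 0, 0)        # ray hit the game border

def cast_rays(actor, entities, world_size, ray_count=8, reach=200.0, step=5.0):
    """Sample points along each ray and return one color per ray."""
    colors = []
    for i in range(ray_count):
        angle = actor.direction + 2 * math.pi * i / ray_count
        dx, dy = math.cos(angle), math.sin(angle)
        color, dist = WHITE, step
        while dist <= reach:
            x, y = actor.x + dx * dist, actor.y + dy * dist
            if not (0 <= x <= world_size[0] and 0 <= y <= world_size[1]):
                color = BLACK  # ray left the playing field
                break
            hit = next((e for e in entities
                        if math.hypot(e.x - x, e.y - y) <= e.radius), None)
            if hit is not None:
                color = hit.color  # brown cow, blue wolf or red grass
                break
            dist += step
        colors.append(color)
    return colors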

Dataset

There is no real dataset. The project implements the environment and a deep Q-learning algorithm for the actors, and provides a visualisation of the state of the world.

Error metric

The performance measure for each actor is its reward, i.e. the energy gain of the agent. With the first environment and simple DQN agents, the agents did not behave reasonably, so their reward oscillated.

[Figure: oscillating rewards of the first simple DQN agents]

To get better results, a negative reward was added whenever an agent hits the border. Additionally, the agents' actions were changed to move the agent relative to its own direction instead of relative to the screen. A sketch of both changes follows the figure below.

[Figure: rewards after adding the border penalty and direction-relative movement]
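The sketch below illustrates both changes; the function names and the penalty magnitude are assumptions for illustration, not the project's actual values.

import math

BORDER_PENALTY = -1.0  # assumed magnitude, for illustration only

def shaped_reward(energy_gain, hit_border):
    """The energy gain stays the base reward; border hits are penalized."""
    return energy_gain + (BORDER_PENALTY if hit_border else 0.0)

def relative_move(x, y, direction, action, speed=1.0):
    """Move forward/backward/left/right relative to the agent's heading."""
    offsets = {'forward': 0.0, 'right': 0.5 * math.pi,
               'backward': math.pi, 'left': 1.5 * math.pi}
    angle = direction + offsets[action]
    return x + speed * math.cos(angle), y + speed * math.sin(angle)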

Since the border collision count is interesting as well, it was captured and plotted too. The goal was an average collision count of 5. As the following plot shows, both agents learnt to avoid the borders in certain epochs. The exploration rate decreases with each epoch, so the agents rely more on their neural network in later epochs (a sketch of such a schedule follows the plot).

[Figure: border collisions per epoch for both agents]
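A minimal sketch of such an epsilon-greedy exploration schedule, assuming a Keras-style model with a predict method and NumPy state vectors; the constants are illustrative, not the values used in the project.

import random

EPSILON_START, EPSILON_MIN, EPSILON_DECAY = 1.0, 0.05, 0.995  # assumed values

def select_action(model, state, epsilon, n_actions):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    q_values = model.predict(state[None, :], verbose=0)[0]
    return int(q_values.argmax())

# After every epoch the exploration rate shrinks, so later epochs
# rely more on the learned Q-network:
# epsilon = max(EPSILON_MIN, epsilon * EPSILON_DECAY)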

Last but not least, a more complex neural network was trained to approximate the Q-function; a sketch of what this could look like follows the figure below. Border collisions were no longer penalized.

[Figure: rewards of the more complex network, without the border penalty]
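The sketch below shows a deeper Q-network and its Bellman training target in Keras; the layer sizes and hyperparameters are assumptions for illustration, not the ones used in this project.

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

def build_q_network(state_size, n_actions):
    """A small fully-connected network with one Q-value output per action."""
    model = Sequential([
        Dense(64, activation='relu', input_shape=(state_size,)),
        Dense(64, activation='relu'),
        Dense(n_actions, activation='linear'),
    ])
    model.compile(optimizer='adam', loss='mse')
    return model

def q_target(model, reward, next_state, done, gamma=0.99):
    """Bellman target: r + gamma * max_a' Q(s', a') for non-terminal steps."""
    if done:
        return reward
    return reward + gamma * float(np.max(model.predict(next_state[None, :], verbose=0)))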

My personal goal was a constant reward of 0.3 (an agent can obtain at most 1.0 reward per game). However, this was too difficult for the deep Q-network I trained, which may also be due to insufficient training.

Documentation

The entry point is the main.py file. Start by installing the dependencies listed in .circleci/dependencies.txt, then run main.py.


Work-Breakdown structure

Individual task                                 Time estimate   Time used
Research topic and first draft                  5h              8h
Building environment                            10h             14h
Setting up CUDA, cuDNN... on Manjaro            20m             21h
Designing and building an appropriate network   20h             25h
Fine-tuning that network                        10h             12h
Building an application to present the results  5h              4h
Writing the final report                        10h             3h
Preparing the presentation of the project       5h              1h