This seminar was created to get familiar with modern reinforcement learning (RL).

mipt_course

contains solutions to the tasks from the RL course at MIPT, which is based on David Silver's course. All tasks use OpenAI Gym environments.

deephack

contains our attempts to solve the Skiing game, the problem for the qualification round of the DeepHackLab hackathon. The core of the model is a convolutional autoencoder with dense layers in the bottleneck. Before training, we convert the images from RGB to grayscale and downscale them to 60x60. The autoencoder gives us low-dimensional features for each image (64 in the basic setup). The code is in autoencoder_simple_features.ipynb.
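The preprocessing step described above can be sketched roughly as follows. This is a minimal illustration and not the code from the notebook: the function name `preprocess`, the luminance weights, and the nearest-neighbour downsampling are all assumptions, and the notebook may use a different grayscale conversion or resizing method.

```python
import numpy as np

def preprocess(frame, size=60):
    """Convert an RGB frame of shape (H, W, 3) to a size x size grayscale array in [0, 1].

    Hypothetical helper: the name and the exact conversion are assumptions.
    """
    frame = np.asarray(frame, dtype=np.float32)
    # Weighted sum of the R, G, B channels (ITU-R BT.601 luminance weights).
    gray = frame @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    # Nearest-neighbour downsampling: pick evenly spaced rows and columns.
    h, w = gray.shape
    rows = (np.arange(size) * h) // size
    cols = (np.arange(size) * w) // size
    small = gray[np.ix_(rows, cols)]
    # Scale pixel values from [0, 255] to [0, 1].
    return small / 255.0

# Example with a dummy frame of the usual Atari size (210 x 160 x 3).
dummy_frame = np.full((210, 160, 3), 255, dtype=np.uint8)
processed = preprocess(dummy_frame)
```

The resulting 60x60 array is what an autoencoder like the one described above would take as input.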

From there, we have three main directions for further work:

parametrize the agent's policy and use policy gradient algorithms, e.g. Monte-Carlo Policy Gradient (REINFORCE). The code is in Skiing.ipynb.
approximate the value or action-value function and use an epsilon-greedy policy. The code is in linear_fa.ipynb.
collect more features, e.g. via object detection. NB: due to the competition rules, features should NOT be environment-specific. The code is in features_demo.ipynb.
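Two building blocks of the first two directions can be sketched as below. This is an illustrative sketch under stated assumptions and not the code from the notebooks: `discounted_returns` computes the Monte-Carlo returns that REINFORCE uses to weight its gradient, and `epsilon_greedy` selects an action from a linear action-value approximation. Both function names, the discount factor, the weight layout (one row per action), and the choice of 3 actions are assumptions; the 64 features match the autoencoder output described above.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def epsilon_greedy(weights, features, epsilon, rng):
    """Pick an action for a linear action-value function Q(s, a) = weights[a] . features.

    weights:  array of shape (n_actions, n_features), hypothetical layout
    features: array of shape (n_features,)
    """
    if rng.random() < epsilon:
        # Explore: uniformly random action.
        return int(rng.integers(weights.shape[0]))
    # Exploit: action with the highest estimated value.
    return int(np.argmax(weights @ features))

rng = np.random.default_rng(0)
weights = np.zeros((3, 64))   # assumed: 3 actions, 64 autoencoder features
weights[2, 0] = 1.0           # make action 2 the best one for the state below
features = np.ones(64)

returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
greedy_action = epsilon_greedy(weights, features, epsilon=0.0, rng=rng)
```

With `epsilon=0.0` the policy is purely greedy; raising `epsilon` trades exploitation for exploration.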
