bipedal-walker-td3

Project for the Artificial Intelligence course at the University of Ljubljana, Faculty of Computer and Information Science.

Primary language: Python. License: MIT.

Project description

Implementation of TD3 (Twin Delayed DDPG), a reinforcement learning algorithm (original publication link) that is particularly useful for problems with continuous state and action spaces.
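As a rough illustration of what distinguishes TD3 from plain DDPG, the sketch below computes the bootstrap target for a single transition using target policy smoothing and the clipped double-Q minimum, and shows the delayed actor update schedule. It is not taken from this repository: the function names, the callables passed in as target networks, and the hyperparameter values (the defaults reported in the TD3 paper) are all assumptions made for the example.

```python
import random


def td3_target(reward, done, next_state, actor_target, critic1_target,
               critic2_target, gamma=0.99, policy_noise=0.2, noise_clip=0.5,
               max_action=1.0, rng=random):
    """Bootstrap target for one transition (hypothetical helper, not repo code)."""
    # Target policy smoothing: add clipped Gaussian noise to each component
    # of the target action, then clip the result to the valid action range.
    next_action = []
    for a in actor_target(next_state):
        noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, policy_noise)))
        next_action.append(max(-max_action, min(max_action, a + noise)))
    # Clipped double-Q: use the smaller of the two target critic estimates
    # to counteract value overestimation.
    q_min = min(critic1_target(next_state, next_action),
                critic2_target(next_state, next_action))
    # No bootstrapping past a terminal state.
    return reward + gamma * (1.0 - float(done)) * q_min


def should_update_actor(step, policy_delay=2):
    """Delayed updates: the actor and target networks are refreshed only
    every `policy_delay` critic updates."""
    return step % policy_delay == 0


# Toy stand-ins for the networks (assumptions for illustration only).
# BipedalWalker-v3 has a 24-dimensional state and a 4-dimensional action.
toy_actor = lambda state: [0.0, 0.0, 0.0, 0.0]
toy_q1 = lambda state, action: 1.0
toy_q2 = lambda state, action: 2.0

target = td3_target(0.5, False, [0.0] * 24, toy_actor, toy_q1, toy_q2)
terminal_target = td3_target(0.5, True, [0.0] * 24, toy_actor, toy_q1, toy_q2)
```

With these toy critics the smaller estimate (1.0) is used, so the non-terminal target is 0.5 + 0.99 * 1.0, and the terminal target reduces to the reward alone. In a real training loop both critics would be regressed toward this target on every step, while the actor would be updated only when `should_update_actor` returns true.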

The algorithm was tested on the BipedalWalker-v3 environment. To evaluate how much its performance varies between runs, we trained 15 different agents on a high-performance GPU with CUDA for 550 episodes each. We recorded the reward obtained by each agent, with the following results:

[Figure: ci_plot]
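The figure name suggests a confidence-interval plot over the 15 agents. How the interval was actually computed is not stated here, so the following is only a minimal sketch of one common approach: a per-episode mean with a Student-t interval across agents. The function name `mean_ci` is hypothetical, and the default critical value 2.145 (two-sided 95% for 14 degrees of freedom, i.e. 15 agents) is an assumption.

```python
import math
import statistics


def mean_ci(rewards_per_agent, critical=2.145):
    """Per-episode mean and confidence bounds across agents.

    rewards_per_agent: list of equal-length reward lists, one per agent.
    critical: t critical value; 2.145 assumes 15 agents and a 95% interval.
    Returns a list of (mean, lower, upper) tuples, one per episode.
    """
    n_agents = len(rewards_per_agent)
    summary = []
    # zip(*...) regroups the data so each tuple holds one episode's
    # rewards across all agents.
    for episode_rewards in zip(*rewards_per_agent):
        mean = statistics.mean(episode_rewards)
        # Half-width = critical value times the standard error of the mean.
        half_width = (critical * statistics.stdev(episode_rewards)
                      / math.sqrt(n_agents))
        summary.append((mean, mean - half_width, mean + half_width))
    return summary


# Tiny made-up input (two agents, two episodes) purely to show the shapes;
# these numbers are not results from this project.
demo = mean_ci([[1.0, 2.0], [3.0, 2.0]], critical=1.0)
```

The resulting lower and upper bounds can be drawn as a shaded band around the mean reward curve, for example with matplotlib's `fill_between`.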

The learning process can be observed in the following video: run_simulation

Technical details about the algorithm can be found in the accompanying report.