Primary language: Jupyter Notebook

Reinforcement Learning and advanced Deep Learning (RLD)

Exercises and project for the RLD course, M2 DAC and M2A, Sorbonne University

Students: Tianwei LAN, Jacques ROUGE

TME1: Upper Confidence Bound (UCB) and Linear Upper Confidence Bound (LinUCB)
TME2: Value Iteration and Policy Iteration
TME3: Q-Learning
TME4: Deep Q-Network (DQN)
TME5: Actor-Critic
TME6: Proximal Policy Optimization (PPO) with Adaptive KL and with Clipped Objective
TME7: Deep Deterministic Policy Gradient (DDPG)
TME8: Generative Adversarial Network (GAN)
TME9: Variational Autoencoder (VAE)
TME10: Multi-Agent Deep Deterministic Policy Gradient (MADDPG)
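
As a rough illustration of the kind of algorithm covered in TME1, here is a minimal sketch of UCB1 arm selection on a simulated Bernoulli bandit. It is not taken from the notebooks in this repository: the function names `ucb1_select` and `run_ucb1`, the exploration constant 2, and the success probabilities are all assumptions chosen for the example.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Return the arm with the highest UCB1 score at round t (1-based)."""
    # Play every arm once before the confidence bound is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Score = empirical mean + exploration bonus sqrt(2 ln t / n_a).
    scores = [
        sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_ucb1(probs, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms with hypothetical success rates `probs`."""
    rng = random.Random(seed)
    counts = [0] * len(probs)   # number of pulls per arm
    sums = [0.0] * len(probs)   # cumulative reward per arm
    for t in range(1, horizon + 1):
        arm = ucb1_select(counts, sums, t)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    # Two arms with assumed success rates 0.1 and 0.9.
    print(run_ucb1([0.1, 0.9], horizon=2000))
```

With a large gap between the arms, the pull counts should concentrate on the better arm while the weaker one is still sampled occasionally, which is the exploration/exploitation trade-off the bonus term is meant to capture.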