The repository compiles a list of real-world applications of reinforcement learning.
- Only include methods that were deployed, are currently deployed, or will be deployed in the future.
- Exclude RL applications to games and robotics where experiments were only done in simulation.
- Only include publicly available information.
The repository also aggregates information from several sources, including
We categorize RL applications based on the deployment status (e.g., currently deployed, deployed at least once/for some time, planned to be deployed, or unknown), and the approaches to solve the problems (e.g., online, offline, train with simulators, search with simulators, using offline data to build partial simulators).
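As an illustration only, the two categorization axes described above could be modeled as a small record type. This is a minimal sketch: the class names, field names, and the `filter_by_status` helper are hypothetical and are not part of this repository.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class DeploymentStatus(Enum):
    # Values mirror the status labels used in the entries of this list.
    CURRENTLY_DEPLOYED = "Currently deployed"
    DEPLOYED_AT_LEAST_ONCE = "Deployed at least once/for some time"
    PLANNED = "Planned to be deployed"
    UNKNOWN = "Unknown"


class Approach(Enum):
    # Values mirror the approach labels used in the entries of this list.
    ONLINE = "Online"
    OFFLINE = "Offline"
    TRAIN_WITH_SIMULATORS = "Train with simulators"
    SEARCH_WITH_SIMULATORS = "Search with simulators"
    PARTIAL_SIMULATORS = "Using offline data to build partial simulators"


@dataclass
class Entry:
    # Hypothetical record for one application in the list.
    summary: str
    status: DeploymentStatus
    approach: Optional[Approach] = None
    algorithm: Optional[str] = None


def filter_by_status(entries: List[Entry], status: DeploymentStatus) -> List[Entry]:
    """Return the entries whose deployment status matches `status`."""
    return [entry for entry in entries if entry.status is status]


entries = [
    Entry("DeepThermal: combustion optimization",
          DeploymentStatus.CURRENTLY_DEPLOYED, Approach.OFFLINE),
    Entry("Telus: data center energy consumption",
          DeploymentStatus.UNKNOWN, Approach.TRAIN_WITH_SIMULATORS),
]
deployed = filter_by_status(entries, DeploymentStatus.CURRENTLY_DEPLOYED)
```

Keeping status and approach as separate fields lets an entry with an unknown approach still be filtered by deployment status.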
- Industrial Control
- Energy Control
- Control of Physical Systems
- Large Language Models & Conversational Systems
- Other Applications without Deployment
- Real world gym
- Open Source Software
- Other Resources
AMII has been applying RL to a water treatment plant.
Link: Blog Post
Deployment status: Planned to be deployed
Google DeepMind used RL to improve the energy efficiency of heating, ventilation, and air conditioning (HVAC) control.
Link: Paper2022, NeurIPS2018
Deployment status: Deployed at least once/for some time. Experiments were done in real-world facilities.
Approach: Online
Algorithm: Policy iteration, with value function estimated from offline data
Telus used RL to reduce energy consumption for data centers.
Link: Presentation, Announcement
Deployment status: Unknown
Approach: Train with simulators
Foobot used deep RL for HVAC optimization.
Link: Blog
Deployment status: Currently deployed (based on the information here)
Approach: Train with simulators
Difficulty: High dimensional action spaces
Algorithm: PPO with autoregressive policies
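An autoregressive policy, as named in the entry above, handles a high-dimensional action space by choosing one action dimension at a time, conditioning each choice on the dimensions already chosen. The sketch below shows only that factorization. The `conditional_logits` function is a hypothetical stand-in for a neural network head, and none of this is taken from Foobot's implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conditional_logits(state, prefix, n_choices):
    """Hypothetical stand-in for a network head: logits for the next action
    dimension, given the state and the dimensions chosen so far."""
    return [0.1 * state * (k + 1) - 0.05 * sum(prefix) * k for k in range(n_choices)]


def log_prob(state, action, n_choices):
    """Joint log-probability as the sum of per-dimension conditional log-probabilities."""
    total = 0.0
    for i, choice in enumerate(action):
        probs = softmax(conditional_logits(state, action[:i], n_choices))
        total += math.log(probs[choice])
    return total


def sample_action(state, n_dims, n_choices, rng):
    """Sample the action one dimension at a time, feeding earlier choices back in."""
    action = []
    for _ in range(n_dims):
        probs = softmax(conditional_logits(state, action, n_choices))
        action.append(rng.choices(range(n_choices), weights=probs)[0])
    return action


rng = random.Random(0)
action = sample_action(state=1.0, n_dims=4, n_choices=3, rng=rng)
print(action, log_prob(1.0, action, 3))
```

With 4 dimensions and 3 choices each there are 81 joint actions, but the policy only ever outputs 3 logits per step. The summed log-probability is the quantity a policy gradient method such as PPO would use.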
NVIDIA used RL for data center congestion control.
Link: Paper2023
Deployment status: Experiments were done in a real-world system.
Approach: Train with simulators
Difficulty: Constraints on low memory and low inference time, multi-agent POMDP
Algorithm: Policy gradient with LSTM layers -> distill to lightweight decision trees
Siemens Technology has been working on industrial applications of RL.
Link: Video
Deployment status: Unknown
Phaidra has been working on using deep RL to improve plant stability and energy efficiency.
Link: Website, Technical Report
Deployment status: Unknown
Microsoft Project Bonsai used RL for industrial control systems.
Link: Website, Report
Deployment status: Unknown
Approach: Train with simulators
DeepMind successfully controlled nuclear fusion plasma in a tokamak with deep reinforcement learning.
Link: Nature2022, Post
Deployment status: Real-world experiments on TCV (an experimental tokamak)
Approach: Train with simulators
Algorithm: MPO (four-layer neural network for the actor, larger RNNs for the critic)
DeepThermal uses model-based offline RL to optimize the combustion efficiency of a thermal power generating unit.
Link: AAAI2022
Deployment status: Currently deployed (deployed in four large coal-fired thermal power plants in China)
Approach: Offline
Algorithm: offline model learning using LSTM + offline actor-critic with reward penalty
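A reward penalty in model-based offline RL generally lowers the reward for state-action pairs where the learned model is unreliable, so the policy stays close to the data. The sketch below is a generic illustration, not DeepThermal's formulation: measuring uncertainty as the disagreement of a model ensemble, and the names `penalized_reward` and `penalty_weight`, are assumptions made here.

```python
import statistics


def penalized_reward(reward, ensemble_predictions, penalty_weight=1.0):
    """Subtract an uncertainty penalty from a model-predicted reward.

    `ensemble_predictions` holds one scalar prediction per ensemble member
    (an assumption of this sketch); their population standard deviation
    serves as the uncertainty estimate.
    """
    uncertainty = statistics.pstdev(ensemble_predictions)
    return reward - penalty_weight * uncertainty


# The ensemble agrees, so the reward is left untouched.
confident = penalized_reward(1.0, [2.0, 2.0, 2.0])
# The ensemble disagrees, so the reward is reduced.
uncertain = penalized_reward(1.0, [1.0, 3.0], penalty_weight=0.5)
```

A larger `penalty_weight` makes the learned policy more conservative, at the cost of exploiting less of what the model predicts.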
Google and Loon used RL to control a superpressure balloon in the stratosphere.
Link: Nature2020
Deployment status: Currently deployed
Approach: Using offline data to build partial simulators (wind simulation based on historical data)
Difficulty: Partial observability
Algorithm: Incorporate uncertainty estimates as additional inputs, QR-DQN with a seven-layer ReLU network + parallel simulation
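QR-DQN, named in the entry above, learns a set of quantiles of the return distribution rather than a single expected value, and trains them with a quantile Huber loss. The sketch below shows that loss for one state-action pair in plain Python. It follows the standard QR-DQN formulation and does not reproduce any detail of the Loon controller; the function names are chosen here for illustration.

```python
def huber(u, kappa=1.0):
    """Huber loss: quadratic near zero, linear beyond `kappa`."""
    a = abs(u)
    return 0.5 * u * u if a <= kappa else kappa * (a - 0.5 * kappa)


def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile Huber loss between predicted quantiles and target samples.

    Quantile i is tied to the midpoint fraction tau_i = (2i + 1) / (2N).
    The loss is summed over predicted quantiles and averaged over targets.
    """
    n = len(pred_quantiles)
    taus = [(2 * i + 1) / (2 * n) for i in range(n)]
    loss = 0.0
    for tau, theta in zip(taus, pred_quantiles):
        for target in target_samples:
            u = target - theta
            # Asymmetric weight: over- and under-estimates are penalized differently.
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            loss += weight * huber(u, kappa) / kappa
    return loss / len(target_samples)


loss = quantile_huber_loss(pred_quantiles=[-1.0, 0.0, 1.0], target_samples=[0.5, 2.0])
```

The asymmetric weight is what pushes each output toward its own quantile: a low-`tau` output is penalized more for overshooting the targets than for undershooting them.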
Swift achieved champion-level performance in drone racing.
Link: Nature2023
Deployment status: Deployed at least once/for some time (won several races against human champions)
Approach: Train with simulators + fine-tune by collecting more real-world data
Difficulty: Optimizing a policy purely in simulation yields poor performance on physical hardware
Algorithm: PPO + parallel simulation
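PPO, used in this entry and several others in this list, limits how far each update can move the policy by clipping the probability ratio between the new and old policies. The sketch below shows the clipped surrogate objective for a single sample. It is the textbook form only: the function name and the default `clip_eps=0.2` are choices made here, not details from the Swift paper.

```python
import math


def ppo_clipped_objective(new_logprob, old_logprob, advantage, clip_eps=0.2):
    """Clipped surrogate objective for one (state, action) sample.

    The ratio pi_new / pi_old is computed from log-probabilities, clipped to
    [1 - clip_eps, 1 + clip_eps], and the more pessimistic of the clipped and
    unclipped terms is returned (to be maximized).
    """
    ratio = math.exp(new_logprob - old_logprob)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# No policy change: the objective equals the advantage.
unchanged = ppo_clipped_objective(-1.0, -1.0, advantage=2.0)
# Large increase in probability with positive advantage: the gain is capped.
capped = ppo_clipped_objective(0.0, -1.0, advantage=1.0)
```

Taking the minimum means clipping only removes the incentive to move further in a direction that already looks good. A harmful move (large ratio with negative advantage) is still penalized in full.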
OpenAI used Reinforcement Learning from Human Feedback (RLHF) for ChatGPT.
Link: Introducing ChatGPT, NeurIPS 2020
Deployment status: Currently deployed
Algorithm: PPO with learned reward models; the reward penalizes the KL divergence between the RL policy and the original supervised model
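The KL penalty described in this entry can be folded into the reward that PPO optimizes: the reward model score minus a coefficient times the log-ratio between the RL policy and the supervised model on the sampled response. The sketch below is a sequence-level illustration under that reading. The function name, the per-token log-probability inputs, and `beta=0.1` are assumptions, not OpenAI's implementation.

```python
def kl_penalized_reward(rm_score, policy_logprobs, ref_logprobs, beta=0.1):
    """Reward model score minus beta times the sampled log-ratio.

    `policy_logprobs` and `ref_logprobs` are per-token log-probabilities of
    the same sampled response under the RL policy and the reference
    (supervised) model. Their summed difference is a single-sample estimate
    of the KL divergence between the two models.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-probability lists must cover the same tokens")
    log_ratio = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    return rm_score - beta * log_ratio


# The policy has not drifted from the reference: no penalty is applied.
no_drift = kl_penalized_reward(1.0, [-1.0, -2.0], [-1.0, -2.0])
# The policy assigns higher probability than the reference: reward is reduced.
drifted = kl_penalized_reward(1.0, [-1.0], [-2.0], beta=0.5)
```

The penalty discourages the policy from drifting into regions where the learned reward model was never trained and its scores are unreliable.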
DeepMind used RLHF for Sparrow.
Link: Sparrow, Blog
Deployment status: Unknown (the model was not released publicly)
Algorithm: A2C with learned reward models; the reward penalizes the KL divergence between the fine-tuned policy and the initial teacher language model
Anthropic used Reinforcement Learning from AI Feedback for Claude.
Link: Constitutional AI, Iterated Online RLHF
Deployment status: Currently deployed
Algorithm: Preference labelling is done by an independent model (the feedback model) instead of humans. The remainder of the training pipeline is exactly the same as RLHF with PPO.
Meta used RLHF for Llama 2.
Link: Llama 2
Deployment status: Currently deployed
Algorithm: PPO with rejection sampling fine-tuning
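Rejection sampling fine-tuning, as named in this entry, samples several candidate responses per prompt, keeps the one the reward model scores highest, and fine-tunes on the kept responses. The sketch below covers only the selection step. `generate` and `reward_fn` are hypothetical callables standing in for a language model and a reward model, and the toy versions exist purely so the example runs.

```python
import random


def best_of_k(prompt, generate, reward_fn, k, rng):
    """Sample k candidate responses and return the highest-scoring one."""
    candidates = [generate(prompt, rng) for _ in range(k)]
    return max(candidates, key=lambda response: reward_fn(prompt, response))


def build_finetuning_set(prompts, generate, reward_fn, k, rng):
    """Pair each prompt with its best-of-k response for supervised fine-tuning."""
    return [(p, best_of_k(p, generate, reward_fn, k, rng)) for p in prompts]


def toy_generate(prompt, rng):
    # Toy stand-in for sampling from a language model.
    return prompt + " " + rng.choice(["ok", "good", "excellent"])


def toy_reward(prompt, response):
    # Toy stand-in for a reward model: longer responses score higher.
    return len(response)


rng = random.Random(0)
dataset = build_finetuning_set(["Hello", "Thanks"], toy_generate, toy_reward, k=4, rng=rng)
```

A larger `k` raises the expected reward of the kept response, at the cost of more sampling per prompt.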
Google developed a real-time and open-ended dialogue system using RL.
Link: Paper
Deployment status: Currently deployed in Google Assistant
Approach: Offline
Algorithm: Stochastic Action Q-learning & Continuous Action Q-learning & Conservative Q-learning
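Conservative Q-learning, the last of the algorithms named above, adds a regularizer that pushes Q-values down on all actions and up on the actions actually seen in the offline data. The sketch below shows that regularizer for a small discrete action set. The dialogue system itself acts over language, so this discrete form is a simplification chosen here, and the function names are illustrative.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def cql_penalty(q_values, dataset_action):
    """Conservative regularizer for one state.

    `q_values` lists Q(s, a) for every action and `dataset_action` is the
    index of the action recorded in the offline data. The penalty is a soft
    maximum over all actions minus the Q-value of the logged action.
    """
    return logsumexp(q_values) - q_values[dataset_action]


# Q-values favor the logged action: the penalty is small.
in_distribution = cql_penalty([5.0, 0.0, 0.0], dataset_action=0)
# Q-values favor an unseen action: the penalty is large.
out_of_distribution = cql_penalty([0.0, 5.0, 0.0], dataset_action=0)
```

In training, this term is added to the usual temporal-difference loss with a weight, so Q-values for actions absent from the data are not overestimated.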
Yahoo (online bandits)
Azure AI Personalizer
Amazon inventory control
Link: Paper
Google Maps
Ridesharing
IRS uses bandits for audit selection
Link: Paper
Compiler Optimization
Memory mapping
Machine Learning for Mechanical Ventilation Control https://arxiv.org/pdf/2111.10434.pdf
Emirates Team New Zealand won the America’s Cup with the help of an RL agent.
Link: Presentation
Matrix multiplication
Video compression
Apple used RL to learn a network defense policy.
Link: Paper
Hewlett Packard Enterprise used RL to control Wave Energy Converters.
Boeing used RL to optimize the obstacle avoidance policy.
Link: Paper
SustainGym: Reinforcement Learning Environments for Sustainable Energy Systems
CybORG: A Gym for the Development of Autonomous Cyber Agents
DCRL-Green: Sustainable Data Center Environment and Benchmark for Multi-Agent Reinforcement Learning
Pearl - A Production-ready Reinforcement Learning AI Agent Library from Meta
RLlib: Industry-Grade Reinforcement Learning
FinRL: Financial Reinforcement Learning
TRL: Transformer Reinforcement Learning from Hugging Face
RL4LMs: A modular RL library to fine-tune language models to human preferences from AI2
Towards Deployable RL - What’s Broken with RL Research and a Potential Fix by Shie Mannor and Aviv Tamar
Don’t Panic! Reinforcement learning is full of magical things patiently waiting for our wits to grow sharper by Marlos C. Machado
CMU Real World RL course by Emma Brunskill
MLJ Special Issue on Reinforcement Learning for Real Life
Reinforcement Learning for Real Life Workshop @ NeurIPS and ICML