Reinforcement Learning (RL)

This repository focuses on Reinforcement Learning related concepts, use cases, points of view, and learning approaches. The material is based purely on my own learning, reading, and experience applying RL in practical, real-life contexts and scenarios.

Structure of Repository

Areas covered

  • Multi-Armed Bandit Problems (MABP)
  • Finite Markov Decision Processes (MDP)
  • Dynamic Programming Methods
  • Monte Carlo Methods
  • Temporal Difference (TD) Learning
  • Tabular Solution Methods and Approximate Solution Methods
  • Policy Gradient Methods
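
To give a flavour of the tabular and TD methods listed above, here is a minimal sketch of tabular Q-learning on a tiny, made-up 5-state chain environment. The environment, state/action sizes, and hyperparameters are illustrative assumptions for this README, not code taken from the repository's notebooks.

```python
import numpy as np

# Minimal sketch: tabular Q-learning (a TD method) on a made-up 5-state chain.
# Moving "right" from the last state yields reward 1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2          # actions: 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = np.zeros((N_STATES, N_ACTIONS))
rng = np.random.default_rng(0)

def step(state, action):
    """Hypothetical chain dynamics: reward only at the right end."""
    if action == 1 and state == N_STATES - 1:
        return 0, 1.0, True                       # next state, reward, done
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return next_state, 0.0, False

state = 0
for _ in range(5000):
    # epsilon-greedy action selection (explore vs. exploit)
    if rng.random() < EPSILON:
        action = int(rng.integers(N_ACTIONS))
    else:
        action = int(np.argmax(Q[state]))
    next_state, reward, done = step(state, action)
    # TD(0) update toward the bootstrapped target
    target = reward + (0.0 if done else GAMMA * np.max(Q[next_state]))
    Q[state, action] += ALPHA * (target - Q[state, action])
    state = 0 if done else next_state

print(np.round(Q, 2))   # learned action values for the toy chain
```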

How to Be Successful in Implementing RL

  • Figure out the adoption factors and secure the "right" stakeholder blessings upfront
  • Identify an "appropriate" business use case within the context of the industry / sub-industry / sub-segment: relevancy is a must
  • Estimate compute costs upfront and put together "short term" and "long term" ROI plans to track tasks and their benefits; watch for patterns in the outcomes so the strategy can be re-adjusted and tweaked along the way to stay effective and successful
  • Focus on the simulation method and look at how the strategy can serve multiple or related use cases, not just one or two

This is what separates the LEADERS from the LAGGARDS in this space!

Use Cases (Non-exhaustive, for understanding purposes)

| Use Case Theme | Description | Industry | Relevancy Category |
| --- | --- | --- | --- |
| Pricing and Promotion Analytics | Apply advanced pricing and promotion strategies to improve product margins | Agriculture | Next Best Actions for Customer |
| Waste and Cost Reduction | Optimize warehouse logistics and the network to reduce waste and maintenance costs | Agriculture | Optimize Complex Operations |
| Production Operations Management | Solve scheduling and production-allocation challenges to optimize and improve yield | Agriculture | Optimize Complex Operations |
| Optimization of Product Design Process | Optimize product design processes to shorten the development cycle for new vehicles and features and to improve quality | Automotive | Optimize Product Development Cycle / Design |
| Load Balancing | Balance the load on electricity grids under varying demand cycles | Energy and Utilities | Optimize Complex Operations |
| Yield Optimization | Enable real-time well monitoring and precision drilling for improved yield in oil operations | Energy and Utilities | Optimize Complex Operations |
| Trading Strategy Optimization | Optimize the trading strategy for an options-trading portfolio | Financial Services | Optimize Complex Operations |
| Customer Hyper-Personalization | Deliver advanced personalization that adapts promotions, next best offers and recommendations for increased customer satisfaction and sales | Financial Services | Next Best Actions for Customer |
| Clinical Trials | The well-being of patients during clinical trials is extremely important, alongside the actual results of the study. Here, exploration is equivalent to identifying the best treatment, and exploitation is treating patients as effectively as possible during the trial. | Life Sciences | Optimize Complex Operations |
| Effective Inventory Management with Robotics | Stock and pick inventory using robots | Retail and CPG | Optimize Product Development Cycle / Design |
| Network Routing | Routing is the process of selecting a path for traffic in a network, such as a telephone or computer network (the internet). Allocating channels to the right users so that overall throughput is maximised can be formulated as a MABP. | Generic / Common | Optimize Product Development Cycle / Design |
| Online Advertising | The goal of an advertising campaign is to maximise revenue from displaying ads; the advertiser earns revenue every time an offer is clicked by a web user. As in a MABP, there is a trade-off between exploration, where the goal is to collect information on an ad's performance via click-through rates, and exploitation, where we stick with the ad that has performed best so far. | Generic / Common | Next Best Actions for Customer |
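
The Online Advertising and Network Routing rows above frame the problem as a multi-armed bandit. As a hedged sketch of that framing, here is how ad selection with a UCB1 bandit might look; the click-through rates, number of ads, and the choice of UCB1 are illustrative assumptions, not something prescribed by this repository.

```python
import numpy as np

# Sketch: UCB1 bandit choosing among hypothetical ads with unknown click-through rates.
TRUE_CTR = np.array([0.02, 0.05, 0.03])   # made-up per-ad click probabilities
rng = np.random.default_rng(42)

n_ads = len(TRUE_CTR)
clicks = np.zeros(n_ads)                   # observed clicks per ad
shows = np.zeros(n_ads)                    # times each ad was shown

for t in range(1, 20001):
    if np.any(shows == 0):
        ad = int(np.argmin(shows))         # show every ad at least once
    else:
        # UCB1: empirical CTR plus an exploration bonus that shrinks with more data
        ucb = clicks / shows + np.sqrt(2.0 * np.log(t) / shows)
        ad = int(np.argmax(ucb))
    clicked = rng.random() < TRUE_CTR[ad]  # simulate the web user's response
    shows[ad] += 1
    clicks[ad] += clicked

print("estimated CTRs:", np.round(clicks / shows, 3))
print("impressions per ad:", shows.astype(int))
```

Over time the exploration bonus decays and most impressions concentrate on the best-performing ad, which is exactly the exploration/exploitation trade-off described in the table.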

Other References:

bsuite (Behaviour Suite for Reinforcement Learning) from the DeepMind team

"Staying Current" in RL

  • There are three key aspects pertinent to greater control over RL algorithms and their solving power:
    • The design approach for how rewards are maximized as the agent learns
    • The importance and relevancy of the learning environment
    • Compute power, which becomes significant once we turn to linear or non-linear function approximation (see the sketch after this list)
  • Soft Actor-Critic (SAC) algorithms are significantly increasing training efficiency and decreasing compute costs
  • Some of the key cloud computing work worth looking at:
    • Microsoft Project Bonsai
    • Google SEED RL
    • Amazon SageMaker RL
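
To make the function-approximation point above concrete, here is a hedged sketch of semi-gradient TD(0) with a linear value function on a made-up random-walk environment; the one-hot feature encoding, environment, and step sizes are illustrative assumptions rather than anything from this repository.

```python
import numpy as np

# Sketch: semi-gradient TD(0) with linear function approximation
# on a made-up 10-state random walk (terminate left = 0 reward, right = +1).
N_STATES = 10
ALPHA, GAMMA = 0.05, 1.0
rng = np.random.default_rng(7)

def features(state):
    """One-hot features; a real problem would use a richer encoding."""
    phi = np.zeros(N_STATES)
    phi[state] = 1.0
    return phi

w = np.zeros(N_STATES)                     # weights of the linear value function v(s) = w . phi(s)

for _ in range(2000):
    state = N_STATES // 2                  # start each episode in the middle
    while True:
        next_state = state + (1 if rng.random() < 0.5 else -1)
        if next_state < 0:                 # terminated on the left
            reward, done = 0.0, True
        elif next_state >= N_STATES:       # terminated on the right
            reward, done = 1.0, True
        else:
            reward, done = 0.0, False
        target = reward + (0.0 if done else GAMMA * (w @ features(next_state)))
        # semi-gradient update: move w toward the TD target along the feature vector
        w += ALPHA * (target - w @ features(state)) * features(state)
        if done:
            break
        state = next_state

print(np.round(w, 2))                      # approximate state values, increasing left to right
```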

Reference Materials

Resources

Using RL and multi-armed bandits to find the best classification model

FAQ