- Reinforcement learning is a branch of machine learning.
- Involves an agent and an environment.
- The agent learns optimal behavior for maximizing rewards.
Limited supervision: you know what you want, but not how to get it.
Delayed consequences?
- Not just for games
- Make optimal decisions
- Maximize efficiency
- Robotics
- Self-driving cars
- Inventory management
- Financial investments
- Decision-based situations
- The agent is the algorithm
- Decides which action to take
- Agent monitors the environment
- The agent is the one that learns
- Its only outputs are decisions (actions, controls)
- The environment is everything the agent can interact with.
- Agent's actions affect the environment.
- It responds to the agent's actions with consequences (observations, rewards).
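The agent and environment roles described above can be sketched as a simple interaction loop. This is a minimal illustration, not a reference implementation: `CoinFlipEnv`, `RandomAgent`, and `run_episode` are hypothetical names, and the `reset`/`step` method names are an assumption that only follows a common convention.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses a biased coin flip."""

    def __init__(self, p_heads=0.8, episode_length=10, seed=0):
        self.p_heads = p_heads
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        # Start a new episode; the observation is just the time step.
        self.t = 0
        return self.t

    def step(self, action):
        # action: 0 = guess tails, 1 = guess heads
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1.0 if action == outcome else -1.0
        self.t += 1
        done = self.t >= self.episode_length
        return self.t, reward, done


class RandomAgent:
    """Hypothetical agent that ignores the observation and acts at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.choice([0, 1])


def run_episode(env, agent):
    # The agent's only output is an action; the environment answers
    # with a new observation and a reward.
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = agent.act(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward
```

A learning agent would replace `RandomAgent.act` with a rule that is updated from the rewards it receives.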
- The state is a representation of what the agent can sense.
- Does not always involve the entire environment. It's limited to what the agent can sense.
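The point that the state covers only what the agent can sense can be illustrated with a grid world in which the agent sees only nearby cells. This is a sketch under assumptions: the grid layout, the `observe` helper, and the `radius` parameter are all hypothetical.

```python
def observe(grid, agent_pos, radius=1):
    """Return the cells within `radius` of the agent; off-grid cells are None."""
    row, col = agent_pos
    window = []
    for r in range(row - radius, row + radius + 1):
        window_row = []
        for c in range(col - radius, col + radius + 1):
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            window_row.append(grid[r][c] if inside else None)
        window.append(window_row)
    return window


# Full environment (1 marks an obstacle); the agent never sees all of it.
grid = [
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
state = observe(grid, agent_pos=(0, 0))
```

Here the obstacle at the far right of the first row is part of the environment but not part of the agent's state.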
- An action is what an agent can do in a given state.
- Actions are limited by the environment.
- The action's goal is to maximize reward.
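As a rough sketch of the two points above, the environment can restrict which actions are available, and the agent can pick the available action with the highest estimated reward. The `MOVES` table, both function names, and the `estimated_reward` dictionary are hypothetical, and greedy selection is only one of many possible strategies.

```python
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def valid_actions(pos, n_rows, n_cols):
    """Actions that keep the agent inside an n_rows x n_cols grid."""
    row, col = pos
    allowed = []
    for name, (d_row, d_col) in MOVES.items():
        if 0 <= row + d_row < n_rows and 0 <= col + d_col < n_cols:
            allowed.append(name)
    return allowed


def greedy_action(pos, n_rows, n_cols, estimated_reward):
    """Pick the valid action with the highest estimated reward (default 0.0)."""
    allowed = valid_actions(pos, n_rows, n_cols)
    return max(allowed, key=lambda a: estimated_reward.get(a, 0.0))
```

In the top-left corner, `up` and `left` are never chosen, no matter how large their estimated reward is, because the environment does not allow them.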
- The reward is the result of taking an action.
- Feedback from the environment.
- It can be positive or negative.
- Helps encourage or discourage certain actions, policies, or behaviors.
- It is what the agent tries to optimize.
- Rewards are hard to formulate.
- When playing video games, rewards come from scores.
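One common way to turn a sequence of rewards into the single quantity the agent optimizes is the discounted return. The notes do not specify this; it is an assumption, as are the function name and the discount factor `gamma`:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

With `gamma` below 1, a reward that arrives later counts for less than the same reward now, which is one way to handle delayed consequences.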
- Learning from demonstrations.
  - Directly copying observed behavior.
  - Inferring rewards from observed behavior.
- Learning from observing the world.
  - Learning to predict.
  - Unsupervised learning.
- Learning from other tasks.
  - Transfer learning.
- TODO
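For the "directly copying observed behavior" item above, a very small tabular sketch is to record which action a demonstrator took most often in each state and replay it. The `clone_behavior` name and the traffic-light data are hypothetical, and real imitation learning usually fits a model rather than a lookup table.

```python
from collections import Counter, defaultdict


def clone_behavior(demonstrations):
    """Map each state to the action the demonstrator chose most often."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Hypothetical (state, action) pairs observed from an expert.
demonstrations = [
    ("red", "stop"),
    ("green", "go"),
    ("red", "stop"),
    ("red", "go"),
]
policy = clone_behavior(demonstrations)
```

The resulting policy has no answer for states that never appeared in the demonstrations, which is one limitation of copying behavior directly.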
- Deep learning: end-to-end training of expressive, multi-layer models.
- Deep models are what allow RL algorithms to solve complex problems end-to-end.
- Deep = can process complex sensory input
- Acquire a high degree of proficiency in domains governed by simple, known rules.
- Learn simple skills with raw sensory inputs, given enough experience.
- Learn from imitating enough human-provided expert behavior.
- Humans can learn incredibly quickly
- Humans can reuse past knowledge
- Transfer learning in deep RL is an open problem
- Not clear what the reward function should be
Learning as the basis of intelligence.
- Some things we can all do.
- Some things we can only learn.