martin-fabbri/reinforcement-learning-playground

Reinforcement Learning Experiments

Language: Python. License: Apache-2.0.

Reinforcement Learning Playground

What is reinforcement learning?

Reinforcement learning is a branch of machine learning.
It involves an agent and an environment.
The agent learns the behavior that maximizes its rewards.

When should we worry about sequential decision making?

Limited supervision: you know what you want, but not how to get it.

Delayed consequences: the effect of a decision may only show up later.

Why learn RL?

Not just for games
Make optimal decisions
Maximize efficiency

What are RL applications?

Robotics
Self-driving cars
Inventory management
Financial investments
Decision-based situations

RL terminology

What is the agent?

The agent is the algorithm
It decides which action to take.
It monitors the environment.
It is the one that learns.
Its only outputs are decisions (actions, controls).

What is an environment?

The environment is everything the agent can interact with.
The agent's actions affect the environment.
It responds to the agent's actions with consequences (observations, rewards).

What is a state?

The state is a representation of what the agent can sense.
It does not always cover the entire environment; it is limited to what the agent can sense.

What is an action?

An action is what an agent can do in a given state.
Actions are limited by the environment.
The action's goal is to maximize reward.

What is the reward?

The result of taking an action.
Feedback from the environment.
It can be positive or negative.
It helps encourage or discourage certain actions, policies, or behaviors.
It is what the agent tries to optimize.
Rewards are hard to formulate.
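
What the agent tries to optimize is usually the reward accumulated over a whole episode, not the reward of a single step. The snippet below is a minimal sketch of that idea, assuming a discount factor `gamma` that weights later rewards less. Both `gamma` and the function name `discounted_return` are illustrative assumptions, not something these notes define.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum one episode's rewards, weighting the reward at step t by gamma**t."""
    total = 0.0
    # Walk backwards so each step is a single update: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: two small penalties followed by a final payoff.
episode_rewards = [-1.0, -1.0, 10.0]
print(discounted_return(episode_rewards, gamma=0.5))  # -1 + 0.5*(-1) + 0.25*10 = 1.0
```

With `gamma=1.0` the function reduces to a plain sum; smaller values make the agent care more about near-term rewards.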

Where do rewards come from?

When playing video games, rewards come from scores.

Are there other forms of supervision?

Learning from demonstrations.
- Directly copying observed behavior.
- Inferring rewards from observed behavior.
Learning from observing the world.
- Learning to predict.
- Unsupervised learning.
Learning from other tasks.
- Transfer learning.
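
As an illustration of "directly copying observed behavior", here is a minimal sketch that builds a lookup policy from recorded expert decisions by repeating the most frequent action seen in each state. The function name `clone_behavior` and the traffic-light data are hypothetical. Practical imitation learning typically fits a model that generalizes to unseen states instead of using a table.

```python
from collections import Counter, defaultdict


def clone_behavior(demonstrations):
    """Return a state -> action table that repeats the expert's most common choice."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: actions.most_common(1)[0][0] for state, actions in counts.items()}


# Hypothetical expert data: (state, action) pairs recorded from a demonstrator.
demos = [
    ("red_light", "stop"),
    ("green_light", "go"),
    ("red_light", "stop"),
    ("red_light", "go"),  # an occasional expert mistake is outvoted
]
policy = clone_behavior(demos)
print(policy["red_light"])  # stop
```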

What is the standard reinforcement loop?

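The agent observes the current state.
The agent chooses an action.
The environment responds with a reward and a new state.
The cycle repeats until the episode ends.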
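
A minimal sketch of the agent-environment loop, using the agent, environment, state, action, and reward as defined in the terminology section. `CorridorEnv`, `RandomAgent`, and `run_episode` are hypothetical names invented for this example, and the reward values are arbitrary. The agent here picks actions at random and does not learn; it only shows how the pieces interact.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk along cells 0..4, with the goal at cell 4."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the state the agent can sense

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position.
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1  # small cost per step, payoff at the goal
        return self.position, reward, done


class RandomAgent:
    """Placeholder agent: chooses uniformly at random and never learns."""

    def __init__(self, actions, seed=0):
        self.actions = actions
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.choice(self.actions)


def run_episode(env, agent, max_steps=100):
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)               # the agent decides
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward                  # feedback accumulates
        if done:
            break
    return total_reward


print(run_episode(CorridorEnv(), RandomAgent(actions=[-1, 1])))
```

A learning agent would replace `RandomAgent.act` with a rule that is updated from the rewards it receives.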

What is Deep Reinforcement Learning?

Deep learning: end-to-end training of expressive, multi-layer models.
Deep models are what allow RL algorithms to solve complex problems end-to-end.
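
To make "multi-layer model" concrete, the following is a rough sketch of an untrained two-layer policy that maps a raw observation vector to action probabilities. The layer sizes, the random initialization, and the helper names (`make_layer`, `dense`, `softmax`, `policy_network`) are all assumptions made for illustration. It uses plain Python instead of a deep learning library and shows only the forward pass, not training.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, inputs):
    """Affine transform: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_network(layers, observation):
    """Map an observation to action probabilities through stacked layers."""
    hidden = observation
    for layer in layers[:-1]:
        hidden = [max(0.0, v) for v in dense(layer, hidden)]  # ReLU nonlinearity
    return softmax(dense(layers[-1], hidden))


rng = random.Random(0)
# Hypothetical sizes: 4 observation features, 8 hidden units, 2 actions.
layers = [make_layer(4, 8, rng), make_layer(8, 2, rng)]
probs = policy_network(layers, [0.1, -0.2, 0.0, 0.3])
print(probs)
```

In deep RL the weights of such a network are adjusted from reward feedback, so the same model that processes the input also produces the decision.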

Why Deep Reinforcement Learning?

Deep = can process complex sensory input

What can deep learning & RL do well now?

Acquire a high degree of proficiency in domains governed by simple, known rules.
Learn simple skills with raw sensory inputs, given enough experience.
Learn from imitating enough human-provided expert behavior.

What has proven challenging so far?

Humans can learn incredibly quickly
Humans can reuse past knowledge
- Transfer learning in deep RL is an open problem
Not clear what the reward function should be

How do we build intelligent machines?

Learning as the basis of intelligence.

Some things we can all do.
Some things we can only learn.