
RL_self_driving_taxi

Let's design a simulation of a self-driving cab using the cross-entropy method. The major goal is to demonstrate, in a simplified environment, how you can use RL techniques to develop an efficient and safe approach for tackling this problem.

The Smartcab's job is to pick up the passenger at one location and drop them off at another. Here are a few things that we'd love our Smartcab to take care of:

- Drop off the passenger at the right location.
- Save the passenger's time by taking the minimum time possible for the dropoff.
- Take care of the passenger's safety and obey traffic rules.

There are different aspects that need to be considered when modeling an RL solution to this problem: rewards, states, and actions.

1. Rewards

Since the agent (the imaginary driver) is reward-motivated and is going to learn how to control the cab through trial experiences in the environment, we need to decide the rewards and/or penalties and their magnitudes accordingly. Here are a few points to consider:

- The agent should receive a high positive reward for a successful dropoff, because this behavior is highly desired.
- The agent should be penalized if it tries to drop off a passenger at a wrong location.
- The agent should get a slight negative reward for every time-step in which it has not reached the destination. "Slight" negative because we would prefer our agent to arrive late rather than make wrong moves while trying to reach the destination as fast as possible.
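As a rough sketch, these points could be encoded as a small reward function. The magnitudes below (+20, -10, -1) are assumptions chosen for illustration, not values taken from this repo's notebook:

```python
# Hypothetical reward magnitudes reflecting the three points above;
# the exact numbers are assumptions for illustration.
SUCCESSFUL_DROPOFF_REWARD = 20        # high positive reward for a correct dropoff
ILLEGAL_PICKUP_DROPOFF_PENALTY = -10  # penalty for a wrong pickup/dropoff attempt
PER_STEP_PENALTY = -1                 # slight negative reward for every other time-step

def step_reward(successful_dropoff: bool, illegal_pickup_or_dropoff: bool) -> int:
    """Return the reward for one time-step under the scheme described above."""
    if successful_dropoff:
        return SUCCESSFUL_DROPOFF_REWARD
    if illegal_pickup_or_dropoff:
        return ILLEGAL_PICKUP_DROPOFF_PENALTY
    return PER_STEP_PENALTY
```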

2. State Space

In Reinforcement Learning, the agent encounters a state and then takes an action according to the state it's in. The state space is the set of all possible situations our taxi could inhabit, and each state should contain the information the agent needs to choose the right action.

Let's say we have a training area for our Smartcab where we are teaching it to transport people in a parking lot to four different locations (R, G, Y, B), and let's assume the Smartcab is the only vehicle in this parking lot. We can break the parking lot up into a 5x5 grid, which gives us 25 possible taxi locations. These 25 locations are one part of our state space. In the illustration, the taxi's current location is coordinate (3, 1). There are also four (4) locations where we can pick up or drop off a passenger: R, G, Y, B, or [(0,0), (0,4), (4,0), (4,3)] in (row, col) coordinates. Our illustrated passenger is at location Y and wishes to go to location R.

When we also account for one (1) additional passenger state of being inside the taxi, we can take all combinations of taxi locations, passenger locations, and destination locations to get the total number of states for our taxi environment: there are 25 taxi locations, five (4 + 1) passenger locations, and four (4) destinations. So our taxi environment has 5×5×5×4=500 total possible states.
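To make the 500-state count concrete, here is a minimal sketch of how the four state components could be flattened into a single index. The function name and component ordering are illustrative assumptions (the classic Gym Taxi environment uses a similar encoding), not necessarily what the notebook does:

```python
def encode_state(taxi_row: int, taxi_col: int, passenger_loc: int, destination: int) -> int:
    """Flatten the four state components into a single index in [0, 500).

    taxi_row, taxi_col: 0..4  (the 5x5 grid)
    passenger_loc:      0..4  (R, G, Y, B, or 4 = inside the taxi)
    destination:        0..3  (R, G, Y, B)
    """
    index = taxi_row
    index = index * 5 + taxi_col
    index = index * 5 + passenger_loc
    index = index * 4 + destination
    return index  # 5 * 5 * 5 * 4 = 500 possible values

# The largest state index is 499, consistent with 500 total states.
assert encode_state(4, 4, 4, 3) == 499
```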

3. Action Space

The agent encounters one of the 500 states and takes an action. In our case, an action can be to move in one of four directions or to pick up or drop off a passenger. In other words, we have six possible actions:

- south
- north
- east
- west
- pickup
- dropoff

This is the action space: the set of all the actions that our agent can take in a given state.
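Since the stated approach is the cross-entropy method, here is a minimal sketch of how it could be applied to a discrete environment like this one. It assumes a Gym-style Taxi-v3 environment with the newer API (reset returning (obs, info) and step returning a 5-tuple) and a tabular stochastic policy; the hyperparameters and helper names are illustrative and not taken from the notebook:

```python
import gym
import numpy as np

env = gym.make("Taxi-v3")  # assumed environment; the repo's notebook may differ
n_states = env.observation_space.n
n_actions = env.action_space.n

# Tabular stochastic policy: policy[s, a] = probability of taking action a in state s.
policy = np.ones((n_states, n_actions)) / n_actions

def play_episode(policy, max_steps=200):
    """Roll out one episode; return the visited states, actions, and total reward."""
    state, _ = env.reset()
    states, actions, total_reward = [], [], 0.0
    for _ in range(max_steps):
        action = np.random.choice(n_actions, p=policy[state])
        next_state, reward, terminated, truncated, _ = env.step(action)
        states.append(state)
        actions.append(action)
        total_reward += reward
        state = next_state
        if terminated or truncated:
            break
    return states, actions, total_reward

n_iterations, n_episodes, percentile, smoothing = 50, 250, 70, 0.8
for it in range(n_iterations):
    episodes = [play_episode(policy) for _ in range(n_episodes)]
    returns = np.array([r for _, _, r in episodes])
    threshold = np.percentile(returns, percentile)

    # Re-estimate the policy from the (state, action) pairs of the elite episodes.
    counts = np.zeros((n_states, n_actions))
    for states, actions, ret in episodes:
        if ret >= threshold:
            for s, a in zip(states, actions):
                counts[s, a] += 1

    totals = counts.sum(axis=1, keepdims=True)
    new_policy = np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_actions)
    policy = smoothing * new_policy + (1 - smoothing) * policy  # smooth the update
    print(f"iter {it}: mean return {returns.mean():.1f}, elite threshold {threshold:.1f}")
```

The key idea of the cross-entropy method here is simple: sample a batch of episodes under the current policy, keep only the episodes whose return clears a percentile threshold (the "elites"), and update the policy toward the action frequencies observed in those elite episodes.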