markusdumke/reinforcelearn

Some help setting up a toy example: where to define the action space?

Opened this issue · 0 comments

Hi,

I've been playing with this library, but I'm having some trouble setting up a toy example. I want to train an agent to eat a cake. The reward is the log of the number of pieces eaten, plus a bonus if it's raining. Here's my step function:

library(reinforcelearn)
library(tidyverse)

# Create environment.
step <- function(self, action, prob = 0.7) {
  cake_size <- self$state[[1]]
  weather_status <- self$state[[2]]

  # Eat `action` pieces, then let the weather evolve stochastically.
  cake_size <- cake_size - action
  weather_status <- sample(x = c("rain", "sun"), size = 1, prob = c(prob, 1 - prob))
  bonus <- ifelse(weather_status == "rain", 10, 0)

  # Log of the pieces eaten, plus the rain bonus; trying to eat more
  # cake than there is gives an infinite negative reward.
  reward <- ifelse(cake_size < 0, -Inf, log(action) + bonus)
  done <- cake_size <= 0

  state <- list("cake_size" = cake_size, "weather_status" = weather_status)
  list(state, reward, done)
}

There are two state variables to keep track of: the size of the cake and the status of the weather. The weather evolves stochastically. I compute the reward and assign an infinite negative reward if the agent tries to eat more cake than there is.
As long as there's cake left, the done flag is set to FALSE.

This is the reset function:

reset <- function(self) {
  cake_size <- 10
  weather_status <- "sun"
  list("cake_size" = cake_size, "weather_status" = weather_status)
}

Now comes my first question: where do I set up the action space? I would like my agent to be able to eat 1, 2, 3,..., 10 pieces of cake. But I am not sure where to define this.
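My current guess (I haven't found this in the docs, so the convention below is my assumption) is that for a discrete action space the agent just emits an integer in 0, ..., n.actions - 1, and I have to translate that into pieces of cake myself inside step(). A minimal sketch of that mapping (the function name is my own):

```r
# Assumed convention: the agent emits an integer action in 0..9.
# I would map action k to eating k + 1 pieces of cake inside step().
n_actions <- 10L
action_to_pieces <- function(action) action + 1L

action_to_pieces(0L)  # eat 1 piece
action_to_pieces(9L)  # eat 10 pieces
```

Is that the intended way, or is there a dedicated place to declare the action space for a custom environment?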

I now define the environment:

env = makeEnvironment("custom", step = step, reset = reset)

And the value function: there are 20 states (10 cake sizes times 2 types of weather), and for actions I say 10 (but as explained above, I cannot say which 10 they are).

val_fun <- makeValueFunction("table", n.states = 20L, n.actions = 10L)
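Since the tabular value function presumably indexes its table by an integer state, my guess is that I also need to encode the (cake_size, weather_status) pair into a single integer myself. A sketch of a hypothetical encoding into 0..19, assuming cake_size stays in 1..10 for non-terminal states (the function name is mine):

```r
# Hypothetical encoding: map (cake_size, weather_status) to one integer
# in 0..19, so a tabular value function could index it directly.
encode_state <- function(cake_size, weather_status) {
  weather_idx <- ifelse(weather_status == "rain", 1L, 0L)
  (cake_size - 1L) * 2L + weather_idx
}

encode_state(1L, "sun")    # 0
encode_state(1L, "rain")   # 1
encode_state(10L, "sun")   # 18
encode_state(10L, "rain")  # 19
```

Is something like this what the package expects from a custom environment's state?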

Now when I try to run the below:

agent = makeAgent(policy = "softmax", val.fun = val_fun, algorithm = "qlearning")

interact(env, agent, n.steps = 20L)

I get the following error message:

Error in state + 1L : non-numeric argument to binary operator

I don't know if this is linked to my question above, but I don't understand where this state + 1L computation comes from.
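For what it's worth, I can reproduce the same message in plain R, because my state is a list containing a character value. This makes me suspect the package tries to do arithmetic on the state, i.e. expects it to be a plain integer index rather than a list:

```r
# Adding an integer to a list reproduces the error message verbatim.
state <- list("cake_size" = 10, "weather_status" = "sun")
msg <- tryCatch(state + 1L, error = function(e) conditionMessage(e))
msg  # "non-numeric argument to binary operator"
```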

Thanks in advance for your help!