RuntimeError: invalid 'type' (environment) of argument
rjpeng98 opened this issue · 8 comments
Hi, I am using keras and tensorflow in R to train a mixture density network. My customized loss function has been tested.
However, when I try to fit the model, I always get the error shown in the issue title.
My code is as follows:
library(keras)
library(tensorflow)
num_components = 2 # Number of mixture components
input <- layer_input(shape = c(100)) # a 1-dimensional input
# Define a hidden layer
hidden <- input %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dense(units = 64, activation = 'relu')
# Output layers for mixture components
mu <- hidden %>%
layer_dense(units = num_components, name = 'mu') # Means of the Gaussians
sigma <- hidden %>%
layer_dense(units = num_components, activation = 'softplus', name = 'sigma') # Standard deviation of the Gaussians (positive)
p <- hidden %>%
layer_dense(units = num_components, activation = 'softmax', name = 'p') # Mixture coefficients (sum to 1)
model <- keras_model(inputs = input, outputs = list(mu, sigma, p))
mdn_loss <- function(y_true, model_output) {
  # Extract components from model output
  mu = model_output[[1]]
  sigma = model_output[[2]] + keras::k_epsilon()
  p = model_output[[3]]
  single_gaussian_nll = function(y, mu, sigma, p) {
    return(-log(sum(exp(log(p) +
      (-log(sigma) - log(2 * pi) / 2 - 1 / 2 * ((y - mu) / sigma)^2)))))
  }
  total_nll <- 0
  for (i in 1:nrow(y_true)) {
    total_nll = sum(total_nll, (single_gaussian_nll(y_true[i], mu, sigma, p)))
  }
  return((total_nll))
}
model %>% compile(
optimizer = 'adam',
loss = mdn_loss)
#data simulation
theta_alpha<- -10
theta_beta<- 10
alpha<- 1/4
sigma_1<- 1
sigma_2<- 0.1
n<-10000 #number of samples from prior distribution
theta_prior<- runif(n, min = theta_alpha, max = theta_beta)
x_simulated<- matrix(nrow = n,ncol = 100)
for (i in 1:n) {
for (j in 1:100) {
indic<- rbinom(1,1,alpha)
x_simulated[i,j]<- indic*rnorm(1,mean = theta_prior[i], sd = sigma_1)+
(1-indic)*rnorm(1, mean = -theta_prior[i], sd = sigma_2)
}
}
model %>% fit(x_simulated, matrix(theta_prior, ncol = 1), epochs = 10, batch_size = 100)
Thanks in advance for any comments.
When using the functional API in Keras to train a multi-output model, the value supplied to compile(loss = ) is expected to be a list of loss callables with the same length as the model outputs. Each callable is called with only one of the outputs. This works great if the loss for each output can be calculated without the values of the other outputs.
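For reference, in that standard case the per-output losses would be supplied roughly like this, keyed by the output layer names; the particular losses named here are only illustrative, not a recommendation for this model:
model |> compile(
  optimizer = 'adam',
  # one loss per named output; each receives only its own output
  loss = list(mu = 'mse', sigma = 'mse', p = 'categorical_crossentropy')
)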
However, this API does not cover the use case where you have multiple outputs and you need the values of all of them in one place to calculate the loss. There are two straightforward ways to handle this:
- If all the outputs have compatible shapes, you can call layer_concatenate() to combine the outputs along an axis, and then unstack them in the custom loss function. Here is your code adapted using this approach. Note, I updated it to use keras3 instead of keras, and I used the new op_* functions where appropriate (e.g., I replaced the for loop with op_vectorized_map()):
model <- keras_model(inputs = input, outputs = layer_concatenate(mu, sigma, p))
custom_loss_fn <- function(y_true, y_pred) {
  str(y_true)
  str(y_pred)
  ## browser() is safe to use here to be able to work with the `y_true` and
  ## `y_pred` tracing tensors interactively. Just be sure to exit the browser
  ## context by pressing "Continue" (to raise an error) rather than by "Quit".
  ## If you "Quit" the R browser context, it leaves the TensorFlow tracing
  ## context open, and nothing else will work as expected (and it will
  ## eventually segfault).
  # browser()
  c(mu, sigma, p) %<-% op_split(y_pred, 3, axis = 2)
  sigma %<>% `+`(config_epsilon())
  single_gaussian_nll <- function(.x) {
    c(y, mu, sigma, p) %<-% .x
    -log(sum(exp(
      log(p) +
        (-log(sigma) - log(2 * pi) / 2 - 1 / 2 * ((y - mu) / sigma) ^ 2)
    )))
  }
  total_nll <- op_sum(op_vectorized_map(list(y_true, mu, sigma, p),
                                        single_gaussian_nll))
  total_nll
}
- You can subclass Model and define a custom train_step. See https://keras.posit.co/articles/custom_train_step_in_tensorflow.html for examples. Note, I don't think this example requires a custom train_step; that would only be required if the outputs did not share a shape and could not be concatenated. If you still want to have a model with 3 outputs, you can define two models that share weights, one for training and one for inference. E.g.,
model <- ... # same as before
training_model <- keras_model(inputs = model$inputs,
                              outputs = layer_concatenate(model$outputs))
# Training 'training_model' will also train 'model', since the two
# models share weights.
training_model |> compile() |> fit() # same as before
model |> predict() # inference from trained model with 3 outputs
You can pass compile(run_eagerly = TRUE), and then insert browser(), print(), str(), and message() calls in your custom loss function as needed to track down where the NaN's are coming from. E.g.,
custom_loss_fn <- function(y_true, y_pred) {
  ... # same as before
  single_gaussian_nll <- function(.x) {
    c(y, mu, sigma, p) %<-% .x
    result <- ... # calculate as before
    if (py_bool(op_isnan(result))) browser()
    result
  }
  total_nll <- ... # same as before
  if (py_bool(op_isnan(total_nll))) browser()
  total_nll
}
model |> compile(run_eagerly = TRUE, loss = custom_loss_fn)
Many thanks.
Greetings
I tried to debug my code by adding "if (py_bool(op_isnan(result))) browser()" in "custom_loss_fn".
However, this raises an error related to py_bool.
I appreciate any comments, as usual. By the way, in browser() it looks like y_pred is NA after the very first iteration. Does that mean my network is wrong?
The code is as follows:
#install_keras()
library(keras3)
library(tensorflow)
library(reticulate)
num_components = 2 # Number of mixture components
input <- layer_input(shape = c(100)) # a 1-dimensional input
# Define a hidden layer
hidden <- input %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dense(units = 64, activation = 'relu')
# Output layers for mixture components
mu <- hidden %>%
layer_dense(units = num_components, name = 'mu') # Means of the Gaussians
sigma <- hidden %>%
layer_dense(units = num_components, activation = 'softplus', name = 'sigma') # Standard deviation of the Gaussians (positive)
p <- hidden %>%
layer_dense(units = num_components, activation = 'softmax', name = 'p') # Mixture coefficients (sum to 1)
model <- keras_model(inputs = input, outputs = layer_concatenate(mu, sigma, p))
custom_loss_fn <- function(y_true, y_pred) {
  str(y_true)
  str(y_pred)
  ## browser() is safe to use here to be able to work with the `y_true` and
  ## `y_pred` tracing tensors interactively. Just be sure to exit the browser
  ## context by pressing "Continue" (to raise an error) rather than by "Quit".
  ## If you "Quit" the R browser context, it leaves the TensorFlow tracing
  ## context open, and nothing else will work as expected (and it will
  ## eventually segfault).
  c(mu, sigma, p) %<-% op_split(y_pred, 3, axis = 2)
  sigma %<>% `+`(config_epsilon())
  single_gaussian_nll <- function(.x) {
    c(y, mu, sigma, p) %<-% .x
    result <- -log(sum(exp(
      log(p) +
        (-log(sigma) - log(sqrt(2 * pi)) - (1 / 2) * ((y - mu)^2 / sigma^2))
    )))
    if (py_bool(op_isnan(result))) browser()
    result
  }
  total_nll <-
    op_sum(op_vectorized_map(list(y_true, mu, sigma, p),
                             single_gaussian_nll))
  total_nll
}
#debug
model |> compile(run_eagerly = TRUE, loss = custom_loss_fn)
#data simulation
theta_alpha<- -10
theta_beta<- 10
alpha<- 1/4
sigma_1<- 1
sigma_2<- 0.1
n<-10000 #number of samples from prior distribution
theta_prior<- runif(n, min = theta_alpha, max = theta_beta)
x_simulated<- matrix(nrow = n,ncol = 100)
for (i in 1:n) {
for (j in 1:100) {
indic<- rbinom(1,1,alpha)
x_simulated[i,j]<- indic*rnorm(1,mean = theta_prior[i], sd = sigma_1)+
(1-indic)*rnorm(1, mean = -theta_prior[i], sd = sigma_2)
}
}
model %>% fit(x_simulated, matrix(theta_prior, ncol = 1), epochs = 10, batch_size = 100)
Many thanks.
Thanks, I could reproduce this. It was slightly harder to track down than I expected, because op_vectorized_map() traces f even in eager mode. I will add an example to the docs of op_vectorized_map() showing how to implement a debuggable version of it, op_vectorized_map_debug().
The issue is that the custom loss you are calculating sometimes returns inf, which the optimizer then uses to update the weights, turning them into nan. The nan values in y_pred are only encountered after the first batch of updates.
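As a side note, one way to keep the per-sample loss finite is to compute the mixture log-density with a log-sum-exp rather than log(sum(exp(...))). A minimal sketch of the inner function, assuming op_logsumexp() is available in your keras3 version:
# Sketch only: -logsumexp(log p_k + log N(y | mu_k, sigma_k)) avoids the
# underflow where sum(exp(<very negative>)) rounds to 0 and log(0) gives
# -Inf (and hence a +Inf loss after negation).
single_gaussian_nll <- function(.x) {
  c(y, mu, sigma, p) %<-% .x
  log_component_density <-
    -op_log(sigma) - 0.5 * log(2 * pi) - 0.5 * ((y - mu) / sigma)^2
  -op_logsumexp(op_log(p) + log_component_density)
}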
Here is your code updated with inserted if(py_bool(op_any(op_isinf(total_nll)))) ... calls, using op_vectorized_map_debug().
#install_keras()
Sys.setenv("CUDA_VISIBLE_DEVICES"="")
library(keras3)
# library(tensorflow, exclude = c("set_random_seed", "shape"))
library(reticulate)
num_components = 2 # Number of mixture components
input <- layer_input(shape = c(100)) # a 1-dimensional input
# Define a hidden layer
hidden <- input %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dense(units = 64, activation = 'relu')
# Output layers for mixture components
mu <- hidden %>%
layer_dense(units = num_components,
name = 'mu') # Means of the Gaussians
sigma <- hidden %>%
layer_dense(units = num_components,
activation = 'softplus',
name = 'sigma') # Standard deviation of the Gaussians (positive)
p <- hidden %>%
layer_dense(units = num_components,
activation = 'softmax',
name = 'p') # Mixture coefficients (sum to 1)
model <-
keras_model(inputs = input,
outputs = layer_concatenate(mu, sigma, p))
op_vectorized_map_debug <- function(elements, fn) {
  batch_size <- elements[[1]] |> op_shape() |> _[[1]]
  elements |>
    lapply(\(e) op_split(e, batch_size)) |>
    zip_lists() |>
    lapply(fn) |>
    op_stack()
}
ii <- 0L
custom_loss_fn <- function(y_true, y_pred) {
  ii <<- ii + 1L
  str(keras3:::named_list(ii, y_true, y_pred))
  ## browser() is safe to use here to be able to work with the `y_true` and
  ## `y_pred` tracing tensors interactively. Just be sure to exit the browser
  ## context by pressing "Continue" (to raise an error) rather than by "Quit".
  ## If you "Quit" the R browser context, it leaves the TensorFlow tracing
  ## context open, and nothing else will work as expected (and it will
  ## eventually segfault).
  if (py_bool(op_any(op_isnan(y_pred)))) browser()
  c(mu, sigma, p) %<-% op_split(y_pred, 3, axis = 2)
  sigma %<>% `+`(config_epsilon())
  single_gaussian_nll <- function(.x) {
    c(y, mu, sigma, p) %<-% .x
    result <- -op_log(op_sum(op_exp(
      op_log(p) +
        (-op_log(sigma) - op_log(op_sqrt(2 * pi)) -
           (1 / 2) * ((y - mu) ^ 2 / sigma ^ 2))
    )))
    if (py_bool(op_isinf(result))) str(c(.x, result = result))
    result
  }
  total_nll <-
    op_sum(op_vectorized_map_debug(list(y_true, mu, sigma, p),
                                   single_gaussian_nll))
  if (py_bool(op_any(op_isnan(total_nll)))) browser()
  if (py_bool(op_any(op_isinf(total_nll)))) browser()
  str(keras3:::named_list(ii, total_nll))
  print(total_nll)
  total_nll
}
model |> compile(run_eagerly = TRUE,
loss = custom_loss_fn)
#data simulation
theta_alpha <- -10
theta_beta <- 10
alpha <- 1 / 4
sigma_1 <- 1
sigma_2 <- 0.1
n <- 10000 #number of samples from prior distribution
theta_prior <- runif(n, min = theta_alpha, max = theta_beta)
x_simulated <- matrix(nrow = n, ncol = 100)
for (i in 1:n) {
for (j in 1:100) {
indic <- rbinom(1, 1, alpha)
x_simulated[i, j] <-
indic * rnorm(1, mean = theta_prior[i], sd = sigma_1) +
(1 - indic) * rnorm(1, mean = -theta_prior[i], sd = sigma_2)
}
}
model %>% fit(x_simulated,
matrix(theta_prior, ncol = 1),
epochs = 10,
batch_size = 100)
Many thanks for your prompt reply. I will continue to debug with your generous help.
It works. Thanks a lot.