RuntimeError: invalid 'type' (environment) of argument
rjpeng98 opened this issue · 8 comments
Hi, I am using keras and tensorflow in R to train a mixture density network. My customized loss function has been tested.
However, when I try to fit the model, I always get the error shown in the issue title.
My code is as follows:
library(keras)
library(tensorflow)
num_components = 2 # Number of mixture components
input <- layer_input(shape = c(100)) # a 1-dimensional input
# Define a hidden layer
hidden <- input %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dense(units = 64, activation = 'relu')
# Output layers for mixture components
mu <- hidden %>%
layer_dense(units = num_components, name = 'mu') # Means of the Gaussians
sigma <- hidden %>%
layer_dense(units = num_components, activation = 'softplus', name = 'sigma') # Standard deviation of the Gaussians (positive)
p <- hidden %>%
layer_dense(units = num_components, activation = 'softmax', name = 'p') # Mixture coefficients (sum to 1)
model <- keras_model(inputs = input, outputs = list(mu, sigma, p))
mdn_loss <- function(y_true, model_output) {
  # Extract components from model output
  mu = model_output[[1]]
  sigma = model_output[[2]] + keras::k_epsilon()
  p = model_output[[3]]
  single_gaussian_nll = function(y, mu, sigma, p) {
    return(-log(sum(exp(log(p) +
      (-log(sigma) - log(2 * pi) / 2 - 1 / 2 * ((y - mu) / sigma)^2)))))
  }
  total_nll <- 0
  for (i in 1:nrow(y_true)) {
    total_nll = sum(total_nll, (single_gaussian_nll(y_true[i], mu, sigma, p)))
  }
  return((total_nll))
}
model %>% compile(
optimizer = 'adam',
loss = mdn_loss)
#data simulation
theta_alpha<- -10
theta_beta<- 10
alpha<- 1/4
sigma_1<- 1
sigma_2<- 0.1
n<-10000 #number of samples from prior distribution
theta_prior<- runif(n, min = theta_alpha, max = theta_beta)
x_simulated<- matrix(nrow = n,ncol = 100)
for (i in 1:n) {
for (j in 1:100) {
indic<- rbinom(1,1,alpha)
x_simulated[i,j]<- indic*rnorm(1,mean = theta_prior[i], sd = sigma_1)+
(1-indic)*rnorm(1, mean = -theta_prior[i], sd = sigma_2)
}
}
model %>% fit(x_simulated, matrix(theta_prior, ncol = 1), epochs = 10, batch_size = 100)
Thanks in advance for any comments.
When using the functional API in Keras to train a multi-output model, the value supplied to compile(loss = ) is expected to be a list of loss callables with the same length as the model outputs. Each callable is called with only one of the outputs. This works great if the loss for each output can be calculated without the values of the other outputs.
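For reference, in that standard case the per-output losses would be supplied roughly like this, keyed by the output layer names; the particular losses named here are only illustrative, not a recommendation for this model:
model |> compile(
  optimizer = 'adam',
  # one loss per named output; each receives only its own output
  loss = list(mu = 'mse', sigma = 'mse', p = 'categorical_crossentropy')
)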
However, this API does not cover the use case where you have multiple outputs and you need the values of all of them in one place to calculate the loss. There are two straightforward ways to handle this:
- If all the outputs have compatible shapes, you can call layer_concatenate() to combine the outputs along an axis, and then unstack them in the custom loss function. Here is your code adapted using this approach. Note, I updated it to use keras3 instead of keras, and I used the new op_* functions where appropriate (e.g., I replaced the for loop with op_vectorized_map()):
model <- keras_model(inputs = input, outputs = layer_concatenate(mu, sigma, p))
custom_loss_fn <- function(y_true, y_pred) {
  str(y_true)
  str(y_pred)
  ## browser() is safe to use here to be able to work with the `y_true` and
  ## `y_pred` tracing tensors interactively. Just be sure to exit the browser
  ## context by pressing "Continue" (to raise an error) rather than by "Quit".
  ## If you "Quit" the R browser context, it leaves the TensorFlow tracing
  ## context open, and nothing else will work as expected (and it will
  ## eventually segfault).
  # browser()
  c(mu, sigma, p) %<-% op_split(y_pred, 3, axis = 2)
  sigma %<>% `+`(config_epsilon())
  single_gaussian_nll <- function(.x) {
    c(y, mu, sigma, p) %<-% .x
    -log(sum(exp(
      log(p) +
        (-log(sigma) - log(2 * pi) / 2 - 1 / 2 * ((y - mu) / sigma) ^ 2)
    )))
  }
  total_nll <- op_sum(op_vectorized_map(list(y_true, mu, sigma, p),
                                        single_gaussian_nll))
  total_nll
}
- You can subclass Model and define a custom train_step. See https://keras.posit.co/articles/custom_train_step_in_tensorflow.html for examples. Note, I don't think this example requires a custom train_step; that would only be required if the outputs did not share a shape and could not be concatenated. If you still want to have a model with 3 outputs, you can define two models that share weights, one for training and one for inference. E.g.,
model <- ... # same as before
training_model <- keras_model(inputs = model$inputs,
                              outputs = layer_concatenate(model$outputs))
# Training 'training_model' will also train 'model', since the two
# models share weights.
training_model |> compile() |> fit() # same as before
model |> predict() # inference from trained model with 3 outputs
You can pass compile(run_eagerly = TRUE), and then insert browser(), print(), str(), and message() calls in your custom loss function as needed to track down where the NaN's are coming from. E.g.,
custom_loss_fn <- function(y_true, y_pred) {
  ... # same as before
  single_gaussian_nll <- function(.x) {
    c(y, mu, sigma, p) %<-% .x
    result <- ... # calculate as before
    if (py_bool(op_isnan(result))) browser()
    result
  }
  total_nll <- ... # same as before
  if (py_bool(op_isnan(total_nll))) browser()
  total_nll
}
model |> compile(run_eagerly = TRUE, loss = custom_loss_fn)
Many thanks.
Greetings
I tried to debug my code by adding "if (py_bool(op_isnan(result))) browser()" in "custom_loss_fn".
However, this raises an error related to py_bool.
I appreciate any comments, as usual. By the way, in browser() it looks like y_pred is NA after the very first iteration. Does that mean my network is wrong?
The code is as follows:
#install_keras()
library(keras3)
library(tensorflow)
library(reticulate)
num_components = 2 # Number of mixture components
input <- layer_input(shape = c(100)) # a 1-dimensional input
# Define a hidden layer
hidden <- input %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dense(units = 64, activation = 'relu')
# Output layers for mixture components
mu <- hidden %>%
layer_dense(units = num_components, name = 'mu') # Means of the Gaussians
sigma <- hidden %>%
layer_dense(units = num_components, activation = 'softplus', name = 'sigma') # Standard deviation of the Gaussians (positive)
p <- hidden %>%
layer_dense(units = num_components, activation = 'softmax', name = 'p') # Mixture coefficients (sum to 1)
model <- keras_model(inputs = input, outputs = layer_concatenate(mu, sigma, p))
custom_loss_fn <- function(y_true, y_pred) {
  str(y_true)
  str(y_pred)
  ## browser() is safe to use here to be able to work with the `y_true` and
  ## `y_pred` tracing tensors interactively. Just be sure to exit the browser
  ## context by pressing "Continue" (to raise an error) rather than by "Quit".
  ## If you "Quit" the R browser context, it leaves the TensorFlow tracing
  ## context open, and nothing else will work as expected (and it will
  ## eventually segfault).
  c(mu, sigma, p) %<-% op_split(y_pred, 3, axis = 2)
  sigma %<>% `+`(config_epsilon())
  single_gaussian_nll <- function(.x) {
    c(y, mu, sigma, p) %<-% .x
    result <- -log(sum(exp(
      log(p) +
        (-log(sigma) - log(sqrt(2 * pi)) - (1 / 2) * ((y - mu)^2 / sigma^2))
    )))
    if (py_bool(op_isnan(result))) browser()
    result
  }
  total_nll <-
    op_sum(op_vectorized_map(list(y_true, mu, sigma, p),
                             single_gaussian_nll))
  total_nll
}
#debug
model |> compile(run_eagerly = TRUE, loss = custom_loss_fn)
#data simulation
theta_alpha<- -10
theta_beta<- 10
alpha<- 1/4
sigma_1<- 1
sigma_2<- 0.1
n<-10000 #number of samples from prior distribution
theta_prior<- runif(n, min = theta_alpha, max = theta_beta)
x_simulated<- matrix(nrow = n,ncol = 100)
for (i in 1:n) {
for (j in 1:100) {
indic<- rbinom(1,1,alpha)
x_simulated[i,j]<- indic*rnorm(1,mean = theta_prior[i], sd = sigma_1)+
(1-indic)*rnorm(1, mean = -theta_prior[i], sd = sigma_2)
}
}
model %>% fit(x_simulated, matrix(theta_prior, ncol = 1), epochs = 10, batch_size = 100)
Many thanks.
Thanks, I could reproduce this. It was slightly harder to track down than I expected, because op_vectorized_map() traces f even in eager mode. I will add an example to the docs of op_vectorized_map() showing how to implement a debuggable version of it, op_vectorized_map_debug().
The issue is that the custom loss you are calculating sometimes returns inf, which the optimizer then uses to update the weights, turning them into nan. The nan values in y_pred are only encountered after the first batch of updates.
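As a side note, one way to keep the per-sample loss finite is to compute the mixture log-density with a log-sum-exp rather than log(sum(exp(...))). A minimal sketch of the inner function, assuming op_logsumexp() is available in your keras3 version:
# Sketch only: -logsumexp(log p_k + log N(y | mu_k, sigma_k)) avoids the
# underflow where sum(exp(<very negative>)) rounds to 0 and log(0) gives
# -Inf (and hence a +Inf loss after negation).
single_gaussian_nll <- function(.x) {
  c(y, mu, sigma, p) %<-% .x
  log_component_density <-
    -op_log(sigma) - 0.5 * log(2 * pi) - 0.5 * ((y - mu) / sigma)^2
  -op_logsumexp(op_log(p) + log_component_density)
}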
Here is your code updated with inserted if(py_bool(op_any(op_isinf(total_nll)))) ... calls, using op_vectorized_map_debug().
#install_keras()
Sys.setenv("CUDA_VISIBLE_DEVICES"="")
library(keras3)
# library(tensorflow, exclude = c("set_random_seed", "shape"))
library(reticulate)
num_components = 2 # Number of mixture components
input <- layer_input(shape = c(100)) # a 1-dimensional input
# Define a hidden layer
hidden <- input %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dense(units = 64, activation = 'relu')
# Output layers for mixture components
mu <- hidden %>%
layer_dense(units = num_components,
name = 'mu') # Means of the Gaussians
sigma <- hidden %>%
layer_dense(units = num_components,
activation = 'softplus',
name = 'sigma') # Standard deviation of the Gaussians (positive)
p <- hidden %>%
layer_dense(units = num_components,
activation = 'softmax',
name = 'p') # Mixture coefficients (sum to 1)
model <-
keras_model(inputs = input,
outputs = layer_concatenate(mu, sigma, p))
op_vectorized_map_debug <- function(elements, fn) {
  batch_size <- elements[[1]] |> op_shape() |> _[[1]]
  elements |>
    lapply(\(e) op_split(e, batch_size)) |>
    zip_lists() |>
    lapply(fn) |>
    op_stack()
}
ii <- 0L
custom_loss_fn <- function(y_true, y_pred) {
  ii <<- ii + 1L
  str(keras3:::named_list(ii, y_true, y_pred))
  ## browser() is safe to use here to be able to work with the `y_true` and
  ## `y_pred` tracing tensors interactively. Just be sure to exit the browser
  ## context by pressing "Continue" (to raise an error) rather than by "Quit".
  ## If you "Quit" the R browser context, it leaves the TensorFlow tracing
  ## context open, and nothing else will work as expected (and it will
  ## eventually segfault).
  if (py_bool(op_any(op_isnan(y_pred)))) browser()
  c(mu, sigma, p) %<-% op_split(y_pred, 3, axis = 2)
  sigma %<>% `+`(config_epsilon())
  single_gaussian_nll <- function(.x) {
    c(y, mu, sigma, p) %<-% .x
    result <- -op_log(op_sum(op_exp(
      op_log(p) +
        (-op_log(sigma) - op_log(op_sqrt(2 * pi)) -
           (1 / 2) * ((y - mu) ^ 2 / sigma ^ 2))
    )))
    if (py_bool(op_isinf(result))) str(c(.x, result = result))
    result
  }
  total_nll <-
    op_sum(op_vectorized_map_debug(list(y_true, mu, sigma, p),
                                   single_gaussian_nll))
  if (py_bool(op_any(op_isnan(total_nll)))) browser()
  if (py_bool(op_any(op_isinf(total_nll)))) browser()
  str(keras3:::named_list(ii, total_nll))
  print(total_nll)
  total_nll
}
model |> compile(run_eagerly = TRUE,
loss = custom_loss_fn)
#data simulation
theta_alpha <- -10
theta_beta <- 10
alpha <- 1 / 4
sigma_1 <- 1
sigma_2 <- 0.1
n <- 10000 #number of samples from prior distribution
theta_prior <- runif(n, min = theta_alpha, max = theta_beta)
x_simulated <- matrix(nrow = n, ncol = 100)
for (i in 1:n) {
for (j in 1:100) {
indic <- rbinom(1, 1, alpha)
x_simulated[i, j] <-
indic * rnorm(1, mean = theta_prior[i], sd = sigma_1) +
(1 - indic) * rnorm(1, mean = -theta_prior[i], sd = sigma_2)
}
}
model %>% fit(x_simulated,
matrix(theta_prior, ncol = 1),
epochs = 10,
batch_size = 100)
Many thanks for your prompt reply. I will continue to debug with your generous help.
It works. Thanks a lot.