rstudio/keras3

Difference between R and Python output

SantiagoD999 opened this issue · 7 comments

Good morning,

I am using the excellent keras3 package in R, and when comparing my results to those I get in Python, I noticed some differences even though I am setting the same random seed and the same neural network architecture. I am using keras 3.3.3 and tensorflow 2.16.1.

library(reticulate)
library(keras3)

py_code <- "
from sklearn.model_selection import train_test_split
import numpy as np
from keras import *

np.random.seed(42)

n = 500
reg = 3
relevant_reg = 3

betas = np.random.normal(loc=3, scale=2, size=(relevant_reg, 1))
X = np.random.normal(size=(n, reg))
y = X[:, :relevant_reg] @ betas + np.random.normal(size=(n, 1), loc=0, scale=1)

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

utils.set_random_seed(1)
model_NN1 = Sequential([
  Input(shape=(X.shape[1],)),
  layers.Dense(100, activation='relu'),
  layers.Dense(1) 
])

model_NN1.compile(loss='mean_squared_error', optimizer='adam')

model_NN1.fit(x_train, y_train, epochs=100, verbose=0)

model_NN1_eval = model_NN1.evaluate(x_test, y_test)"

results_py <- py_run_string(py_code)
results_py$model_NN1_eval

x_train <- py$x_train
x_test  <- py$x_test
y_test  <- py$y_test
y_train <- py$y_train

x_train <- array_reshape(x_train, c(nrow(x_train), ncol(x_train)))
x_test <- array_reshape(x_test, c(nrow(x_test), ncol(x_test)))

keras3::set_random_seed(1)

model <- keras_model_sequential(input_shape = NCOL(x_train))
model |>
  layer_dense(units =100, activation = 'relu') |>
  layer_dense(units = 1)

model |> compile(
  loss = 'mean_squared_error',
  optimizer = 'adam'
)

model |> fit(
  x_train, y_train,
  epochs = 100, verbose = 0
)

results_r <- model %>% evaluate(x_test, y_test)

results_py$model_NN1_eval
results_r$loss

Does anyone know why results_py$model_NN1_eval and results_r$loss are not the same?

Thank you very much.

Thanks for reporting. This took a few head scratches to track down, but it ended up being really simple.

The order in which the layers are instantiated is different in R because |> does not eagerly evaluate its left-hand-side argument. Instead, that argument (i.e., the expression that creates the layer) is not evaluated until the last call in the chain attempts to access it, which is after that last call has already created its own layer.

We could change this behavior in keras3. (It's not an issue with %>%, only |>)
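To see the order concretely, here is a minimal sketch with a toy stand-in (trace_layer is hypothetical, not the keras3 API): x |> f() is parsed as f(x), and since R arguments are promises, the outer call can "create" its layer before the inner (left-hand-side) expression is forced.

# Toy stand-in for a layer_* function: it "creates" a layer (the message),
# then touches its input, which forces the lazily evaluated LHS promise.
trace_layer <- function(object, id) {
  message("instantiating layer ", id)  # runs before `object` is forced
  force(object)
  invisible(id)
}

# Piped: prints "dense_1" then "dense_100" -- the outer call runs first.
NULL |> trace_layer("dense_100") |> trace_layer("dense_1")

# Separate statements: prints "dense_100" then "dense_1", matching Python.
m <- NULL
trace_layer(m, "dense_100")
trace_layer(m, "dense_1")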

By the way, did you know keras3 has keras3::split_dataset()?
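For reference, a quick sketch of how it could replace the sklearn split above (argument names as in the keras3 docs; X and y stand for the arrays generated in your Python snippet, and the result is a list of two tf.data.Dataset objects that fit() and evaluate() accept directly):

# Split (X, y) 75/25 with a fixed seed, instead of train_test_split().
splits   <- split_dataset(list(X, y), left_size = 0.75, shuffle = TRUE, seed = 42)
train_ds <- splits[[1]]
test_ds  <- splits[[2]]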

Here is your MRE, reorganized while I tracked down the difference.
The final result is identical now.

library(reticulate)
library(keras3)

# ---- make data ----
py_run_string(r"---(

from sklearn.model_selection import train_test_split
import numpy as np
from keras import *

np.random.seed(42)

n = 500
reg = 3
relevant_reg = 3

betas = np.random.normal(loc=3, scale=2, size=(relevant_reg, 1))
X = np.random.normal(size=(n, reg))
y = X[:, :relevant_reg] @ betas + np.random.normal(size=(n, 1), loc=0, scale=1)

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

)---")

x_train <- py_eval("x_train", convert = FALSE)
x_test  <- py_eval("x_test", convert = FALSE)
y_test  <- py_eval("y_test", convert = FALSE)
y_train <- py_eval("y_train", convert = FALSE)


# ---- model makers ----
train_py_model <- function() {

  py_run_string(r"---(
utils.clear_session()
utils.set_random_seed(1)

model = Sequential([Input(shape=(3,))])
model.add(layers.Dense(100, activation="relu"))
model.add(layers.Dense(1))

model.compile(loss="mean_squared_error", optimizer="adam")

model.fit(x_train, y_train, epochs=100, verbose=0)

result = model.evaluate(x_test, y_test, return_dict=True)['loss']
)---")$result

}


train_r_model <- function() {
  evalq(envir = globalenv(), {
    clear_session()
    set_random_seed(1)

    model <- keras_model_sequential(3)  # i.e., shape(x_train)[[2]]
    model |> layer_dense(100, activation = 'relu')
    model |> layer_dense(1)
    # model |>
    #   layer_dense(100, activation = 'relu') |>
    #   layer_dense(1)

    model |> compile(loss = "mean_squared_error", optimizer = "adam")

    model |> fit(x_train, y_train, epochs = 100, verbose = 0)

    result <- model |> evaluate(x_test, y_test) |> _$loss
    model$evaluate
    result
  })
}

print(train_py_model())
print(train_r_model())

waldo::compare(serialize_keras_object(py$model),
               serialize_keras_object(model))

The behavior of layer_* functions within |> chains has now been changed on main.

These two snippets produce identical models now, creating the underlying layers in the same order:

model <- keras_model_sequential(3)
model |> layer_dense(100, activation = 'relu')
model |> layer_dense(1)

model <- keras_model_sequential(3) |>
  layer_dense(100, activation = 'relu') |>
  layer_dense(1)
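One way to sanity-check that, reusing the waldo/serialize comparison from the MRE above (a sketch; clear_session() is called before each build so the auto-generated layer names line up):

# Build the model with separate statements...
clear_session(); set_random_seed(1)
m1 <- keras_model_sequential(3)
m1 |> layer_dense(100, activation = 'relu')
m1 |> layer_dense(1)

# ...and again as a single pipe chain.
clear_session(); set_random_seed(1)
m2 <- keras_model_sequential(3) |>
  layer_dense(100, activation = 'relu') |>
  layer_dense(1)

# No differences should be reported once the layer order matches.
waldo::compare(serialize_keras_object(m1),
               serialize_keras_object(m2))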

Thanks for reporting!

Thank you very much for your reply. I tried running

model <- keras_model_sequential(3) |>
  layer_dense(100, activation = 'relu') |>
  layer_dense(1)

and

model <- keras_model_sequential(3)
model |> layer_dense(100, activation = 'relu')
model |> layer_dense(1)

In the context of

library(reticulate)
library(keras3)

# ---- make data ----
py_run_string(r"---(

from sklearn.model_selection import train_test_split
import numpy as np
from keras import *

np.random.seed(42)

n = 500
reg = 3
relevant_reg = 3

betas = np.random.normal(loc=3, scale=2, size=(relevant_reg, 1))
X = np.random.normal(size=(n, reg))
y = X[:, :relevant_reg] @ betas + np.random.normal(size=(n, 1), loc=0, scale=1)

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

)---")

x_train <- py_eval("x_train", convert = FALSE)
x_test  <- py_eval("x_test", convert = FALSE)
y_test  <- py_eval("y_test", convert = FALSE)
y_train <- py_eval("y_train", convert = FALSE)


# ---- model makers ----
train_py_model <- function() {
  
  py_run_string(r"---(
utils.clear_session()
utils.set_random_seed(1)

model = Sequential([Input(shape=(3,))])
model.add(layers.Dense(100, activation="relu"))
model.add(layers.Dense(1))

model.compile(loss="mean_squared_error", optimizer="adam")

model.fit(x_train, y_train, epochs=100, verbose=0)

result = model.evaluate(x_test, y_test, return_dict=True)['loss']
)---")$result
  
}


train_r_1_model <- function() {
    clear_session()
    set_random_seed(1)
    
    model <- keras_model_sequential(3) |> 
      layer_dense(100, activation = 'relu') %>%
      layer_dense(1)
    
    model %>% compile(loss = "mean_squared_error", optimizer = "adam")
    
    model %>% fit(x_train, y_train, epochs = 100, verbose = 0)
    
    result <- model %>% evaluate(x_test, y_test)
    model$evaluate
    result
}
train_r_2_model <- function() {
  clear_session()
  set_random_seed(1)
  
  model <- keras_model_sequential(3)
  model %>% layer_dense(100, activation = 'relu') 
  model %>% layer_dense(1)
  
  model %>% compile(loss = "mean_squared_error", optimizer = "adam")
  
  model %>% fit(x_train, y_train, epochs = 100, verbose = 0)
  
  result <- model %>% evaluate(x_test, y_test)
  model$evaluate
  result
}

print(train_py_model())
print(train_r_1_model())
print(train_r_2_model())

But train_r_1_model() and train_r_2_model() do not give the same results, even though train_py_model() and train_r_2_model() give the same results. I also noticed that using %>% instead of |> does not change the results.

Did you update to use the development version of keras3?

remotes::install_github("rstudio/keras")

When I run the code in your last reply, I get identical loss values each time (1.095762).

If you haven't updated yet, then it's probably the stray |> remaining in train_r_1_model() that's leading to the difference.

Thank you very much for your reply. When trying to install the development version I get the following error:

Using GitHub PAT from the git credential store.
Error: Failed to install 'keras' from GitHub:
  HTTP error 401.
  Bad credentials

I changed the |> to %>%, but train_r_1_model() is still producing a different result. Do you know why this may be happening?

train_r_1_model <- function() {
  clear_session()
  set_random_seed(1)
  
  model <- keras_model_sequential(3) %>%
    layer_dense(100, activation = 'relu') %>%
    layer_dense(1)
  
  model %>% compile(loss = "mean_squared_error", optimizer = "adam")
  
  model %>% fit(x_train, y_train, epochs = 100, verbose = 0)
  
  result <- model %>% evaluate(x_test, y_test)
  model$evaluate
  result
}


print(train_r_1_model())

 1.081783

If you're unable to install the dev version, then in the interim, you'll need to avoid long pipe chains and do something like this:

model <- keras_model_sequential(3)
model |> layer_dense(100, activation = 'relu') 
model |> layer_dense(1)

However, I would recommend fixing your setup so you can install the development version via remotes::install_github(). It should work by default. My guess is that you have an expired token in your git credential store.

?usethis::create_github_token is a great place to start troubleshooting this.
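For what it's worth, a sketch of the usual recovery path (assuming the stale token lives in the git credential store):

# Mint a fresh PAT in the browser...
usethis::create_github_token()
# ...store it when prompted, replacing the expired one...
gitcreds::gitcreds_set()
# ...then retry the install.
remotes::install_github("rstudio/keras")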

Thank you very much for your reply.