rstudio/tensorflow

TensorFlow: Installation partially successful


We installed TensorFlow on a MacBook using the following code:

install.packages("remotes")
remotes::install_github(sprintf("rstudio/%s", c("reticulate", "tensorflow", "keras")))
reticulate::miniconda_uninstall() # start with a blank slate
reticulate::install_miniconda()
keras::install_keras()

The installation completed without errors.
Most functions run, but fitting a model fails with the following error message:

> fit(BOX.OFFICE, TRAIN.X , TRAIN.Y , epochs=500) 

2023-03-06 13:43:14.083923: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz

Epoch 1/500

2023-03-06 13:43:14.304465: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.

2023-03-06 13:43:15.308786: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x1682845f0

2023-03-06 13:43:15.308813: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x1682845f0

2023-03-06 13:43:15.515355: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x1682845f0

2023-03-06 13:43:15.515380: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x1682845f0

2023-03-06 13:43:15.582866: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x1682845f0

2023-03-06 13:43:15.582893: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x1682845f0

Error: tensorflow.python.framework.errors_impl.NotFoundError: Graph execution error:

<... omitted ...>ite-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients

return tf.__internal__.distribute.interim.maybe_merge_call(

File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn

distribution.extended.update(

File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var

return self._update_step_xla(grad, var, id(self._var_key(var)))

Node: 'StatefulPartitionedCall_4'

could not find registered platform with id: 0x1682845f0

[[{{node StatefulPartitionedCall_4}}]] [Op:__inference_train_function_874]

See `reticulate::py_last_error()` for details

Below is diagnostics information:

> reticulate::py_config()

python:         /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/bin/python

libpython:      /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/libpython3.8.dylib

pythonhome:     /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate:/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate

version:        3.8.16 | packaged by conda-forge | (default, Feb  1 2023, 16:01:13)  [Clang 14.0.6 ]

numpy:          /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/numpy

numpy_version:  1.23.2

> tensorflow::tf_config()

TensorFlow v2.11.0 ()

Python v3.8 (~/Library/r-miniconda-arm64/envs/r-reticulate/bin/python)

> reticulate::import("tensorflow")

Module(tensorflow)

> reticulate::py_last_error()

Traceback (most recent call last):

  File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler

    raise e.with_traceback(filtered_tb) from None

  File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute

    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,

tensorflow.python.framework.errors_impl.NotFoundError: Graph execution error:

 

Detected at node 'StatefulPartitionedCall_4' defined at (most recent call last):

    File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler

      return fn(*args, **kwargs)

    File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/training.py", line 1650, in fit

      tmp_logs = self.train_function(iterator)

    File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/training.py", line 1249, in train_function

      return step_function(self, iterator)

    File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/training.py", line 1233, in step_function

      outputs = model.distribute_strategy.run(run_step, args=(data,))

    File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/training.py", line 1222, in run_step

      outputs = model.train_step(data)

    File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/training.py", line 1027, in train_step

      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)

    File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize

      self.apply_gradients(grads_and_vars)

    File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients

      return super().apply_gradients(grads_and_vars, name=name)

    File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients

      iteration = self._internal_apply_gradients(grads_and_vars)

    File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients

      return tf.__internal__.distribute.interim.maybe_merge_call(

    File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn

      distribution.extended.update(

    File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var

      return self._update_step_xla(grad, var, id(self._var_key(var)))

Node: 'StatefulPartitionedCall_4'

could not find registered platform with id: 0x11f9220b0

            [[{{node StatefulPartitionedCall_4}}]] [Op:__inference_train_function_874]

 

> sessionInfo()

R version 4.2.2 (2022-10-31)

Platform: aarch64-apple-darwin20 (64-bit)

Running under: macOS Monterey 12.6.2

 

Matrix products: default

LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

 

Random number generation:

RNG:     Mersenne-Twister

 Normal:  Inversion

 Sample:  Rounding

 

locale:

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 

attached base packages:

[1] stats     graphics  grDevices utils     datasets  methods   base    

 

other attached packages:

[1] tensorflow_2.11.0.9000 keras_2.11.0.9000    

 

loaded via a namespace (and not attached):

[1] Rcpp_1.0.10     here_1.0.1      lattice_0.20-45 png_0.1-8       withr_2.5.0     rprojroot_2.0.3

[7] zeallot_0.1.0   grid_4.2.2      R6_2.5.1        jsonlite_1.8.4  magrittr_2.0.3  tfruns_1.5.1  

[13] rlang_1.0.6     cli_3.6.0       rstudioapi_0.14 whisker_0.4.1   Matrix_1.5-1    reticulate_1.28

[19] generics_0.1.3  tools_4.2.2     compiler_4.2.2  base64enc_0.1-3

Hi, thanks for reporting. I don't think this is an installation issue; rather, you're running into limitations of TensorFlow on M1 Macs. To confirm, you can try pinning your fit() call to the CPU.

https://tensorflow.rstudio.com/reference/tensorflow/install_tensorflow.html#apple-silicon

Hello,
We tried the following and it still produced an error message. We would be grateful for your help:

> with(tf$device("CPU"), fit(BOX.OFFICE, TRAIN.X, TRAIN.Y, epochs=500))

Epoch 1/500

2023-03-07 13:40:33.640811: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.

2023-03-07 13:40:33.705613: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : INVALID_ARGUMENT: Trying to access resource Resource-33-at-0x296625600 (defined @ /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/base_layer_utils.py:134) located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:CPU:0

Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device

2023-03-07 13:40:33.705626: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : INVALID_ARGUMENT: Trying to access resource Resource-35-at-0x296627420 (defined @ /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/base_layer_utils.py:134) located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:CPU:0

Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device

2023-03-07 13:40:33.705642: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : INVALID_ARGUMENT: Trying to access resource Resource-34-at-0x292e9e300 (defined @ /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/base_layer_utils.py:134) located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:CPU:0

Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device

2023-03-07 13:40:33.705694: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : INVALID_ARGUMENT: Trying to access resource Resource-32-at-0x297517320 (defined @ /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/base_layer_utils.py:134) located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:CPU:0

Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device

2023-03-07 13:40:33.705728: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : INVALID_ARGUMENT: Trying to access resource Resource-31-at-0x2970ffe30 (defined @ /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/base_layer_utils.py:134) located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:CPU:0

Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device

2023-03-07 13:40:33.705743: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : INVALID_ARGUMENT: Trying to access resource Resource-30-at-0x294b5c830 (defined @ /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/base_layer_utils.py:134) located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:CPU:0

Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device

Error: tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

<... omitted ...>e_layer_utils.py:134) located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:CPU:0

Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device

[[{{node StatefulPartitionedCall_3}}]]

[[div_no_nan/_23]]

(1) INVALID_ARGUMENT: Trying to access resource Resource-33-at-0x296625600 (defined @ /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/base_layer_utils.py:134) located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:CPU:0

Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device

[[{{node StatefulPartitionedCall_3}}]]

0 successful operations.

0 derived errors ignored. [Op:__inference_train_function_1747]

See `reticulate::py_last_error()` for details

The link in the error message points to the resolution:

https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device

tf.Variable on a different device
Error message: INVALID_ARGUMENT: Trying to access resource (defined @ ) located in device CPU:0 from device GPU:0
XLA cluster runs on exactly one device, and it can not read or write to tf.Variable located on a different device. Usually this error message indicates that the variable was not placed on the right device to begin with. The error message should precisely specify the location of the offending variable.

Can you please try pinning the full model definition (including the input tensor creation) to the CPU, not just the fit() call?
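
To check where a model's variables were actually placed, something like the following may help. It is only a sketch: the small `model` below is a hypothetical stand-in for BOX.OFFICE with arbitrary layer sizes, and `weight_devices` is a name invented for this example.

```r
library(tensorflow)
library(keras)

# Hypothetical stand-in for the model in the report; layer sizes are arbitrary.
model <- keras_model_sequential() |>
  layer_dense(units = 4, activation = "relu", input_shape = c(3)) |>
  layer_dense(units = 1)

# Each weight is a tf.Variable; its `device` attribute names the device
# the variable was placed on when it was created.
weight_devices <- vapply(model$weights, function(w) w$device, character(1))
print(unique(weight_devices))
```

If the printed device names a GPU while the training step is pinned to the CPU, that is the mismatch the XLA error message describes.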

Thank you for the generous offer, Kıvanç Avrenli, but I don't think we'll need that.

You can wrap your entire script with an expression like:

with(tf$device("CPU"), source("script-that-creates-and-fits-the-model.R"))
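
The same idea can be written inline, without a separate script file. This is a minimal sketch under stated assumptions: the toy data `x` and `y` stand in for TRAIN.X and TRAIN.Y, and the layer sizes, optimizer, loss, and epoch count are placeholders rather than the settings from the report.

```r
library(tensorflow)
library(keras)

# Hypothetical toy data standing in for TRAIN.X / TRAIN.Y.
x <- matrix(rnorm(100 * 3), ncol = 3)
y <- matrix(rnorm(100), ncol = 1)

with(tf$device("CPU"), {
  # The model's variables are created inside the device scope,
  # so they are placed on the CPU ...
  model <- keras_model_sequential() |>
    layer_dense(units = 4, activation = "relu", input_shape = c(3)) |>
    layer_dense(units = 1)
  model |> compile(optimizer = "adam", loss = "mse")

  # ... and the training step runs on the same device.
  history <- model |> fit(x, y, epochs = 2, verbose = 0)
})
```

The point of the braces is that model creation, compile(), and fit() all happen under one device scope, so the variables and the training step end up on the same device.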

Another approach would be to remove the M1 GPU from the visible devices list before creating the model, like this:

tf$config$get_visible_devices("CPU") |> 
  tf$config$set_visible_devices()
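
To confirm that the GPU is actually hidden, the visible devices can be listed afterwards. The sketch below assumes a fresh R session: TensorFlow refuses to change the visible devices once its runtime has been initialized, so this has to run before any tensors or models are created. `cpus` and `visible_types` are names invented for this example.

```r
library(tensorflow)

# Keep only the CPU devices visible. Run this before creating any tensors or models.
cpus <- tf$config$get_visible_devices("CPU")
tf$config$set_visible_devices(cpus)

# List the device types TensorFlow can still see.
visible_types <- vapply(
  tf$config$get_visible_devices(),
  function(d) d$device_type,
  character(1)
)
print(visible_types)
```

If "GPU" still appears in the output, the call most likely ran too late in the session.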

Thank you very much, but again, no payment is necessary or expected :)