TensorFlow: Installation partially successful
Closed this issue · 7 comments
We installed TensorFlow on a MacBook using the following code:
install.packages("remotes")
remotes::install_github(sprintf("rstudio/%s", c("reticulate", "tensorflow", "keras")))
reticulate::miniconda_uninstall() # start with a blank slate
reticulate::install_miniconda()
keras::install_keras()
The installation was successful.
Most functions run, but when we try to fit a model, it returns the following error message:
> fit(BOX.OFFICE, TRAIN.X , TRAIN.Y , epochs=500)
2023-03-06 13:43:14.083923: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
Epoch 1/500
2023-03-06 13:43:14.304465: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2023-03-06 13:43:15.308786: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x1682845f0
2023-03-06 13:43:15.308813: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x1682845f0
2023-03-06 13:43:15.515355: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x1682845f0
2023-03-06 13:43:15.515380: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x1682845f0
2023-03-06 13:43:15.582866: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x1682845f0
2023-03-06 13:43:15.582893: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : NOT_FOUND: could not find registered platform with id: 0x1682845f0
Error: tensorflow.python.framework.errors_impl.NotFoundError: Graph execution error:
<... omitted ...>ite-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
return tf.__internal__.distribute.interim.maybe_merge_call(
File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
distribution.extended.update(
File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_4'
could not find registered platform with id: 0x1682845f0
[[{{node StatefulPartitionedCall_4}}]] [Op:__inference_train_function_874]
See `reticulate::py_last_error()` for details
Below is diagnostics information:
> reticulate::py_config()
python: /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/bin/python
libpython: /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/libpython3.8.dylib
pythonhome: /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate:/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate
version: 3.8.16 | packaged by conda-forge | (default, Feb 1 2023, 16:01:13) [Clang 14.0.6 ]
numpy: /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/numpy
numpy_version: 1.23.2
> tensorflow::tf_config()
TensorFlow v2.11.0 ()
Python v3.8 (~/Library/r-miniconda-arm64/envs/r-reticulate/bin/python)
> reticulate::import("tensorflow")
Module(tensorflow)
> reticulate::py_last_error()
Traceback (most recent call last):
File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.NotFoundError: Graph execution error:
Detected at node 'StatefulPartitionedCall_4' defined at (most recent call last):
File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/training.py", line 1650, in fit
tmp_logs = self.train_function(iterator)
File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/training.py", line 1249, in train_function
return step_function(self, iterator)
File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/training.py", line 1233, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/training.py", line 1222, in run_step
outputs = model.train_step(data)
File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/training.py", line 1027, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
self.apply_gradients(grads_and_vars)
File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
return super().apply_gradients(grads_and_vars, name=name)
File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
iteration = self._internal_apply_gradients(grads_and_vars)
File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
return tf.__internal__.distribute.interim.maybe_merge_call(
File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
distribution.extended.update(
File "/Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_4'
could not find registered platform with id: 0x11f9220b0
[[{{node StatefulPartitionedCall_4}}]] [Op:__inference_train_function_874]
> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6.2
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tensorflow_2.11.0.9000 keras_2.11.0.9000
loaded via a namespace (and not attached):
[1] Rcpp_1.0.10 here_1.0.1 lattice_0.20-45 png_0.1-8 withr_2.5.0 rprojroot_2.0.3
[7] zeallot_0.1.0 grid_4.2.2 R6_2.5.1 jsonlite_1.8.4 magrittr_2.0.3 tfruns_1.5.1
[13] rlang_1.0.6 cli_3.6.0 rstudioapi_0.14 whisker_0.4.1 Matrix_1.5-1 reticulate_1.28
[19] generics_0.1.3 tools_4.2.2 compiler_4.2.2 base64enc_0.1-3
Hi, thanks for reporting. I don't think this is an installation issue; rather, you're running into limitations of TensorFlow on M1 Macs. You can try pinning your fit() call to the CPU to confirm.
https://tensorflow.rstudio.com/reference/tensorflow/install_tensorflow.html#apple-silicon
Hello,
We tried the following and it still produced an error message. We would be grateful for your help:
> with(tf$device("CPU"),fit(BOX.OFFICE, TRAIN.X , TRAIN.Y , epochs=500))
Epoch 1/500
2023-03-07 13:40:33.640811: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
2023-03-07 13:40:33.705613: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : INVALID_ARGUMENT: Trying to access resource Resource-33-at-0x296625600 (defined @ /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/base_layer_utils.py:134) located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:CPU:0
Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device
2023-03-07 13:40:33.705626: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : INVALID_ARGUMENT: Trying to access resource Resource-35-at-0x296627420 (defined @ /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/base_layer_utils.py:134) located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:CPU:0
Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device
2023-03-07 13:40:33.705642: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : INVALID_ARGUMENT: Trying to access resource Resource-34-at-0x292e9e300 (defined @ /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/base_layer_utils.py:134) located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:CPU:0
Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device
2023-03-07 13:40:33.705694: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : INVALID_ARGUMENT: Trying to access resource Resource-32-at-0x297517320 (defined @ /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/base_layer_utils.py:134) located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:CPU:0
Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device
2023-03-07 13:40:33.705728: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : INVALID_ARGUMENT: Trying to access resource Resource-31-at-0x2970ffe30 (defined @ /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/base_layer_utils.py:134) located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:CPU:0
Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device
2023-03-07 13:40:33.705743: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:418 : INVALID_ARGUMENT: Trying to access resource Resource-30-at-0x294b5c830 (defined @ /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/base_layer_utils.py:134) located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:CPU:0
Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device
Error: tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
<... omitted ...>e_layer_utils.py:134) located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:CPU:0
Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device
[[{{node StatefulPartitionedCall_3}}]]
[[div_no_nan/_23]]
(1) INVALID_ARGUMENT: Trying to access resource Resource-33-at-0x296625600 (defined @ /Users/yinuoyang/Library/r-miniconda-arm64/envs/r-reticulate/lib/python3.8/site-packages/keras/engine/base_layer_utils.py:134) located in device /job:localhost/replica:0/task:0/device:GPU:0 from device /job:localhost/replica:0/task:0/device:CPU:0
Cf. https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device
[[{{node StatefulPartitionedCall_3}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_1747]
See `reticulate::py_last_error()` for details
The link in the error message points to the resolution:
https://www.tensorflow.org/xla/known_issues#tfvariable_on_a_different_device
tf.Variable on a different device
Error message: INVALID_ARGUMENT: Trying to access resource <Variable> (defined @ <Loc>) located in device CPU:0 from device GPU:0
XLA cluster runs on exactly one device, and it can not read or write to tf.Variable located on a different device. Usually this error message indicates that the variable was not placed on the right device to begin with. The error message should precisely specify the location of the offending variable.
Can you please try pinning the full model definition (including the input tensor creation) to the CPU, not just the fit() call?
Thank you for the generous offer, Kıvanç Avrenli, but I don't think we'll need that.
You can wrap your entire script with an expression like:
with(tf$device("CPU"), source("script-that-creates-and-fits-the-model.R"))
Another approach would be to remove the M1 GPU from the visible devices list before creating the model, like this:
tf$config$get_visible_devices("CPU") |>
tf$config$set_visible_devices()
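If it helps, the visible-devices approach above can be wrapped in a small check that confirms the GPU is actually hidden before the model is built. This is only a sketch: `use_cpu_only()` is a hypothetical helper name, not a function from the tensorflow package. Its `tf` argument defaults to the real module and exists only so the helper can be tried with a stand-in object. As far as I know, TensorFlow refuses to change visible devices once the runtime has been initialized, so this should run in a fresh R session, before any tensors or models are created.

```r
# Hypothetical helper (not part of the tensorflow R package).
# `tf` defaults to the real TensorFlow module; it is an argument only so the
# function can be exercised with a stand-in object.
use_cpu_only <- function(tf = tensorflow::tf) {
  # Keep only the CPU devices visible to TensorFlow.
  cpus <- tf$config$get_visible_devices("CPU")
  tf$config$set_visible_devices(cpus)

  # Confirm that no GPU is left in the visible-device list.
  gpus <- tf$config$get_visible_devices("GPU")
  if (length(gpus) > 0) {
    stop("A GPU is still visible to TensorFlow; restart R and call this first.")
  }
  invisible(cpus)
}
```

Call `use_cpu_only()` once at the top of the script, then define, compile and fit the model as usual, so that the variables and the training step all end up on the CPU.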
Thank you very much, but again, no payment is necessary or expected :)