rstudio/keras3

list_physical_devices('GPU') returns list() (in RStudio Server with Ubuntu 20.04 on WSL2)

JiaoLongXiao opened this issue

This problem is really driving me crazy, so I have to seek help here.
I have tried Plan A, Plan B, and Plan C, and all of them failed.

NVIDIA GeForce RTX 3060 Laptop GPU
CUDA Toolkit version: 11.7
cuDNN version: 8.9.7
Ubuntu 20.04.6 (WSL2)
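To compare these local versions with what TensorFlow itself expects, a minimal Python sketch can print the CUDA/cuDNN versions the installed wheel was built against, using `tf.sysconfig.get_build_info()` (available in TensorFlow 2.x releases since 2.3). The helper name `tf_cuda_build_info` is hypothetical, and it returns `None` when TensorFlow cannot be imported.

```python
def tf_cuda_build_info():
    """Return TensorFlow's build-time CUDA details, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    info = tf.sysconfig.get_build_info()
    # CPU-only builds omit the CUDA keys, so use .get() rather than indexing.
    return {
        "tensorflow": tf.__version__,
        "is_cuda_build": info.get("is_cuda_build"),
        "cuda_version": info.get("cuda_version"),
        "cudnn_version": info.get("cudnn_version"),
    }


print(tf_cuda_build_info())
```

Running it inside each virtual environment gives the build-time versions to set against the CUDA 11.7 / cuDNN 8.9.7 installation listed above.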

Plan A (failed):

In the Ubuntu terminal, I set up a virtual environment with Python 3.8.10, installed TensorFlow (2.13) and its dependencies, and ran:

import tensorflow as tf
tf.config.list_physical_devices('GPU')

It successfully detected the GPU.

However, when I logged into RStudio Server and ran:

library(reticulate)
use_virtualenv( <same virtual environment as above>, required = TRUE)
library(tensorflow)
tf$config$list_physical_devices('GPU')

It returned an empty list (list()).
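Because the same virtual environment detects the GPU from a terminal but not from RStudio Server, one plausible difference is the process environment the R session inherits. A minimal sketch, assuming the relevant variables are `LD_LIBRARY_PATH`, `CUDA_VISIBLE_DEVICES`, `CUDA_HOME`, and `PATH` (an illustrative, not exhaustive, list), prints the interpreter and those variables so the two contexts can be diffed. The file name `cuda_env.py` and the function `cuda_env_report` are hypothetical.

```python
import json
import os
import sys

# Illustrative (not exhaustive) list of variables that can affect CUDA library loading.
CUDA_RELATED_VARS = ("LD_LIBRARY_PATH", "CUDA_VISIBLE_DEVICES", "CUDA_HOME", "PATH")


def cuda_env_report(environ=None):
    """Snapshot the running interpreter and the CUDA-related environment variables."""
    env = os.environ if environ is None else environ
    return {
        "python": sys.executable,
        "prefix": sys.prefix,
        "env": {name: env.get(name) for name in CUDA_RELATED_VARS},
    }


# Print as JSON so the terminal and RStudio outputs are easy to diff.
print(json.dumps(cuda_env_report(), indent=2))
```

Saved as `cuda_env.py`, it can be run with `python cuda_env.py` in the activated virtual environment and with `reticulate::py_run_file("cuda_env.py")` in RStudio Server after `use_virtualenv(...)`; R's own view of a variable is available via `Sys.getenv("LD_LIBRARY_PATH")`.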

Plan B (failed):

On a clean Ubuntu 20.04 system, I logged into RStudio Server and ran:

remotes::install_github("rstudio/tensorflow")
tensorflow::install_tensorflow()   # TensorFlow 2.13; the installation reported no errors and tf$constant(...) worked
reticulate::use_virtualenv("~/virtualenvs/r-tensorflow", required = TRUE)

library(tensorflow)
tf$config$list_physical_devices('GPU')

It returned an empty list (list()).

Then, in the Ubuntu terminal, I activated the same virtual environment (~/virtualenvs/r-tensorflow) and ran:

import tensorflow as tf
tf.config.list_physical_devices('GPU')

It successfully detected the GPU.

Plan C (failed):

On a clean RStudio Server installation, I ran:

install.packages("remotes")
remotes::install_github("rstudio/keras3")  #keras3 version = 1.1.0.9000
keras3::install_keras(gpu = TRUE)   # tensorflow version = 2.17.0
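If `install_keras(gpu = TRUE)` installed CUDA/cuDNN as pip wheels (an assumption; the issue does not show what was installed), those appear as distributions whose names start with `nvidia-`. A minimal sketch using the standard-library `importlib.metadata` lists them for whichever interpreter runs it; `nvidia_wheels` is a hypothetical helper name.

```python
from importlib import metadata


def nvidia_wheels():
    """Return sorted (name, version) pairs for installed distributions named 'nvidia-*'."""
    found = set()  # a set avoids duplicates when a package is visible on several paths
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and name.lower().startswith("nvidia-"):
            found.add((name, dist.version))
    return sorted(found)


for name, version in nvidia_wheels():
    print(f"{name}=={version}")
```

Comparing its output from the terminal and from `reticulate::py_run_file()` confirms whether both contexts really see the same packages in `~/.virtualenvs/r-keras`.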

library(reticulate)
use_virtualenv("~/.virtualenvs/r-keras", required = TRUE)
library(tensorflow)
tf$config$list_physical_devices('GPU')

It returned an empty list (list()), with the following output:

2024-08-26 12:09:27.039607: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-08-26 12:09:27.051218: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-26 12:09:27.064713: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-26 12:09:27.068189: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-08-26 12:09:27.077083: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-26 12:09:27.774988: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-08-26 12:09:29.146018: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:266] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
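The last line shows that `cuInit` (the CUDA driver API initialization call) itself reports `CUDA_ERROR_NO_DEVICE`, which points below TensorFlow. A minimal sketch that calls `cuInit(0)` directly through `ctypes` can check this without TensorFlow; it assumes the driver library is loadable as `libcuda.so.1`, and `probe_cuinit` is a hypothetical name. `CUDA_ERROR_NO_DEVICE` has the value 100 in the driver API's `CUresult` enum.

```python
import ctypes

CUDA_SUCCESS = 0
CUDA_ERROR_NO_DEVICE = 100  # CUresult value for "no CUDA-capable device is detected"


def probe_cuinit(libname="libcuda.so.1"):
    """Call cuInit(0) via ctypes; return the CUresult code, or None if the library fails to load."""
    try:
        libcuda = ctypes.CDLL(libname)
    except OSError:
        return None
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    return libcuda.cuInit(0)


code = probe_cuinit()
if code is None:
    print("libcuda.so.1 could not be loaded")
elif code == CUDA_SUCCESS:
    print("cuInit succeeded")
elif code == CUDA_ERROR_NO_DEVICE:
    print("cuInit: CUDA_ERROR_NO_DEVICE")
else:
    print(f"cuInit failed with CUresult {code}")
```

Running it from RStudio Server (for example with `reticulate::py_run_file()`) and from the terminal helps separate a driver/device-visibility problem from a TensorFlow configuration problem.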

In the Ubuntu terminal, I ran:

source ~/.virtualenvs/r-keras/bin/activate
python
import tensorflow as tf
tf.config.list_physical_devices('GPU')

It printed:

2024-08-26 12:10:21.785161: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-08-26 12:10:21.807707: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-26 12:10:21.821581: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-26 12:10:21.835387: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-08-26 12:10:21.849394: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-26 12:10:22.885415: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1724645633.331901 4600 cuda_executor.cc:1001] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
I0000 00:00:1724645633.421905 4600 cuda_executor.cc:1001] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
I0000 00:00:1724645633.422396 4600 cuda_executor.cc:1001] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

It detected the GPU again.
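Since the terminal and the R session disagree, it may help to see which `libcuda.so*` files each process could pick up. A minimal sketch, assuming the candidate directories below (`/usr/lib/wsl/lib` is where WSL2 normally exposes the Windows driver's CUDA library; the other two are common Linux locations), searches those plus every `LD_LIBRARY_PATH` entry; `find_libcuda` is a hypothetical name.

```python
import glob
import os

# Assumed candidate locations; adjust for the actual system.
CANDIDATE_DIRS = ["/usr/lib/wsl/lib", "/usr/lib/x86_64-linux-gnu", "/usr/local/cuda/lib64"]


def find_libcuda(extra_dirs=()):
    """List libcuda.so* files in extra_dirs, LD_LIBRARY_PATH entries, and CANDIDATE_DIRS."""
    ld_dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    hits = []
    for directory in [*extra_dirs, *ld_dirs, *CANDIDATE_DIRS]:
        hits.extend(sorted(glob.glob(os.path.join(directory, "libcuda.so*"))))
    return list(dict.fromkeys(hits))  # de-duplicate while keeping search order


for path in find_libcuda():
    print(path)
```

This only lists files on disk; the dynamic linker also consults `/etc/ld.so.cache`, so `ldconfig -p | grep libcuda` is a complementary check.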

Output of nvidia-smi:

Mon Aug 26 09:37:05 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 560.94       CUDA Version: 12.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P8    12W / 110W |   1062MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
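The same GPU information can be queried in machine-readable form from inside each context, which also shows whether `nvidia-smi` is on the `PATH` of the R session at all. A minimal sketch using the `--query-gpu` and `--format=csv,noheader` options of `nvidia-smi`; `query_gpus` and `parse_gpu_csv` are hypothetical helpers, and the parser assumes GPU names contain no commas.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,driver_version,memory.total", "--format=csv,noheader"]


def parse_gpu_csv(text):
    """Split header-less nvidia-smi CSV rows into dicts (assumes no commas inside fields)."""
    rows = []
    for line in text.strip().splitlines():
        name, driver, memory = (field.strip() for field in line.split(","))
        rows.append({"name": name, "driver_version": driver, "memory_total": memory})
    return rows


def query_gpus():
    """Run nvidia-smi if it is on PATH; return parsed rows, or None if unavailable or failing."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_gpu_csv(result.stdout)


print(query_gpus())
```

Calling it from RStudio Server (for example through `reticulate::py_run_file()`) and from the terminal should yield the same rows if both processes can see the GPU.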