keras-team/keras-nlp

keras-nlp insists I use the (buggy) Tensorflow 2.16.1 which does not work with my GPU


Describe the bug

The latest TensorFlow 2.16.1 has a bug: it doesn't seem to detect GPUs (see for example: http://127.0.0.1:8888/lab?token=59ba515252bb7306a955696efe83ad0b816e730b847fac69).

To get around that, I ran pip install tensorflow[and-cuda]==2.15.1

It worked and my GPU was detected.

The problem is that when I pip install keras-nlp, pip tries to uninstall tensorboard, tensorflow, etc. in order to install their latest versions. I suspect keras does the same.

I tried pip install keras-nlp --no-deps and got errors during imports, such as ModuleNotFoundError: No module named 'keras_core' (the preinstalled keras version was 2.15.0).

I then tried pip uninstall keras keras-nlp followed by pip install keras==3.0.0 keras-nlp==0.6.3 --no-deps, and got import errors again, such as ModuleNotFoundError: No module named 'rich'.
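After a few rounds of --no-deps installs like the ones above, it can help to print what is actually installed before importing anything. This is a minimal sketch using only the standard library; the package list and the helper name installed_versions are illustrative choices, not part of keras-nlp.

```python
from importlib import metadata

# Distributions mentioned in this report; adjust the list as needed.
PACKAGES = ["tensorflow", "tensorflow-text", "keras", "keras-core", "keras-nlp", "rich"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

A None entry for keras-core or rich would match the ModuleNotFoundError messages quoted above.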

To Reproduce

Here's a colab link. The one aspect it does not reproduce is TensorFlow failing to recognize the GPU; the GPU issue seems to be restricted to PCs.

Expected behavior

I would like to be able to install functional keras and keras-nlp packages with TF 2.15.1.

Additional context

Windows 11 PC with WSL2 Ubuntu

nvidia-smi

Thu Mar 21 16:03:26 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.120                Driver Version: 537.58       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        On  | 00000000:01:00.0  On |                  Off |
|  0%   33C    P8              16W / 450W |   2895MiB / 24564MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

I was able to detect the GPU in a Colab GPU VM.

I ran the commands below.
Please create a fresh environment and try them.

!pip install -U keras-nlp
!pip install -U tensorflow

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

If you are still unable to detect the GPU, you can close this issue and open a new one in the TensorFlow repo, since the problem is related to TensorFlow.

Your link points to a localhost runtime, so we can't access it.

I can see Num GPUs Available: 1 in your colab, what is the issue again?

Got it. It looks like this is a TensorFlow issue specific to that OS.
You can create a new issue in the TensorFlow repo and link this one for context.

I think it would be nice if keras-nlp were usable with older TensorFlow versions, as version 2.16.1 has several bugs, and keras-nlp seems to be compatible with TF 2.15.

We always try to match the latest TensorFlow version at the time of release; it's the same practice we follow for KerasCV.
Moreover, TensorFlow 2.16.1 uses Keras 3 by default, unlike 2.15, which uses Keras 2.

@arsenstonelab thanks for the issue. This is indeed a bit of a rough edge. The issue is most likely with tensorflow-text. keras-nlp is unopinionated about tensorflow versions in our package setup, but if you install keras-nlp it will try to install tensorflow-text (the latest version if none is installed), which in turn will try to install the latest tensorflow version. That can lead to a big upgrade of tensorflow.
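To see that dependency chain on a given machine, one option is to read the requirements each installed distribution declares. This is a rough sketch using the standard library's importlib.metadata; the helper names declared_requirements and mentioning are made up for illustration, and the output depends on which versions are installed.

```python
from importlib import metadata


def declared_requirements(dist_name):
    """Return the requirement strings a distribution declares, or [] if unavailable."""
    try:
        # metadata.requires() returns None when no requirements are declared.
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []


def mentioning(requirements, needle):
    """Keep only requirement strings that contain `needle` (case-insensitive)."""
    return [r for r in requirements if needle.lower() in r.lower()]


if __name__ == "__main__":
    for dist in ("keras-nlp", "tensorflow-text"):
        print(dist, "->", mentioning(declared_requirements(dist), "tensorflow"))
```

An empty list only means the distribution is missing or declares nothing matching; it does not by itself prove there is no transitive dependency.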

One option is to pin the TF version you want during install for both tensorflow-text and tensorflow. For example, this works for installing keras-nlp with TF 2.15:
pip install keras-nlp tensorflow-text~=2.15.0 tensorflow~=2.15.0

Does that work for you?
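For reference, ~=2.15.0 is a "compatible release" specifier, so it accepts any 2.15.x release at or above 2.15.0 but not 2.16. After installing, a quick check that tensorflow and tensorflow-text ended up on the same minor version might look like the sketch below. The helper names are hypothetical, and the version parsing is deliberately simple: it assumes plain numeric major.minor prefixes.

```python
from importlib import metadata


def major_minor(version):
    """Return the (major, minor) part of a version string such as '2.15.1'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def same_minor(a, b):
    """True when two version strings share the same major.minor prefix."""
    return major_minor(a) == major_minor(b)


def check_pair(first="tensorflow", second="tensorflow-text"):
    """Return True/False if both are installed, or None if either is missing."""
    try:
        return same_minor(metadata.version(first), metadata.version(second))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("tensorflow and tensorflow-text share a minor version:", check_pair())
```

If this prints False, pip has most likely upgraded one of the two packages past the pin.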

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.