keras-team/autokeras

Bug: libdevice not found at ./libdevice / doesn't run on GPU

JuliaWasala opened this issue · 3 comments

Bug Description

Bug Reproduction

Code for reproducing the bug:
Installation:

conda create --name env python=3.9.13
conda activate env
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
pip install tensorflow
conda install jupyter
pip install autokeras
pip install tqdm
pip install scikit-learn
pip install tensorflow-datasets

Code:

import os
import sys

sys.path.append("/home/julia/dir/src")
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import logging
import contextlib

from tqdm import tqdm
import tensorflow_datasets as tfds
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

from automl.vanilla_autokeras import train_and_test_autokeras

if __name__ == "__main__":
    os.system("clear")
    SVM_DATA_PATH = "/data/julia/data/svm_data"
    RESULTS_PATH = "/data/julia/results"
    EXPERIMENT = "autokeras"

    config = ConfigProto()
    config.gpu_options.allow_growth = True
    session = InteractiveSession(config=config)

    logging.basicConfig(
        filename=f"/data/julia/logs/{EXPERIMENT}.log", level=logging.WARNING
    )
    logging.captureWarnings(True)

    with open(os.path.join(RESULTS_PATH, EXPERIMENT + ".csv"), "w") as f:
        f.write("index,acc,precision,recall,f1\n")

    pbar = tqdm(range(10), leave=True, position=0)
    for i in pbar:
        print("#############")
        print("# Autokeras #")
        print("#############")
        
        print("Get data")
        train = tfds.as_numpy(
            tfds.load("data", split="train", data_dir="/data/julia/data/tfds", as_supervised=False, batch_size=-1)
        )
        test = tfds.as_numpy(
            tfds.load("data", split="test", data_dir="/data/julia/data/tfds", as_supervised=False, batch_size=-1)
        )
        val = tfds.as_numpy(
            tfds.load("data", split="val", data_dir="/data/julia/data/tfds", as_supervised=False, batch_size=-1)
        )

        print("Train model")
        acc, precision, recall, f1 = train_and_test_autokeras(
            train["image"], test["image"], val["image"],
            train["label"], test["label"], val["label"],
            f"autokeras_{i}",
        )

        with open(os.path.join(RESULTS_PATH, EXPERIMENT + ".csv"), "a") as f:
            f.write(
                ",".join(
                    str(res) for res in [i, acc, precision, recall, f1]
                )
                + "\n"
            )

        if i != 9:
            os.system("clear")

With the ImageClassifier training/testing code:

import autokeras as ak
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

clf = ak.ImageClassifier(overwrite=True, project_name=name, directory="/data/julia/models/")
clf.fit(X_train, y_train, validation_data=(X_val, y_val))
predicted_y = clf.predict(X_test)

# Compute accuracy and macro-averaged precision, recall, and F1
acc = accuracy_score(y_test, predicted_y)
precision = precision_score(y_test, predicted_y, average='macro')
recall = recall_score(y_test, predicted_y, average='macro')
f1 = f1_score(y_test, predicted_y, average='macro')

Expected Behavior

Setup Details

Include the details about the versions of:

  • OS type and version: Debian 5.10.149-2
  • Python: 3.9.13
  • autokeras: 1.0.20
  • keras-tuner: 1.1.3
  • scikit-learn: 1.2.0
  • numpy: 1.19.5
  • pandas: 1.4.4
  • tensorflow: 2.11

This doesn't look like an AutoKeras issue, but rather a local setup issue.
Can you run TensorFlow successfully on the GPU with your setup?
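
One way to answer that question independently of AutoKeras is a short standalone check. The sketch below is only an illustration: the function name `check_tensorflow_gpu` and the report keys are made up for this example, and it assumes that XLA compilation (`jit_compile=True`) is the code path that needs libdevice, which may not hold for every TensorFlow version.

```python
import math


def check_tensorflow_gpu():
    """Return a small report on whether TensorFlow sees a GPU and can XLA-compile on it."""
    report = {
        "tensorflow_installed": False,
        "tf_version": None,
        "gpus": [],
        "xla_ok": None,      # stays None when no GPU is visible
        "xla_error": None,
    }
    try:
        import tensorflow as tf
    except ImportError:
        return report

    report["tensorflow_installed"] = True
    report["tf_version"] = tf.__version__
    gpus = tf.config.list_physical_devices("GPU")
    report["gpus"] = [gpu.name for gpu in gpus]

    if gpus:
        # Assumption: an XLA-compiled function exercises the libdevice lookup.
        @tf.function(jit_compile=True)
        def xla_sum_of_tanh(x):
            return tf.reduce_sum(tf.math.tanh(x))

        try:
            with tf.device("/GPU:0"):
                value = xla_sum_of_tanh(tf.ones((128, 128)))
            report["xla_ok"] = math.isfinite(float(value))
        except Exception as exc:  # record the failure instead of crashing
            report["xla_ok"] = False
            report["xla_error"] = str(exc)
    return report


if __name__ == "__main__":
    print(check_tensorflow_gpu())
```

An empty `gpus` list points to a driver, CUDA, or cuDNN problem. A non-empty list with `xla_ok` set to `False` suggests the GPU is visible but XLA compilation fails, and `xla_error` holds the message.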

For anyone encountering this problem, you can install TensorFlow with the following instructions.

conda create --name=test python=3.10 -y
# Activate the environment so that $CONDA_PREFIX points at it in the commands below
conda activate test
conda install cuda-nvcc=11.3.58 cudatoolkit=11.2.2 cudnn=8.1.0 -c conda-forge -c nvidia -y
pip install tensorflow
# Export the library path and the XLA CUDA data directory on every activation
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
printf 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/\nexport XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib/\n' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# Copy libdevice to lib/nvvm/libdevice under the directory given in XLA_FLAGS
mkdir -p $CONDA_PREFIX/lib/nvvm/libdevice
cp $CONDA_PREFIX/lib/libdevice.10.bc $CONDA_PREFIX/lib/nvvm/libdevice/
# Re-activate the environment so that env_vars.sh takes effect
conda deactivate
conda activate test
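
After following the steps above, a quick sanity check can confirm that the three pieces they set up are in place: the copied `libdevice.10.bc`, the `XLA_FLAGS` entry, and the `LD_LIBRARY_PATH` entry. This is a minimal sketch using only the standard library. The helper name `check_libdevice_setup` and its return keys are hypothetical, and it only inspects paths and environment variables without verifying that TensorFlow itself works.

```python
import os
from pathlib import Path


def check_libdevice_setup(prefix, environ=None):
    """Check the file and environment variables the install steps are meant to set up."""
    environ = os.environ if environ is None else environ
    lib_dir = Path(prefix) / "lib"

    # 1. libdevice.10.bc should exist under <prefix>/lib/nvvm/libdevice/
    libdevice_ok = (lib_dir / "nvvm" / "libdevice" / "libdevice.10.bc").is_file()

    # 2. XLA_FLAGS should contain --xla_gpu_cuda_data_dir=<prefix>/lib
    flag = "--xla_gpu_cuda_data_dir="
    xla_dirs = [
        Path(token[len(flag):])
        for token in environ.get("XLA_FLAGS", "").split()
        if token.startswith(flag)
    ]
    xla_ok = lib_dir in xla_dirs

    # 3. LD_LIBRARY_PATH should include <prefix>/lib
    ld_dirs = [Path(p) for p in environ.get("LD_LIBRARY_PATH", "").split(":") if p]
    ld_ok = lib_dir in ld_dirs

    return {"libdevice_file": libdevice_ok, "xla_flags": xla_ok, "ld_library_path": ld_ok}


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        print(check_libdevice_setup(prefix))
    else:
        print("CONDA_PREFIX is not set; activate the conda environment first.")
```

Run it inside the activated environment. Any `False` value points at the step that did not take effect, for example an environment that was not re-activated after `env_vars.sh` was written.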