Bug: libdevice not found at ./libdevice / doesn't run on GPU
JuliaWasala opened this issue · 3 comments
JuliaWasala commented
Bug Description
Bug Reproduction
Code for reproducing the bug:
Installation:
conda create --name env python=3.9.13
conda activate env
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
pip install tensorflow
conda install jupyter
pip install autokeras
pip install tqdm
pip install scikit-learn
pip install tensorflow-datasets
Code:
import os
import sys
sys.path.append("/home/julia/dir/src")
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import logging
import contextlib
from tqdm import tqdm
import tensorflow_datasets as tfds
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
from automl.vanilla_autokeras import train_and_test_autokeras
if __name__ == "__main__":
    os.system("clear")
    SVM_DATA_PATH = "/data/julia/data/svm_data"
    RESULTS_PATH = "/data/julia/results"
    EXPERIMENT = "autokeras"
    config = ConfigProto()
    config.gpu_options.allow_growth = True
    session = InteractiveSession(config=config)
    logging.basicConfig(
        filename=f"/data/julia/logs/{EXPERIMENT}.log", level=logging.WARNING
    )
    logging.captureWarnings(True)
    with open(os.path.join(RESULTS_PATH, EXPERIMENT + ".csv"), "w") as f:
        f.write("index,acc,precision,recall,f1\n")
    pbar = tqdm(range(10), leave=True, position=0)
    for i in pbar:
        print("#############")
        print("# Autokeras #")
        print("#############")
        print("Get data")
        train = tfds.as_numpy(tfds.load("data", split="train", data_dir="/data/julia/data/tfds", as_supervised=False, batch_size=-1))
        test = tfds.as_numpy(tfds.load("data", split="test", data_dir="/data/julia/data/tfds", as_supervised=False, batch_size=-1))
        val = tfds.as_numpy(tfds.load("data", split="val", data_dir="/data/julia/data/tfds", as_supervised=False, batch_size=-1))
        print("Train model")
        acc, precision, recall, f1 = train_and_test_autokeras(
            train["image"], test["image"], val["image"],
            train["label"], test["label"], val["label"],
            f"autokeras_{i}",
        )
        with open(os.path.join(RESULTS_PATH, EXPERIMENT + ".csv"), "a") as f:
            f.write(
                ",".join(
                    str(res) for res in [i, acc, precision, recall, f1]
                )
                + "\n"
            )
        if i != 9:
            os.system("clear")
With the ImageClassifier training/testing code:
clf = ak.ImageClassifier(overwrite=True, project_name=name, directory="/data/julia/models/")
clf.fit(X_train, y_train, validation_data=(X_val, y_val))
predicted_y = clf.predict(X_test)
# get acc, precision, recall, f1
acc = accuracy_score(y_test, predicted_y)
precision = precision_score(y_test, predicted_y, average='macro')
recall = recall_score(y_test, predicted_y, average='macro')
f1 = f1_score(y_test, predicted_y, average='macro')
Expected Behavior
Setup Details
Include the details about the versions of:
- OS type and version: Debian 5.10.149-2
- Python: 3.9.13
- autokeras: 1.0.20
- keras-tuner: 1.1.3
- scikit-learn: 1.2.0
- numpy: 1.19.5
- pandas: 1.4.4
- tensorflow: 2.11
haifeng-jin commented
It doesn't look like an AutoKeras issue, but a local setup issue.
Can you run TF successfully on GPU with your setup?
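A quick way to check this is to ask TensorFlow which GPUs it can see. The sketch below is only an illustration and is not part of the original report: `visible_gpus` is a hypothetical helper name, and the guarded import is there so the snippet also runs where TensorFlow is not installed. Note that a listed GPU does not by itself rule out the libdevice error, which, as far as I can tell, only shows up later, when XLA compiles a kernel.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see, or None without TensorFlow."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed in this environment
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed")
    elif gpus:
        print("GPUs visible to TensorFlow:", gpus)
    else:
        print("TensorFlow is running on CPU only")
```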
JuliaWasala commented
No, but I got it working by placing the libdevice file in my working directory. Not a pretty fix but better than nothing.
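Copying the file into the working directory can probably be avoided by telling XLA where the CUDA data directory is. The flag name `--xla_gpu_cuda_data_dir` and the `nvvm/libdevice/libdevice.10.bc` layout are taken from the install instructions later in this thread. Everything else in this sketch is an assumption: the helper names `find_cuda_data_dir` and `point_xla_at` are hypothetical, and the candidate directories are just common guesses.

```python
import os
from pathlib import Path

# Relative location where XLA is expected to find libdevice (per this thread).
LIBDEVICE = Path("nvvm") / "libdevice" / "libdevice.10.bc"


def find_cuda_data_dir(candidates):
    """Return the first candidate directory that contains LIBDEVICE, else None."""
    for candidate in candidates:
        root = Path(candidate)
        if (root / LIBDEVICE).is_file():
            return root
    return None


def point_xla_at(cuda_data_dir, environ=os.environ):
    """Append the CUDA data dir flag to XLA_FLAGS and return the new value."""
    flag = f"--xla_gpu_cuda_data_dir={cuda_data_dir}"
    existing = environ.get("XLA_FLAGS", "")
    environ["XLA_FLAGS"] = f"{existing} {flag}".strip()
    return environ["XLA_FLAGS"]


if __name__ == "__main__":
    # Assumed candidate locations; adjust them to the local setup.
    candidates = ["/usr/local/cuda"]
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        candidates.insert(0, os.path.join(prefix, "lib"))
    found = find_cuda_data_dir(candidates)
    if found is None:
        print("libdevice.10.bc not found in:", candidates)
    else:
        print("XLA_FLAGS =", point_xla_at(found))
```

To be safe, run this before `import tensorflow`, so the variable is already set when XLA reads its flags.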
Sent by email on Thursday, 5 January 2023 21:37, in reply to keras-team/autokeras Issue #1813.
haifeng-jin commented
For anyone who encounters this problem, you may follow these instructions to install TensorFlow.
conda create --name=test python=3.10 -y
conda run -n test conda install cuda-nvcc=11.3.58 cudatoolkit=11.2.2 cudnn=8.1.0 -c conda-forge -c nvidia -y
conda run -n test pip install tensorflow
conda run -n test mkdir -p $CONDA_PREFIX/etc/conda/activate.d
conda run -n test printf 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/\nexport XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/lib/\n' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
conda run -n test mkdir -p $CONDA_PREFIX/lib/nvvm/libdevice
conda run -n test cp $CONDA_PREFIX/lib/libdevice.10.bc $CONDA_PREFIX/lib/nvvm/libdevice/
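One caveat with the commands above: the calling shell expands `$CONDA_PREFIX` before `conda run` starts, and it also handles the `>` redirect, so the last four lines seem to assume that the `test` environment is already active. The copy step can also be done from Python against an explicit prefix. This is a minimal sketch, not the maintainers' method: `stage_libdevice` is a hypothetical helper name, and it assumes, as the commands above do, that the conda packages place `libdevice.10.bc` directly under `<prefix>/lib`.

```python
import shutil
from pathlib import Path


def stage_libdevice(env_prefix):
    """Copy <prefix>/lib/libdevice.10.bc into <prefix>/lib/nvvm/libdevice/.

    Returns the path of the copied file.
    """
    lib_dir = Path(env_prefix) / "lib"
    source = lib_dir / "libdevice.10.bc"
    if not source.is_file():
        raise FileNotFoundError(f"expected libdevice at {source}")
    target_dir = lib_dir / "nvvm" / "libdevice"
    target_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    return Path(shutil.copy2(source, target_dir))
```

Inside the activated environment this could be called as `stage_libdevice(os.environ["CONDA_PREFIX"])`. Running it twice is harmless, since the second call overwrites the copy.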