LINCellularNeuroscience/VAME

GPU not being recognized in new installations

akesner1 opened this issue · 4 comments

Hi all,

Is anyone else having an issue where a newly created anaconda env for VAME doesn't recognize the GPU? When I started using VAME earlier this year (~March?) I simply followed the installation instructions on the front page here: I made a new anaconda env from the .yaml file and then installed VAME into it with the setup file, exactly as instructed. Everything worked fine, and when I ran 'import vame' it saw the GPU and used it during the pipeline. This past week I tried installing on a different PC with an NVIDIA T1000 GPU, and 'import vame' says nothing about the GPU and falls back to the CPU (it actually causes the PC to reboot after about a minute, which is weird... but not the issue I care about here...). I went back to the PC with the VAME environment that has been working fine, made a new anaconda env with the most recent release files (1.1), and it doesn't see that GPU either (RTX 2080 Ti). But when I go into my original VAME env, it sees the GPU and runs fine.

So is there some package missing from the updated versions of VAME that is causing this issue?

Thanks,
Drew

Hi Drew,
I suspect this could be because your GPU driver / CUDA toolbox version changed in your underlying OS.
Can you check whether PyTorch recognizes the GPU?
Follow, for example, these instructions: https://stackoverflow.com/questions/48152674/how-do-i-check-if-pytorch-is-using-the-gpu

Best,
Pavol

Thanks for the info. As that Stack Overflow page suggested, I started Python (just type 'python' in your anaconda terminal with the VAME env active) and ran 'torch.cuda.is_available()', which returned False. Running 'torch.cuda.current_device()' gave the error "AssertionError: Torch not compiled with CUDA enabled"...
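For anyone following along, the checks from that Stack Overflow answer can be run as one snippet; the guard against a missing torch install is my addition so it doesn't just crash:

```python
# Quick GPU diagnostic for a VAME env (guarded sketch, not from VAME itself).
import importlib.util

if importlib.util.find_spec("torch") is None:
    print("torch is not installed in this environment")
else:
    import torch
    print("torch version:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        # only valid when a CUDA build of torch sees a device
        print("device:", torch.cuda.get_device_name(torch.cuda.current_device()))
```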

After digging a little more, it seemed that during the VAME installation process a CPU-only version of torch was installed (it had "cpu" in the version name). I found this issue page (pytorch/pytorch#30664) and a ways down found this info:
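PyTorch wheels encode the build variant in the local segment of the version string, e.g. "1.11.0+cpu" vs "1.11.0+cu113", so a plain string check is enough to spot the CPU-only build described above (the helper name here is my own):

```python
# Detect a CPU-only PyTorch build from its version string, e.g. torch.__version__.
def is_cpu_only_build(version: str) -> bool:
    # CPU-only wheels carry a "+cpu" local version suffix
    return version.endswith("+cpu")

print(is_cpu_only_build("1.11.0+cpu"))    # True
print(is_cpu_only_build("1.11.0+cu113"))  # False
```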

"If you've ever ran pip install torch without the -f https://download.pytorch.org/whl/torch_stable.html argument, chances are, you have the corrupt cpu only version in your pip cache. whatever reinstall you do, doesn't matter because pip will just pull the same bad version from cache. to solve:

pip uninstall torch
pip cache purge
pip install torch -f https://download.pytorch.org/whl/torch_stable.html"

Running those commands in the VAME environment seems to have worked. Now torch is seeing the GPU, and importing vame into Python shows what I am used to seeing:

import vame
Using CUDA
GPU active: True
GPU used: NVIDIA T1000 8GB

So if anyone else is having issues with the GPU being recognized, maybe try the pip commands above to uninstall torch, clear the pip cache, and reinstall.
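After the reinstall, a pipeline can also guard itself with the common device-fallback pattern, so a CPU-only build degrades gracefully instead of crashing (a sketch, not code from VAME):

```python
# Pick a torch device string, falling back to CPU when CUDA is unavailable.
def pick_device() -> str:
    try:
        import torch
    except ImportError:
        return "cpu"  # no torch at all; treat as CPU-only in this sketch
    return "cuda" if torch.cuda.is_available() else "cpu"

print(pick_device())
```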

Thanks again @pavolbauer for the tips to get this figured out.
Drew

@akesner1
Hi, here is my solution:

  1. make sure you already have CUDA installed; mine is CUDA 11.7

  2. set up a Python 3.10 virtual env

  3. first install PyTorch following its official website, e.g. 'conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia'

  4. get into python then

    import torch
    torch.cuda.is_available()

    if the GPU is available, it should print 'True'

  5. install the other dependencies listed in 'VAME.yaml', except torch

  6. install VAME

  7. in Python, when you import VAME, you should see something like:

    import VAME

    Using CUDA
    GPU active: True
    GPU used: NVIDIA GeForce RTX 2080 Ti

This way we can use more up-to-date versions of the packages.

Wulin

Yeah, it doesn't like CUDA 11.8.1; stable torch seems to require 11.7 or below. Yup, torch.cuda.is_available() was causing issues on my setup (CUDA 11.8.1, RTX 3090), but 11.7.1 works:

torch.cuda.is_available()
True
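The version constraint described in this thread (stable torch wheels built for CUDA 11.7 or below at the time) can be expressed as a simple tuple comparison; the 11.7 threshold is taken from the comments above, not from PyTorch documentation:

```python
# Check a CUDA toolkit version against the maximum a torch wheel supports
# (default threshold of 11.7 comes from this thread; treat it as an assumption).
def parse_version(v: str) -> tuple:
    return tuple(int(x) for x in v.split("."))

def cuda_supported(installed: str, max_supported: str = "11.7") -> bool:
    # compare only as many components as the threshold specifies
    n = len(max_supported.split("."))
    return parse_version(installed)[:n] <= parse_version(max_supported)

print(cuda_supported("11.7.1"))  # True  -- the version reported to work
print(cuda_supported("11.8.1"))  # False -- the version that failed
```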