shubham-goel/ucmr

Setting up the environment on RTX3090

normster opened this issue · 4 comments

Hi Shubham,

I'm having some trouble setting up the environment to run the code and was wondering if you had any insights on what I'm doing wrong.

First I tried a manual installation with a new conda environment, but when I import torch and run torch.cuda.is_available() in the new ucmr environment it returns false. I have cuda 11.1/drivers 455.45.01 installed on my machine (with an RTX3090) but my understanding is that the pytorch installation should come with cuda 10.0, which should run just fine on my drivers. Setting up the rest of the requirements and then running the demo program gives a torch cuda init error.
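For reference, NVIDIA's minimum-driver table supports this expectation; a quick sanity-check sketch (the version thresholds are copied by hand from NVIDIA's published compatibility table, so treat them as assumptions):

```python
# Minimum Linux driver versions per CUDA toolkit release, taken from
# NVIDIA's CUDA compatibility table (hand-copied; treat as assumptions).
MIN_DRIVER = {
    "10.0": (410, 48),
    "11.1": (455, 23),
}

def driver_supports(driver: str, cuda: str) -> bool:
    """True if `driver` (e.g. "455.45.01") meets the minimum for `cuda`."""
    major, minor = (int(x) for x in driver.split(".")[:2])
    return (major, minor) >= MIN_DRIVER[cuda]

# Driver 455.45.01 should satisfy both a cu100 wheel and CUDA 11.1:
print(driver_supports("455.45.01", "10.0"))  # True
print(driver_supports("455.45.01", "11.1"))  # True
```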

I also tried setting up docker by first pulling the image with sudo docker pull shubhamgoel/birds:bigbang and then running sudo docker run shubhamgoel/birds:bigbang python -m src.demo ... from my local clone of this github repo. This results in a ModuleNotFoundError. I also tried setting the working directory to the local clone with the -w flag which gives the same error. I've never used docker before so I'm not sure how I'm supposed to be running code with it.

Thanks!

EDIT: I realized I messed up my drivers and fixed it by just reinstalling.

Hi Norman, glad to hear the first issue is sorted. Also, nice setup!

Re: docker, the docker image doesn't contain the ucmr source code or the CUB dataset. You'll need to mount those two directories (your local clone of ucmr, and CUB_200_2011) into the container in your docker run command. To open an interactive shell in your docker container, cd into the ucmr/ directory and run the following:

sudo nvidia-docker run -it \
--mount type=bind,source="$(pwd)",target=/workspace/ucmr/ \
--mount type=bind,source=/path/to/CUB_200_2011/,target=/scratch/shubham/CUB_200_2011/,readonly \
--ipc=host \
shubhamgoel/birds:bigbang \
/bin/bash 

To run code directly,

sudo nvidia-docker run -P \
--mount type=bind,source="$(pwd)",target=/workspace/ucmr/ \
--mount type=bind,source=/path/to/CUB_200_2011/,target=/scratch/shubham/CUB_200_2011/,readonly \
--ipc=host \
shubhamgoel/birds:bigbang \
/bin/bash -c '
    cd /workspace/ucmr/
    python -m src.experiments.camOpt_shape ...
'

Closing for now, please reopen if issues persist.

Thanks for the help, I'll give it a try!

In the meantime I set up the local environment without any issues but realized that the older CUDA/cuDNN versions in the dependencies aren't supported on the RTX 30xx cards. It looks like this can be resolved by rebuilding PyTorch/SoftRas/NMR with the right nvcc arch flag, which can be passed through setuptools/setup.py, but I haven't tried this out.
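For anyone trying the rebuild route: the nvcc flag can also be set via the TORCH_CUDA_ARCH_LIST environment variable, which PyTorch's torch.utils.cpp_extension machinery reads when generating -gencode flags. The install commands below are illustrative placeholders, not the repos' actual build steps:

```shell
# sm_86 is the compute capability of the RTX 30xx (Ampere) cards.
# PyTorch's torch.utils.cpp_extension turns this into nvcc -gencode flags.
export TORCH_CUDA_ARCH_LIST="8.6"

# Then rebuild each CUDA extension from source, e.g. (paths are placeholders):
# pip install -e ./SoftRas
# pip install -e ./neural_renderer
echo "building for arch list: $TORCH_CUDA_ARCH_LIST"
```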

Instead, I was able to install SoftRas and NMR in pytorch nightly 1.8.0 (with CUDA 11.1) by replacing usages of AT_CHECK with TORCH_CHECK as others pointed out here and here. The demo then runs fine, albeit with various deprecation warnings.
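For anyone else applying the same patch, the replacement is mechanical; a sketch of the macro rewrite (the directory and file here are stand-ins, not the real SoftRas/NMR layout):

```shell
# Stand-in source tree with the deprecated macro (illustrative only):
mkdir -p demo_src
printf 'AT_CHECK(x.is_cuda(), "x must be a CUDA tensor");\n' > demo_src/example.cpp

# Replace the deprecated AT_CHECK macro across all C++/CUDA sources in-place:
find demo_src -name '*.cpp' -o -name '*.cu' | xargs sed -i 's/AT_CHECK/TORCH_CHECK/g'

grep TORCH_CHECK demo_src/example.cpp
```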

I just realized that this issue probably affects the Docker route as well, but am less clear on how to fix that.

Glad to hear you could get the demo working! Can you run the training scripts as well now?

I didn't realize sooner but the pytorch version in my docker image wouldn't have been built to work with the 30xx cards. If you got pytorch working on your local machine already, I wouldn't recommend going the docker route. It is likely going to be more cumbersome (and possibly pointless).

But if you still want to, step 1 is finding a docker image containing a pytorch build that works with 30xx cards. NVIDIA's NGC PyTorch image could be an option: can you use pytorch successfully inside nvidia-docker run -it --ipc=host nvcr.io/nvidia/pytorch:20.12-py3 bash?

Hi Normster, thank you very much. I have replaced AT_CHECK. My environment is RTX3090, pytorch 1.7.1 stable and cuda 11.1. Eventually it worked.