lllyasviel/DanbooRegion

Example code returns flat image without any segment.

kosuke1701 opened this issue · 7 comments

First of all, thank you very much for sharing your interesting project!

I tried to set up an environment with Docker, and the following example command ran without errors and produced three images as expected.

python segment.py ./emilia.jpg

However, the output looks somewhat different from what is shown in the README. (The attached image is current_skelton.png, as output by the sample code.)
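To check that an output really is a single flat color (and not, say, a faint skeleton that is hard to see), a hypothetical helper like the one below can be run on each of the three PNGs. It is not part of DanbooRegion; it only assumes the image has already been decoded into rows of pixels.

```python
def is_flat_image(rows):
    """Return True if the image has at least one pixel and all pixels are identical.

    `rows` is an iterable of rows; each row is a sequence of pixels, where a pixel
    is either a scalar (grayscale) or a per-channel list such as [B, G, R].
    """
    reference = None
    for row in rows:
        for pixel in row:
            if reference is None:
                reference = pixel  # the first pixel becomes the value to compare against
            elif pixel != reference:
                return False  # a second distinct value means the image has some structure
    return reference is not None  # an empty image is not reported as flat


if __name__ == "__main__":
    white = [[[255, 255, 255]] * 4] * 3  # 3x4 all-white BGR image
    print(is_flat_image(white))
```

With OpenCV installed, `is_flat_image(cv2.imread("current_skelton.png").tolist())` performs the check (note that `cv2.imread` returns `None` if the path is wrong); with NumPy, `(img == img[0, 0]).all()` is a vectorised alternative.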

Dockerfile

Note: I've changed the numba version as suggested in this issue.

FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
    wget gcc make zlib1g-dev libssl-dev libopencv-dev \
    && apt-get -y clean \
    && rm -rf /var/lib/apt/lists/*

RUN apt-get remove -y --allow-change-held-packages libcudnn7


RUN apt-get update && apt-get install -y --no-install-recommends \
    libcudnn7=7.0.5.15-1+cuda9.0 \
    && apt-mark hold libcudnn7 && \
    rm -rf /var/lib/apt/lists/*

RUN wget https://www.python.org/ftp/python/3.6.11/Python-3.6.11.tgz \
    && tar zxvf Python-3.6.11.tgz \
    && cd Python-3.6.11 \
    && ./configure \
    && make && make install \
    && ln -s /usr/local/bin/python3.6 /usr/local/bin/python \
    && ln -s /usr/local/bin/pip3.6 /usr/local/bin/pip \
    && cd .. \
    && rm -r Python-3.6.11*

RUN pip install -U pip \
    && pip install tensorflow-gpu==1.5.0 \
    && pip install keras==2.2.4 \
    && pip install opencv-python==3.4.2.17 \
    && pip install numpy==1.15.4 \
    && pip install numba==0.49.0 \
    && pip install scipy==1.1.0 \
    && pip install scikit-image==0.13.0 \
    && pip install scikit-learn==0.22.2 \
    && pip install -U h5py==2.10.0 \
    && pip cache purge

What I did

# Run from the directory that contains the Dockerfile.
docker build -t sample:0.1 .

# Mount project directory and execute sample code.
sudo docker run -it --rm --gpus all -v /path/to/DanbooRegion:/DanbooRegion --name debug sample:0.1 /bin/bash
$ cd /DanbooRegion/code
$ python segment.py ./emilia.jpg

Environment

  • OS: Ubuntu 20.04 LTS
  • Docker version: 20.10.5, build 55c4c88
  • nvidia-docker2 version: 2.6.0-1
  • GPU: GeForce RTX 3090
  • Nvidia driver version: 460.73.01

Do you mean that you get a blank white output image and no meaningful output? Is the blank white image you uploaded what you actually get?

Yes. The blank white image I uploaded is what I got as the skeleton map. The other two images (current_flatten.png, current_region.png) are also blank, just in different colors.

If no Python errors are reported, it is likely that the models are not being loaded properly. Have you downloaded the pretrained models and put them in the correct place?

I used the following pretrained models, which are uploaded to the GitHub repo:

https://github.com/lllyasviel/DanbooRegion/blob/master/code/DanbooRegion2020UNet.net
https://github.com/lllyasviel/DanbooRegion/blob/master/code/srcnn.net
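One way to rule out a bad download is to look at the first bytes of each model file. This sketch assumes the .net files are Keras HDF5 files (the thread does not say so explicitly, but Keras 2.2.4's `load_model` reads HDF5). It flags two failure modes worth checking first: an HTML page saved in place of the model (which is what you get if a `/blob/` URL is fetched directly instead of the raw file) and a Git LFS pointer file. The function and constant names are hypothetical, not part of DanbooRegion.

```python
import os
import sys

# 8-byte signature at the start of an HDF5 file (assumes no user block before the superblock).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# First line of a Git LFS pointer file that was checked out without `git lfs pull`.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def classify_model_file(path):
    """Guess what kind of file is at `path` from its leading bytes."""
    if not os.path.isfile(path):
        return "missing"
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"
    if head.startswith(LFS_POINTER_PREFIX):
        return "git-lfs-pointer"
    # An HTML page (e.g. a GitHub "blob" view) saved under the model's file name.
    if head.lstrip().lower().startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"


if __name__ == "__main__":
    for name in sys.argv[1:] or ["DanbooRegion2020UNet.net", "srcnn.net"]:
        print(name, classify_model_file(name))
```

Saved as, say, a hypothetical `check_models.py` and run from `code/`, anything other than `hdf5` for either file would suggest re-fetching it via `git clone` or the raw-file link.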

Currently, I'm training a new model from scratch with the provided training code to see whether that works.

EDIT:

I tested the newly trained model, but it still returned blank images.

It seems that the Ampere architecture of the RTX 3090 is not supported by cuDNN 7, which TensorFlow 1.x requires.

https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html
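To make the compatibility argument concrete, here is a small sketch that checks whether a CUDA toolkit version can build native kernels for a given GPU compute capability. The table is partial and reflects my understanding of NVIDIA's release notes (the RTX 3090 is compute capability 8.6, first supported natively in CUDA 11.1; the GTX 1070 Ti is 6.1), so verify it against the CUDA documentation before relying on it. The names are hypothetical and not part of DanbooRegion or TensorFlow.

```python
# Minimum CUDA toolkit version (major, minor) that can compile native code for
# each compute capability. Partial table covering only the GPUs relevant here.
MIN_CUDA_FOR_SM = {
    (6, 1): (8, 0),   # Pascal, e.g. GTX 1070 Ti
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere, A100
    (8, 6): (11, 1),  # Ampere, GA10x such as RTX 3090
}


def cuda_supports(compute_capability, cuda_version):
    """Return True if `cuda_version` can target `compute_capability` natively."""
    required = MIN_CUDA_FOR_SM.get(compute_capability)
    if required is None:
        raise KeyError("compute capability %r is not in the table" % (compute_capability,))
    return cuda_version >= required  # tuples compare element-wise: (11, 0) < (11, 1)


if __name__ == "__main__":
    # CUDA 9.0 comes from the nvidia/cuda:9.0 base image in the Dockerfile above.
    for name, cc in [("RTX 3090", (8, 6)), ("GTX 1070 Ti", (6, 1))]:
        print(name, "native kernels with CUDA 9.0:", cuda_supports(cc, (9, 0)))
```

cuDNN is a separate constraint on top of this: as far as I know, Ampere support only arrived with cuDNN 8. The table also covers only native compilation; CUDA can JIT-compile embedded PTX for newer GPUs, which may (unverified) be part of why some TF 1.x setups appear to run on 30-series cards.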

Since this seems to be a lower-level, hardware-related issue, I'm closing it. Thank you for your responses.

Thank you for your report! That said, I have seen many of my clients use TF 1.4 on 3090/30XX cards without any trouble, even though the official documentation says these versions are not supported. Have you done anything unusual in your environment?

I think I followed the standard procedures to set up my environment. After a bit of research, I found that there is a way to use TensorFlow 1.15 with an RTX 3090:

https://www.pugetsystems.com/labs/hpc/How-To-Install-TensorFlow-1-15-for-NVIDIA-RTX30-GPUs-without-docker-or-CUDA-install-2005/

After setting up a new TensorFlow 1.15 environment with the above procedure, I got a different output (attached), but it still doesn't seem to work :( When I get home, I will try the same setup on my other PC, which has a GTX 1070 Ti, to see whether anything changes.