oobabooga/text-generation-webui

Dockerfile

slush0 opened this issue · 16 comments

slush0 commented

If there's any interest (to use it or add it to this repo), I've knocked up a Dockerfile.

https://github.com/slush0/docker-misc/blob/master/text-generation-webui/Dockerfile

It is also available at https://hub.docker.com/r/slush0/text-generation-webui.
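For anyone who wants to try it without building locally, the prebuilt image can be pulled straight from that Docker Hub repo:

docker pull slush0/text-generation-webui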

phokur commented

I can't seem to connect to the instance locally once it's up. What is your docker run command?
docker run --gpus all -p 7860:7860 -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY -it slush0/text-generation-webui

rkfg commented

I think the problem is that there's no ENTRYPOINT or CMD line in that Dockerfile, so the container terminates immediately. A one-line fix is sketched below.
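As a sketch (the flags are assumed, matching how people start the server elsewhere in this thread), appending this to the end of the Dockerfile would make the container start the server instead of exiting:

CMD ["python3", "server.py", "--listen"]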

If anyone happens to want to use Podman instead, I have a repo here called Text-Generation-Webui-Podman. It compiles and installs the GPTQ-for-LLaMa repo, so 4-bit works too.

In theory the Containerfile should be compatible with Docker, but I haven't tested it.
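If anyone wants to try that, the builds should be along these lines (the image tag is my own choice):

podman build -t text-generation-webui .
docker build -t text-generation-webui -f Containerfile .

Podman picks up a Containerfile by default; Docker has to be pointed at it explicitly.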

Does anyone know if it's possible to run multiple instances of this Dockerfile using a single GPU? I'm not able to run multiple instances from bash; when I try, the second one exits with no error:

(with first instance already running)

$ python server.py --cai-chat --gptq-bits 4 --model llama-13b --listen-port 8002 --verbose
Loading llama-13b...
Loading model ...
Killed
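For what it's worth, a bare Killed with no Python traceback usually means the kernel's out-of-memory killer stepped in, since a second copy of the 13B weights has to fit in memory alongside the first. A quick check, as a sketch:

sudo dmesg | tail -n 20
# look for something like: Out of memory: Killed process 12345 (python)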

Related to #58

I am interested in making this the default installation method to avoid all the setup trouble.

Here is what I have tried:

  1. Download the Dockerfile
  2. Build the image with
docker build . -f Dockerfile -t oobabooga
  3. Teleport into the image with
docker run -i -t oobabooga bash
  4. Download a test model
python3 download-model.py facebook/galactica-125m
  5. Try to start the web UI
python3 server.py

I get the error

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

By default, Docker will not pass in the GPU.

What happens if you try to run:

docker run -it --rm --gpus all ubuntu nvidia-smi

from https://docs.docker.com/engine/reference/commandline/run/#gpus ?

No luck:

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Your link says that

First you need to install nvidia-container-runtime

Which I guess needs to be installed on the host operating system. Would that work on Windows?
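On a Linux host, the install amounts to something like the following (the repo URLs and package names follow NVIDIA's old nvidia-container-runtime instructions, so treat them as assumptions that may have changed):

curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update && sudo apt-get install -y nvidia-container-runtime
sudo systemctl restart docker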

I think that Docker instruction might be a bit old and/or specific to Linux.

Try these instructions, which are specific to WSL2:

https://docs.nvidia.com/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl

I managed to get 4-bit working with the Dockerfile below, based on the original by @slush0. Two preliminary steps were necessary:

  1. Installing nvidia-container-runtime in the host OS.
  2. Changing the Docker configuration file mentioned in the comment below before building the image. This is necessary to get nvcc working in the container.

pytorch/extension-cpp#71 (comment)
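As I understand it, the change described there is making the NVIDIA runtime Docker's default in /etc/docker/daemon.json, so the GPU (and nvcc) is visible during docker build and not only at run time; roughly:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}

followed by a restart of the Docker daemon (sudo systemctl restart docker).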

Otherwise I would get the error below (the CUDA extension build can't detect a GPU architecture when no GPU is visible during docker build):

     arch_list[-1] += '+PTX'
IndexError: list index out of range

Dockerfile

FROM pytorch/pytorch

# Install base utilities in a single layer so the apt cache
# doesn't bloat the final image
RUN apt-get update && \
    apt-get install -y build-essential wget git vim && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

# Models live on a mounted volume so they persist across containers
VOLUME /data

RUN git clone https://github.com/oobabooga/text-generation-webui /app
WORKDIR /app
RUN pip install -r requirements.txt

# Point the webui's models directory at the volume
RUN rm -rf /app/models
RUN ln -s /data /app/models

# Build and install the 4-bit CUDA kernel; this is the step that
# needs nvcc and a visible GPU at build time
RUN mkdir repositories
WORKDIR /app/repositories
RUN git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
WORKDIR /app/repositories/GPTQ-for-LLaMa
RUN python setup_cuda.py install
WORKDIR /app

#ENV PATH=$PATH:/app

Connecting

I used this command to connect to it:

docker run -it --rm --gpus all --net=host oobabooga bash

Not the most user-friendly setup because of the preliminary steps, but for people who already have experience with Docker, this should be useful. Building the image itself is trivial.

Valorant won't run on my Windows PC for some stupid reason at the moment, and I was planning to reformat and reinstall it with another SSD as a temporary test to see if that would get it working. It's been long overdue.

I bring this up because I'll also see if I can get the most minimal steps for Docker + Nvidia + WSL2 + Windows going as well. I think nvidia-container-runtime is more of a Linux-ism; on Windows there's a somewhat different Microsoft WSL2 + Docker ingredient, which I linked above.

@oobabooga If it helps your cause, I updated my repo to also support Docker. It has some extra goodies like persisting the container's data, a smaller final build size, and caching of downloaded pip packages.

Containerfile and small Conversion Script so it works with Docker.

I'd like to make sure it's helpful for people so if anyone has a problem with it feel free to open an issue.

phokur commented

I finally figured out my port-passing issue.
--net=host is not supported in Docker Desktop on Windows + WSL2.
You need to use -p 7860:7860 in the docker run command AND
python3 server.py --listen (or the host's browser will just get connection resets)
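Putting those together with the image built above, the full invocation should look something like this (sketch):

docker run -it --rm --gpus all -p 7860:7860 oobabooga python3 server.py --listen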

suhr commented

When I try to run it in Docker, I get the following error:

Traceback (most recent call last):
  File "/app/server.py", line 234, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/app/modules/models.py", line 49, in load_model
    model = AutoModelForCausalLM.from_pretrained(Path(f"models/{shared.model_name}"), device_map='auto', load_in_8bit=True)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 471, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2643, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2966, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 673, in _load_state_dict_into_meta_model
    set_module_8bit_tensor_to_device(model, param_name, param_device, value=param)
  File "/opt/conda/lib/python3.10/site-packages/transformers/utils/bitsandbytes.py", line 70, in set_module_8bit_tensor_to_device
    new_value = bnb.nn.Int8Params(new_value, requires_grad=False, has_fp16_weights=has_fp16_weights).to(device)
  File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 196, in to
    return self.cuda(device)
  File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 160, in cuda
    CB, CBt, SCB, SCBt, coo_tensorB = bnb.functional.double_quant(B)
  File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1616, in double_quant
    row_stats, col_stats, nnz_row_ptr = get_colrow_absmax(
  File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1505, in get_colrow_absmax
    lib.cget_col_row_stats(ptrA, ptrRowStats, ptrColStats, ptrNnzrows, ct.c_float(threshold), rows, cols)
  File "/opt/conda/lib/python3.10/ctypes/__init__.py", line 387, in __getattr__
    func = self.__getitem__(name)
  File "/opt/conda/lib/python3.10/ctypes/__init__.py", line 392, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
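That undefined symbol is bitsandbytes falling back to its CPU-only library (libbitsandbytes_cpu.so), which happens when no GPU is visible inside the container. A quick check from inside it:

python3 -c "import torch; print(torch.cuda.is_available())"
# False means the container can't see the GPU, so 8-bit loading can't work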

And when trying to pass the GPU through:

Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]

It would be great to have a CPU-only installation with Docker!
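For what it's worth, CPU-only should already be possible with the image above by dropping the GPU flags and starting the server in CPU mode (assuming the webui's --cpu flag; 8-bit loading is exactly what fails on CPU in the traceback above):

docker run -it --rm -p 7860:7860 oobabooga python3 server.py --cpu --listen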

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.