
Deploy a demo of ChatGLM3 using Docker, Docker Compose, and Poetry in an environment without a GPU.

This guide shows how to quickly deploy a ChatGLM3 demo with Docker, Docker Compose, and Poetry on a machine without a GPU.

1. Create a folder named build and create a file named Dockerfile inside it with the following content:

FROM python:3.10-bookworm

RUN apt-get update && apt-get install -y ca-certificates curl
RUN curl -sSL https://install.python-poetry.org | python3 -
ENV PATH="/root/.local/bin:$PATH"
RUN poetry config virtualenvs.in-project true

This Dockerfile builds an image based on Python 3.10 with Poetry installed. Running poetry config virtualenvs.in-project true makes Poetry create the virtual environment (a .venv folder) inside the project folder, so the installed dependencies live alongside the project files.
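To confirm that the in-project setting took effect, one option is to compare the interpreter's prefix with the expected .venv location. The sketch below is only an illustration: the helper name uses_in_project_venv is hypothetical, and /app is assumed to be the project folder inside the container, as set by working_dir in the Compose file.

```python
import sys
from pathlib import Path


def uses_in_project_venv(prefix: str, project_dir: str) -> bool:
    """Return True if `prefix` is the `.venv` folder directly inside `project_dir`."""
    return Path(prefix) == Path(project_dir) / ".venv"


# Assumption: /app is the project folder inside the container.
print(uses_in_project_venv(sys.prefix, "/app"))
```

Run through Poetry inside the container (for example with poetry run python), this should print True once the dependencies have been installed into the project folder.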

2. Create a file named docker-compose.yml and add the following content:

version: "3"
services:
  py:
    build:
      context: ./build
    volumes:
      - "./ChatGLM3:/app"
    working_dir: /app
    environment:
      - TRANSFORMERS_CACHE=.cache
    ports:
      - 8501:8501
    command:
      - poetry
      - run
      - streamlit
      - run
      - web_demo2.py

By declaring TRANSFORMERS_CACHE as .cache, you ensure that model files are downloaded to the .cache folder in the project directory.
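Because .cache is a relative path, it is resolved against the working directory of the process, which is /app inside the container and therefore the mounted ChatGLM3 folder on the host. A minimal sketch of that resolution, using a hypothetical helper named resolve_cache_dir:

```python
import posixpath


def resolve_cache_dir(cache_value: str, working_dir: str) -> str:
    """Resolve a possibly relative cache path against the working directory."""
    # posixpath.join keeps an absolute cache_value unchanged.
    return posixpath.normpath(posixpath.join(working_dir, cache_value))


print(resolve_cache_dir(".cache", "/app"))  # /app/.cache
```

An absolute value would be used as-is, which is one way to move the cache outside the project folder if preferred.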

3. Clone the ChatGLM3 project from GitHub:

git clone https://github.com/THUDM/ChatGLM3

4. Create a file named pyproject.toml in the ChatGLM3 folder and add the following content:

[tool.poetry]
name = "app"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.10"
protobuf = "^4.24.4"
transformers = "4.30.2"
cpm-kernels = "^1.0.11"
gradio = "3.39"
mdtex2html = "^1.2.0"
sentencepiece = "^0.1.99"
accelerate = "^0.24.1"
sse-starlette = "^1.6.5"
streamlit = ">=1.24.0"
fastapi = "0.95.1"
typing-extensions = "4.4.0"
uvicorn = "^0.23.2"
torch = {version = "^2.1.0+cpu", source = "pytorch"}

[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu"
priority = "explicit"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

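This file is the project's Poetry configuration. It declares all the required dependencies and adds a package source named pytorch, pointing at "https://download.pytorch.org/whl/cpu", from which the CPU-only build of PyTorch is installed. Because the source has priority "explicit", only packages that name it (here, torch) are fetched from it.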
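After installation, a quick way to check that the CPU-only build was picked up is to look at the local version tag of the installed package. The sketch below assumes that wheels from this index carry a +cpu suffix in their version string; the helper name is_cpu_build is hypothetical.

```python
def is_cpu_build(version: str) -> bool:
    """Return True if a PyTorch version string carries the '+cpu' local tag."""
    # "2.1.0+cpu".partition("+") gives ("2.1.0", "+", "cpu").
    _, _, local = version.partition("+")
    return local == "cpu"


print(is_cpu_build("2.1.0+cpu"))    # True
print(is_cpu_build("2.1.0+cu121"))  # False
```

Inside the container, the real value can be passed in as is_cpu_build(torch.__version__) after import torch.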

5. Modify ChatGLM3/web_demo2.py

Replace the contents of ChatGLM3/web_demo2.py with the following code:

import streamlit as st
import torch
from transformers import AutoModel, AutoTokenizer

# Set the page title, icon, and layout
st.set_page_config(
    page_title="ChatGLM3-6B Demo",
    page_icon=":robot:",
    layout="wide",
)

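# Set this to a model ID or a local folder path
model_path = "THUDM/chatglm3-6b"

# Cache the loaded model so it is not reloaded on every Streamlit rerun
@st.cache_resource
def get_model():
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True).float()
    # For multi-GPU support, replace the line above with the two lines below
    # and set num_gpus to the number of GPUs you actually have
    # from utils import load_model_on_gpus
    # model = load_model_on_gpus("THUDM/chatglm3-6b", num_gpus=2)
    model = model.eval()
    return tokenizer, model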

# Load the ChatGLM3 model and tokenizer
tokenizer, model = get_model()

# Initialize the chat history and past key values
if "history" not in st.session_state:
    st.session_state.history = []
if "past_key_values" not in st.session_state:
    st.session_state.past_key_values = None

# Set max_length, top_p, and temperature
max_length = st.sidebar.slider("max_length", 0, 32768, 8192, step=1)
top_p = st.sidebar.slider("top_p", 0.0, 1.0, 0.8, step=0.01)
temperature = st.sidebar.slider("temperature", 0.0, 1.0, 0.6, step=0.01)

# Clear the chat history
buttonClean = st.sidebar.button("Clear chat history", key="clean")
if buttonClean:
    st.session_state.history = []
    st.session_state.past_key_values = None
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    st.rerun()

# Render the chat history
for i, message in enumerate(st.session_state.history):
    if message["role"] == "user":
        with st.chat_message(name="user", avatar="user"):
            st.markdown(message["content"])
    else:
        with st.chat_message(name="assistant", avatar="assistant"):
            st.markdown(message["content"])

# Placeholders for the current input and output
with st.chat_message(name="user", avatar="user"):
    input_placeholder = st.empty()
with st.chat_message(name="assistant", avatar="assistant"):
    message_placeholder = st.empty()

# Get the user's input
prompt_text = st.chat_input("Enter your question")

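# If the user entered something, generate a reply
if prompt_text:

    input_placeholder.markdown(prompt_text)
    history = st.session_state.history
    past_key_values = st.session_state.past_key_values
    for response, history, past_key_values in model.stream_chat(
        tokenizer,
        prompt_text,
        history,
        past_key_values=past_key_values,
        max_length=max_length,
        top_p=top_p,
        temperature=temperature,
        return_past_key_values=True,
    ):
        # Show the partial reply as it streams in
        message_placeholder.markdown(response)

    # Update the chat history and past key values
    st.session_state.history = history
    st.session_state.past_key_values = past_key_values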

Compared with the original file, this version changes model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda() to model = AutoModel.from_pretrained(model_path, trust_remote_code=True).float(). Instead of moving the model to a GPU, it converts the weights to 32-bit floating point so that the model runs on the CPU.
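Running in 32-bit floating point on the CPU has a memory cost worth estimating before starting the service. The back-of-the-envelope sketch below assumes roughly 6 billion parameters, as the "6B" in the model name suggests, and counts only the weights; activations and other runtime overhead come on top. The helper name is hypothetical.

```python
def estimate_weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: about 6 billion parameters, taken from the "6B" in the model name.
NUM_PARAMS = 6e9

print(f"float32: {estimate_weight_memory_gib(NUM_PARAMS, 4):.1f} GiB")
print(f"float16: {estimate_weight_memory_gib(NUM_PARAMS, 2):.1f} GiB")
```

Under these assumptions the float32 weights need a little over 22 GiB of RAM, twice the float16 figure, so the host should have comfortably more than that available.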

Now, all the necessary configurations are in place. You can proceed to install dependencies and start the service.

6. Install Dependencies

Install the project's required dependencies with the following command:

docker-compose run --rm py poetry install

7. Start the Service

Start the service with the following command:

docker-compose up -d

You may need to wait for some time as the service downloads the model files. Once it's ready, you can access the deployed service's interface in your web browser at http://localhost:8501/. If you're deploying remotely, replace localhost with the appropriate IP address.
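Rather than refreshing the browser by hand, the wait can be scripted. The following is a minimal sketch that polls the URL until it answers with HTTP 200; the function names, timeout, and polling interval are illustrative choices, not part of the project. Note that a 200 response only confirms that the web server is up; the model itself may still be downloading or loading when the page first renders.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: not ready yet.
        return False


def wait_until_ready(
    url: str,
    timeout: float = 600.0,
    interval: float = 5.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until `probe` succeeds or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A call such as wait_until_ready("http://localhost:8501/") returns True once the server responds and False if it does not respond within the timeout.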

8. Start Chatting

Now you can start chatting with ChatGLM3 using the service you've deployed!

