ModelTC/lightllm

[BUG] failed to serve a Qwen1.5-72B-chat model

Closed this issue · 2 comments

Issue description:
Launching a server for a 7B model succeeds, but serving a 72B model fails: the launcher takes about half an hour to initialize and then reports EOFError: connection closed by peer.

Steps to reproduce:

  1. Run the container ghcr.io/modeltc/lightllm:main
  2. Start the server:
python -m lightllm.server.api_server --model_dir ~/resources/huggingface/models/Qwen/Qwen1.5-72B-chat/     \
                                     --host 0.0.0.0                 \
                                     --port 8080                    \
                                     --tp 8                         \
                                     --eos_id 151645 \
                                     --trust_remote_code \
                                     --max_total_token_num 120000
  3. Wait about half an hour and observe the error below.

Expected behavior:

The server starts within a reasonable time and serves the 72B model, just as it does for the 7B model.
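
For reference, a successful launch should be able to answer a generation request like the sketch below (the /generate payload shape follows the LightLLM README example; host and port match the launch command above):

import requests

# Assumes the server started by the command above is listening on 0.0.0.0:8080.
resp = requests.post(
    "http://127.0.0.1:8080/generate",
    json={"inputs": "What is AI?", "parameters": {"max_new_tokens": 17}},
)
print(resp.status_code, resp.text)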

Error logging:

< python -m lightllm.server.api_server --model_dir ~/resources/huggingface/models/Qwen/Qwen1.5-72B-chat/     \
                                     --host 0.0.0.0                 \
                                     --port 8080                    \
                                     --tp 8                         \
                                     --eos_id 151645 \
                                     --trust_remote_code \
                                     --max_total_token_num 120000
INFO 03-09 16:38:17 [tokenizer.py:79] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
INFO 03-09 16:38:21 [tokenizer.py:79] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.

INFO 03-09 17:07:54 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:54 [mem_utils.py:18] Model kv cache using mode normal
INFO 03-09 17:07:56 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:56 [mem_utils.py:18] Model kv cache using mode normal
INFO 03-09 17:07:56 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:56 [mem_utils.py:18] Model kv cache using mode normal
INFO 03-09 17:07:56 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:56 [mem_utils.py:18] Model kv cache using mode normal
INFO 03-09 17:07:58 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:58 [mem_utils.py:18] Model kv cache using mode normal
INFO 03-09 17:07:58 [mem_utils.py:9] mode setting params: []
INFO 03-09 17:07:58 [mem_utils.py:18] Model kv cache using mode normal
ERROR 03-09 17:07:58 [start_utils.py:24] init func start_router_process : Traceback (most recent call last):
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/lightllm/lightllm/server/router/manager.py", line 379, in start_router_process
ERROR 03-09 17:07:58 [start_utils.py:24]     asyncio.run(router.wait_to_model_ready())
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
ERROR 03-09 17:07:58 [start_utils.py:24]     return loop.run_until_complete(main)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/lightllm/lightllm/server/router/manager.py", line 83, in wait_to_model_ready
ERROR 03-09 17:07:58 [start_utils.py:24]     await asyncio.gather(*init_model_ret)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 455, in init_model
ERROR 03-09 17:07:58 [start_utils.py:24]     await ans
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/lightllm/lightllm/server/router/model_infer/model_rpc.py", line 427, in _func
ERROR 03-09 17:07:58 [start_utils.py:24]     await asyncio.to_thread(ans.wait)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/asyncio/threads.py", line 25, in to_thread
ERROR 03-09 17:07:58 [start_utils.py:24]     return await loop.run_in_executor(None, func_call)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/concurrent/futures/thread.py", line 58, in run
ERROR 03-09 17:07:58 [start_utils.py:24]     result = self.fn(*self.args, **self.kwargs)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/site-packages/rpyc/core/async_.py", line 51, in wait
ERROR 03-09 17:07:58 [start_utils.py:24]     self._conn.serve(self._ttl)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/site-packages/rpyc/core/protocol.py", line 438, in serve
ERROR 03-09 17:07:58 [start_utils.py:24]     data = self._channel.poll(timeout) and self._channel.recv()
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/site-packages/rpyc/core/channel.py", line 55, in recv
ERROR 03-09 17:07:58 [start_utils.py:24]     header = self.stream.read(self.FRAME_HEADER.size)
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24]   File "/opt/conda/lib/python3.9/site-packages/rpyc/core/stream.py", line 280, in read
ERROR 03-09 17:07:58 [start_utils.py:24]     raise EOFError("connection closed by peer")
ERROR 03-09 17:07:58 [start_utils.py:24]
ERROR 03-09 17:07:58 [start_utils.py:24] EOFError: connection closed by peer
ERROR 03-09 17:07:58 [start_utils.py:24]

Environment:

  • Using container: ghcr.io/modeltc/lightllm:main

  • OS: Ubuntu 20.04.6

  • GPU info:

    • nvidia-smi: NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2
    • Graphics cards: H800-80G x 8
  • Python: Python 3.9.18

  • LightLLM: 486f647

  • openai-triton: pip show triton

Name: triton
Version: 2.1.0
Summary: A language and compiler for custom Deep Learning operations
Home-page: https://github.com/openai/triton/
Author: Philippe Tillet
Author-email: phil@openai.com
License: 
Location: /opt/conda/lib/python3.9/site-packages
Requires: filelock
Required-by: lightllm

@pluiez thanks, we will check it.

Hi,

I have found the root cause. My checkpoint was a single pytorch_model.bin file of around 140 GB; after splitting it into shards of no more than 10 GB each, the service started successfully.
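
For anyone hitting the same problem, here is a minimal re-sharding sketch using the Hugging Face transformers API (the destination path is a placeholder, and loading the full 72B checkpoint on CPU needs a correspondingly large amount of host RAM):

import os
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder paths: the directory holding the single pytorch_model.bin,
# and a new directory for the sharded copy.
src = os.path.expanduser("~/resources/huggingface/models/Qwen/Qwen1.5-72B-chat/")
dst = os.path.expanduser("~/resources/huggingface/models/Qwen/Qwen1.5-72B-chat-sharded/")

# Load the checkpoint, then re-save the weights as shards of at most 10 GB each.
model = AutoModelForCausalLM.from_pretrained(src, torch_dtype="auto", trust_remote_code=True)
model.save_pretrained(dst, max_shard_size="10GB")

# Copy the tokenizer files alongside the sharded weights.
tokenizer = AutoTokenizer.from_pretrained(src, trust_remote_code=True)
tokenizer.save_pretrained(dst)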

I was also loading from a network file system, which slowed loading enough that the lightllm service terminated on a load timeout. I hope this information is helpful to others; you may want to consider adding a command-line parameter to control the timeout during the model-loading phase.
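
For context only: the traceback above dies inside an rpyc call, and rpyc exposes a per-connection request timeout through its connection config, so a load-timeout flag could plausibly be threaded through there. The snippet below is a generic rpyc sketch under that assumption, not LightLLM's actual API:

import rpyc

# Generic rpyc example (not LightLLM code): raise the default 30-second
# sync_request_timeout so that a long-running remote call, such as a slow
# model load over a network file system, is not cut off early.
conn = rpyc.connect(
    "localhost",
    18861,  # hypothetical port for an rpyc service
    config={"sync_request_timeout": 3600},  # seconds; None disables the timeout
)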