OpenBMB/XAgent

Trying to run XAgentLlama in Docker, but it errors out: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your NVIDIA GeForce RTX .. GPU has compute capability 7.5.


Issue Description

Please provide a detailed description of the error or issue you encountered.
I tried to run XAgentLlama in Docker, but it errored out: "Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your NVIDIA GeForce RTX .. GPU has compute capability 7.5."
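
For reference, the GPU's compute capability can be checked from inside the container with PyTorch (a minimal sketch, assuming PyTorch and CUDA are available):

import torch

# bfloat16 requires compute capability >= 8.0 (Ampere or newer);
# RTX 20-series (Turing) cards report 7.5.
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")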

Steps to Reproduce

Please provide the specific steps to reproduce the error.
docker run -it -p 13520:13520 --network tool-server-network -v /host/model/path:/model:rw --gpus all --ipc=host xagentteam/xagentgen:latest python app.py --model-path /model --port 13520

Expected Behavior

Describe the behavior you expected to see.
Despite the GPU's lower compute capability, we should be able to run the container successfully in Docker.

Environment

  • Operating System:
  • Python Version:
  • Other Relevant Information:

Error Screenshots or Logs

If possible, please provide relevant screenshots or logs of the error.

Additional Notes

If you have any additional information or notes, please add them here.

AL-377 commented

You can try setting the vLLM engine parameters in app.py: around line 40, set dtype to 'half'.

from vllm.engine.arg_utils import AsyncEngineArgs

# Force float16 instead of bfloat16 so pre-Ampere GPUs
# (compute capability < 8.0) can run the engine.
engine_configs = AsyncEngineArgs(
    worker_use_ray=False,
    engine_use_ray=False,
    model=model_path,
    tokenizer=None,
    tokenizer_mode='auto',
    tensor_parallel_size=1,
    dtype='half',
    quantization=None,
    revision=None,
    tokenizer_revision=None,
    seed=42,
    gpu_memory_utilization=0.9,
    swap_space=4,
    disable_log_requests=True,
    max_num_batched_tokens=16384,
    max_model_len=16384,
)

See also: vllm-project/vllm#730
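
The same idea can also be made automatic by picking the dtype from the detected compute capability (a minimal sketch, assuming PyTorch is available; pick_dtype is a hypothetical helper, not part of app.py):

import torch

def pick_dtype() -> str:
    # bfloat16 needs compute capability >= 8.0 (Ampere or newer);
    # fall back to float16 ('half') on older GPUs such as Turing (7.5).
    major, minor = torch.cuda.get_device_capability(0)
    return 'auto' if (major, minor) >= (8, 0) else 'half'

The result could then be passed as dtype=pick_dtype() in the AsyncEngineArgs above.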

Awesome! I will give it a try and let you know the result. :)

By the way, I also tested Microsoft's AutoGen with a local Llama model via LM Studio, and it works fine.

Assuming XAgent uses the same /completion endpoint, should we expect it to work as well?
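
For context, LM Studio exposes an OpenAI-compatible completion endpoint; a minimal sketch of such a request follows (the port and payload are assumptions for illustration, not XAgent's actual client code):

import requests

# Hypothetical local server; LM Studio's default address is an assumption here.
resp = requests.post(
    "http://localhost:1234/v1/completions",
    json={"prompt": "Hello, world", "max_tokens": 32},
    timeout=60,
)
print(resp.json())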

By changing the dtype to 'half', it works. :) Thank you.