OpenBMB/XAgent

Trying to run XAgentLlama in Docker, but it errors out: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your NVIDIA GeForce RTX .. GPU has compute capability 7.5.


Issue Description

Please provide a detailed description of the error or issue you encountered.
I tried to run XAgentLlama in Docker, but it errored out: "Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your NVIDIA GeForce RTX .. GPU has compute capability 7.5."
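
For reference, the GPU's compute capability can be checked from inside the container with PyTorch (a minimal sketch, assuming PyTorch and CUDA are available):

import torch

# bfloat16 requires compute capability >= 8.0 (Ampere or newer);
# RTX 20-series (Turing) cards report 7.5.
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")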

Steps to Reproduce

Please provide the specific steps to reproduce the error.
docker run -it -p 13520:13520 --network tool-server-network -v /host/model/path:/model:rw --gpus all --ipc=host xagentteam/xagentgen:latest python app.py --model-path /model --port 13520

Expected Behavior

Describe the behavior you expected to see.
Despite the GPU's lower compute capability, we should be able to run the container successfully in Docker.

Environment

  • Operating System:
  • Python Version:
  • Other Relevant Information:

Error Screenshots or Logs

If possible, please provide relevant screenshots or logs of the error.

Additional Notes

If you have any additional information or notes, please add them here.

AL-377 commented

You can try setting the vLLM engine parameters in app.py: around line 40, set dtype to 'half'.

from vllm.engine.arg_utils import AsyncEngineArgs

# Force float16 instead of bfloat16 so pre-Ampere GPUs
# (compute capability < 8.0) can run the engine.
engine_configs = AsyncEngineArgs(
    worker_use_ray=False,
    engine_use_ray=False,
    model=model_path,
    tokenizer=None,
    tokenizer_mode='auto',
    tensor_parallel_size=1,
    dtype='half',
    quantization=None,
    revision=None,
    tokenizer_revision=None,
    seed=42,
    gpu_memory_utilization=0.9,
    swap_space=4,
    disable_log_requests=True,
    max_num_batched_tokens=16384,
    max_model_len=16384,
)

See also: vllm-project/vllm#730
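
The same idea can also be made automatic by picking the dtype from the detected compute capability (a minimal sketch, assuming PyTorch is available; pick_dtype is a hypothetical helper, not part of app.py):

import torch

def pick_dtype() -> str:
    # bfloat16 needs compute capability >= 8.0 (Ampere or newer);
    # fall back to float16 ('half') on older GPUs such as Turing (7.5).
    major, minor = torch.cuda.get_device_capability(0)
    return 'auto' if (major, minor) >= (8, 0) else 'half'

The result could then be passed as dtype=pick_dtype() in the AsyncEngineArgs above.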

Awesome! I will give it a try and let you know the result. :)

By the way, I also tested Microsoft's AutoGen with a local Llama model via LM Studio, and it works fine.

Assuming XAgent uses the same /completion endpoint, should we expect it to work as well?
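
For context, LM Studio exposes an OpenAI-compatible completion endpoint; a minimal sketch of such a request follows (the port and payload are assumptions for illustration, not XAgent's actual client code):

import requests

# Hypothetical local server; LM Studio's default address is an assumption here.
resp = requests.post(
    "http://localhost:1234/v1/completions",
    json={"prompt": "Hello, world", "max_tokens": 32},
    timeout=60,
)
print(resp.json())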

By changing the dtype to 'half', it works. :) Thank you.