Trying to run XAgentLlama in Docker errors out: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your NVIDIA GeForce RTX .. GPU has compute capability 7.5.
Issue Description
I tried to run XAgentLlama in Docker, but it errors out with: "Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your NVIDIA GeForce RTX .. GPU has compute capability 7.5."
Steps to Reproduce
docker run -it -p 13520:13520 --network tool-server-network -v /host/model/path:/model:rw --gpus all --ipc=host xagentteam/xagentgen:latest python app.py --model-path /model --port 13520
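For context, the GPU's compute capability can be verified on the host before launching the container. A minimal sketch, assuming PyTorch with CUDA support is installed:

import torch

# Returns a (major, minor) tuple, e.g. (7, 5) for a Turing-generation RTX card.
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")
# Bfloat16 requires compute capability >= 8.0 (Ampere or newer).
print("bfloat16 supported:", (major, minor) >= (8, 0))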
Expected Behavior
Despite the GPU's limited compute capability, we should be able to run the container successfully in Docker.
You can try changing the vLLM engine parameters in app.py; refer to line 40 and set the dtype to 'half':
from vllm.engine.arg_utils import AsyncEngineArgs

engine_configs = AsyncEngineArgs(
    worker_use_ray=False,
    engine_use_ray=False,
    model=model_path,
    tokenizer=None,
    tokenizer_mode='auto',
    tensor_parallel_size=1,
    # 'half' (float16) runs on compute capability 7.x GPUs;
    # bfloat16 needs compute capability >= 8.0 (Ampere or newer).
    dtype='half',
    quantization=None,
    revision=None,
    tokenizer_revision=None,
    seed=42,
    gpu_memory_utilization=0.9,
    swap_space=4,
    disable_log_requests=True,
    max_num_batched_tokens=16384,
    max_model_len=16384,
)
See also: vllm-project/vllm#730
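If you would rather not hard-code the dtype, it can also be selected at startup based on the detected GPU. A minimal sketch, assuming PyTorch is available (it is a dependency of vLLM); the select_dtype helper is hypothetical, not part of XAgent:

import torch

def select_dtype() -> str:
    """Return 'bfloat16' on compute capability >= 8.0 (Ampere+), else 'half' (float16)."""
    major, minor = torch.cuda.get_device_capability(0)
    return 'bfloat16' if (major, minor) >= (8, 0) else 'half'

# Then pass dtype=select_dtype() to AsyncEngineArgs instead of a hard-coded value.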
Awesome! I will give it a try and let you know the result. :)
By the way, I also tested Microsoft's AutoGen with a local Llama model via LM Studio, and it works fine.
Assuming XAgent uses the same /completion endpoint, should we expect it to work as well?
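For what it is worth, the server can be probed directly to compare behavior. A minimal sketch, assuming the container above is listening on port 13520 and exposes a POST /completion endpoint; the JSON payload fields here are assumptions for illustration, not confirmed against XAgent's API:

import requests

# Hypothetical probe; the payload schema is an assumption.
resp = requests.post(
    "http://localhost:13520/completion",
    json={"prompt": "Hello, world", "max_tokens": 16},
    timeout=30,
)
print(resp.status_code)
print(resp.text)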
By changing the dtype to 'half', it works. :) Thank you.