OpenBMB/XAgent

XAgentGen: Can XAgentLlaMa-34B-preview be run directly with multi-GPU inference?

Turingforce opened this issue · 1 comment

Question: How can XAgentLlaMa-34B-preview be run on a setup of n × 3090 (or 4090) 24 GB cards, and what are the VRAM requirements?
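
For a rough sense of scale (a back-of-envelope estimate, not an official requirement), the fp16 weights alone of a 34B-parameter model are already much larger than a single 24 GB card:

```python
# Back-of-envelope VRAM estimate; real usage also includes activations,
# the KV cache, and framework overhead, and grows with context length.
n_params = 34e9              # XAgentLlaMa-34B-preview parameter count
bytes_per_param = 2          # fp16 / bf16 weights
weights_gib = n_params * bytes_per_param / 2**30
print(f"weights alone: ~{weights_gib:.0f} GiB")  # ~63 GiB, > 2x one 24 GiB card
```

So the weights alone span roughly three 24 GiB cards, and a fourth leaves headroom for the KV cache.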

Run:

docker run -it -p 13520:13520 --network tool-server-network -v /mnt/XAgentLLaMa-34B-preview:/model:rw --gpus all --ipc=host xagentteam/xagentgen:latest python app.py --model-path /model --port 13520
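
The loading code inside app.py is not shown here, so the following is only a sketch: if the server loads the checkpoint with Hugging Face transformers, sharding across all visible GPUs can be requested with `device_map="auto"` (this requires the accelerate package); the `max_memory` cap below is an illustrative value, not a tested setting.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/model"  # the path mounted into the container above

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.float16,
    device_map="auto",  # shard layers across every GPU the container can see
    # Reserve ~2 GiB per 24 GiB card for activations and the KV cache.
    max_memory={i: "22GiB" for i in range(torch.cuda.device_count())},
)
```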

Log:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 688.00 MiB. GPU 0 has a total capacty of 23.69 GiB of which 473.19 MiB is free. Process 59146 has 23.21 GiB memory in use. Of the allocated memory 22.75 GiB is allocated by PyTorch, and 9.99 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
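
Note that the traceback shows only GPU 0 filling up (23.21 GiB in use), i.e. the model is being placed on a single device rather than sharded across the cards. The `max_split_size_mb` hint in the message only mitigates allocator fragmentation and will not make a 34B fp16 model fit on one 24 GiB card; for completeness, a sketch of how that setting is applied (it must take effect before CUDA is initialized):

```python
import os

# Equivalent to passing -e PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
# to docker run; must be set before torch initializes CUDA.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # noqa: E402  (deliberately imported after setting the env var)
```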

Test configuration / GPUs:

[image: screenshot of the test GPU configuration]

Please refer to #248 and #275.