entelecheia/openllm-container

Support vLLM

Closed this issue · 0 comments

Pull the NVIDIA PyTorch Docker image, which ships with CUDA 11.8.

Pass --ipc=host so the container's shared memory is large enough; vLLM relies on shared memory for tensor-parallel inference.
docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:22.12-py3

https://vllm.readthedocs.io/en/latest/getting_started/installation.html
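Inside the container, vLLM can be installed with `pip install vllm` and smoke-tested with its offline inference API. A minimal sketch following the linked installation guide (the model name is illustrative, and a CUDA-capable GPU is required at runtime):

```python
# Minimal vLLM offline-inference sketch; assumes `pip install vllm`
# succeeded inside the container and a GPU is visible (--gpus all).
from vllm import LLM, SamplingParams

# Small model chosen only to keep the smoke test light; swap in the
# model this container is meant to serve.
llm = LLM(model="facebook/opt-125m")

# Sampling settings for a short generation.
params = SamplingParams(temperature=0.8, max_tokens=32)

# Generate a completion for a single prompt and print it.
outputs = llm.generate(["Hello, my name is"], params)
for out in outputs:
    print(out.outputs[0].text)
```

If this runs without a CUDA or shared-memory error, the container setup above (CUDA 11.8 image plus `--ipc=host`) is working.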