defenseunicorns/leapfrogai

Can't set vLLM container to have `LAI_QUANTIZATION` set to None

vanakema opened this issue · 0 comments

Environment

  1. OS and Architecture: macOS arm64
  2. App or Package Name: vLLM
  3. App or Package Version: v11
  4. Kubernetes Distribution: k3d & rke2
  5. Kubernetes Version: unknown
  6. Other:

Steps to reproduce

  1. Deploy the vLLM pod with an unquantized model
  2. Observe that the pod errors out because the quantization method defaults to `gptq`
  3. Observe that there is no good way to unset the `LAI_QUANTIZATION` env var short of overriding the container's ENTRYPOINT and calling `unset LAI_QUANTIZATION` before running the usual entrypoint (see the sketch below)
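
For illustration, the workaround in step 3 looks roughly like the snippet below (a sketch only; the image tag and the command being exec'd are placeholders, not the actual values from the chart):

```yaml
# Sketch of the workaround: override the container's entrypoint so the
# baked-in LAI_QUANTIZATION value is unset before the server starts.
# The image tag and the "python main.py" command are placeholders for
# the image's real entrypoint.
containers:
  - name: vllm
    image: ghcr.io/defenseunicorns/leapfrogai/vllm:latest
    command: ["/bin/sh", "-c"]
    args: ["unset LAI_QUANTIZATION && exec python main.py"]
```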

Expected result

  • When the `LAI_QUANTIZATION` env var is not set, I expect the quantization method to be set to None, so unquantized models can be run without hacks/workarounds (see the sketch below)
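
A minimal sketch of the expected behavior, assuming the backend reads the variable via `os.environ` (the default handling shown here is illustrative, not the actual LeapfrogAI source):

```python
import os

# Treat an unset (or empty) LAI_QUANTIZATION as "no quantization" rather
# than falling back to a hard-coded default such as "gptq".
quantization = os.environ.get("LAI_QUANTIZATION") or None

# vLLM accepts quantization=None for unquantized models, so the value
# can then be passed straight through, e.g.:
#   llm = vllm.LLM(model=model_path, quantization=quantization)
```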

Actual Result

  • One has to resort to a workaround to get the `LAI_QUANTIZATION` env var unset before the server starts

Additional Context

  • I'd approach this by not setting env vars in the vLLM image's Dockerfile for any configuration that is likely to differ between environments. That way the image never bakes in a value that has to be unset for the container to work properly
  • Once that is done, move the configuration into Kubernetes, ideally in a ConfigMap so it persists across upgrades (a sketch follows this list)
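
As a rough sketch of that second suggestion (the ConfigMap name and contents here are hypothetical, not existing chart values):

```yaml
# Hypothetical ConfigMap carrying the per-environment vLLM settings.
apiVersion: v1
kind: ConfigMap
metadata:
  name: vllm-config
data:
  LAI_QUANTIZATION: "gptq"  # omit this key entirely for unquantized models
```

The Deployment would then pull these values in with `envFrom`/`configMapRef`, so a key that is simply absent from the ConfigMap never becomes an env var that needs unsetting.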