defenseunicorns/leapfrogai

Can't set vLLM container to have `LAI_QUANTIZATION` set to None

vanakema opened this issue · 0 comments

Environment

  1. OS and Architecture: macOS arm64
  2. App or Package Name: vLLM
  3. App or Package Version: v11
  4. Kubernetes Distribution: k3d & rke2
  5. Kubernetes Version: unknown
  6. Other:

Steps to reproduce

  1. Deploy the vLLM pod with an unquantized model
  2. Observe that the pod errors out because the quantization method defaults to `gptq`
  3. Observe that there is no good way to unset the `LAI_QUANTIZATION` env var short of overriding the container's ENTRYPOINT and calling `unset LAI_QUANTIZATION` before running the usual entrypoint (see the sketch below)
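
For illustration, the workaround in step 3 looks roughly like the snippet below (a sketch only; the image tag and the command being exec'd are placeholders, not the actual values from the chart):

```yaml
# Sketch of the workaround: override the container's entrypoint so the
# baked-in LAI_QUANTIZATION value is unset before the server starts.
# The image tag and the "python main.py" command are placeholders for
# the image's real entrypoint.
containers:
  - name: vllm
    image: ghcr.io/defenseunicorns/leapfrogai/vllm:latest
    command: ["/bin/sh", "-c"]
    args: ["unset LAI_QUANTIZATION && exec python main.py"]
```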

Expected result

  • When the `LAI_QUANTIZATION` env var is not set, I expect the quantization method to be set to None, so unquantized models can be run without hacks/workarounds (see the sketch below)
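
A minimal sketch of the expected behavior, assuming the backend reads the variable via `os.environ` (the default handling shown here is illustrative, not the actual LeapfrogAI source):

```python
import os

# Treat an unset (or empty) LAI_QUANTIZATION as "no quantization" rather
# than falling back to a hard-coded default such as "gptq".
quantization = os.environ.get("LAI_QUANTIZATION") or None

# vLLM accepts quantization=None for unquantized models, so the value
# can then be passed straight through, e.g.:
#   llm = vllm.LLM(model=model_path, quantization=quantization)
```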

Actual Result

  • One has to resort to a workaround to get the `LAI_QUANTIZATION` env var unset before the server starts

Additional Context

  • I'd approach this by not setting env vars in the vLLM image's Dockerfile for any configuration that is likely to differ between environments. That way the image never bakes in a value that has to be unset for the container to work properly
  • Once that is done, move the configuration into Kubernetes, ideally in a ConfigMap so it persists across upgrades (a sketch follows this list)
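
As a rough sketch of that second suggestion (the ConfigMap name and contents here are hypothetical, not existing chart values):

```yaml
# Hypothetical ConfigMap carrying the per-environment vLLM settings.
apiVersion: v1
kind: ConfigMap
metadata:
  name: vllm-config
data:
  LAI_QUANTIZATION: "gptq"  # omit this key entirely for unquantized models
```

The Deployment would then pull these values in with `envFrom`/`configMapRef`, so a key that is simply absent from the ConfigMap never becomes an env var that needs unsetting.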