Support passing model_kwargs to pipeline
lukealexmiller opened this issue · 1 comments
I'm trying to deploy BLIP-2 (specifically Salesforce/blip2-opt-2.7b) to a SageMaker (SM) endpoint, but I'm running into some problems.
We can deploy this model by tar'ing the model artifacts as model.tar.gz and hosting the archive on S3, but creating a ~9GB tar file is time-consuming and leads to slow deployment feedback loops.
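For reference, the packaging step looks roughly like this (the directory name and placeholder file are illustrative, not the actual BLIP-2 artifacts):

```shell
# Illustrative packaging sketch: SageMaker expects the artifacts at the
# root of the archive, hence tar's -C flag.
mkdir -p blip2-opt-2.7b                      # stand-in for the real model dir
echo '{}' > blip2-opt-2.7b/config.json       # placeholder artifact
tar -czf model.tar.gz -C blip2-opt-2.7b .
# Then upload for hosting, e.g.: aws s3 cp model.tar.gz s3://<bucket>/model.tar.gz
```

With a ~9GB directory, the tar + upload round trip is what makes each deployment iteration slow.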
Alternatively, the toolkit has experimental support for downloading models from the 🤗 Hub on startup, which is more time- and space-efficient.
However, this functionality only supports passing HF_TASK and HF_MODEL_ID as env vars. To run inference on this model using the GPUs available on SM (T4/A10), we need to pass additional model_kwargs, e.g.:
pipe = pipeline(model="Salesforce/blip2-opt-2.7b", model_kwargs={"load_in_8bit": True})
A potential solution: on line 104 of handler_service.py, the ability to pass kwargs has not been implemented, even though the get_pipeline function already accepts kwargs.
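To make the idea concrete, here is a minimal sketch of what the pass-through could look like. The env var name HF_MODEL_KWARGS and the helper function are assumptions for illustration, not part of the toolkit's current API; the point is just that a JSON-encoded env var could be parsed into the model_kwargs dict that get_pipeline already accepts:

```python
import json
import os

def build_model_kwargs(env=os.environ):
    """Sketch: parse a hypothetical HF_MODEL_KWARGS env var (JSON string)
    into the model_kwargs dict to forward to transformers.pipeline()."""
    raw = env.get("HF_MODEL_KWARGS", "{}")
    return json.loads(raw)

# Example: env var as it might be set on the endpoint
kwargs = build_model_kwargs({"HF_MODEL_KWARGS": '{"load_in_8bit": true}'})
print(kwargs)  # {'load_in_8bit': True}
```

The handler could then call pipeline(model=..., model_kwargs=build_model_kwargs()) without any change to the get_pipeline signature.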
Hello @lukealexmiller,
Thank you for opening the request. Adding "HF_KWARGS" as a parameter is a good idea to consider.
In the meantime, you can enable this by creating a custom inference.py. See here for an example: https://www.philschmid.de/custom-inference-huggingface-sagemaker
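As a rough illustration of that workaround, a custom inference.py can override the toolkit's model_fn hook and call pipeline() with whatever model_kwargs are needed. The task string and loading arguments below are assumptions for this specific model, not a definitive recipe:

```python
# Sketch of a custom inference.py for the SageMaker HF inference toolkit.
# model_fn and predict_fn are the toolkit's override hooks; the pipeline
# arguments here (task, model id, load_in_8bit) are illustrative.

def model_fn(model_dir):
    # Heavy dependency imported lazily; model_dir is unused because we
    # load directly from the Hub with extra model_kwargs.
    from transformers import pipeline
    return pipeline(
        "image-to-text",
        model="Salesforce/blip2-opt-2.7b",
        model_kwargs={"load_in_8bit": True},
    )

def predict_fn(data, pipe):
    # data is the deserialized request payload
    return pipe(data["inputs"])
```

This bypasses the env-var limitation entirely, at the cost of shipping a code file with the model.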