trust_remote_code=True in new Hugging Face LLM Inference Container for Amazon SageMaker
Closed this issue · 2 comments
Hi team,
That is probably a question specifically for @philschmid :)
I'm going through this blog post to deploy Falcon 40B Instruct on SageMaker using the new Hugging Face LLM Inference Container for Amazon SageMaker.
The deployment fails with the following error:
ValueError: Loading tiiuae/falcon-40b-instruct requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
It seems we can pass this parameter as part of the `deploy` method:
llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name='falcon-40B-instruct',
    trust_remote_code=True,
    # volume_size=400,  # If using an instance with local SSD storage, volume_size must be None, e.g. p4 but not p3
    container_startup_health_check_timeout=health_check_timeout,  # 10 minutes to be able to load the model
)
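For context, this container is configured through environment variables set on the model object rather than through `deploy()` kwargs, which would explain why the argument above is silently ignored. The sketch below illustrates that shape only; `HF_MODEL_ID` and `SM_NUM_GPUS` follow the pattern from the blog post, but `HF_MODEL_TRUST_REMOTE_CODE` is a placeholder name, not a confirmed key for this container version:

```python
# Sketch: the HF LLM container reads its configuration from environment
# variables passed to HuggingFaceModel(env=...), not from deploy() kwargs.
# "HF_MODEL_TRUST_REMOTE_CODE" is a hypothetical key used for illustration.

def build_llm_env(model_id, num_gpus, trust_remote_code):
    """Assemble the env dict that would be passed to HuggingFaceModel(env=...)."""
    env = {
        "HF_MODEL_ID": model_id,       # model to pull from the Hugging Face Hub
        "SM_NUM_GPUS": str(num_gpus),  # tensor-parallel degree; values must be strings
    }
    if trust_remote_code:
        # Placeholder: shows where such a flag would go, if the container supports one.
        env["HF_MODEL_TRUST_REMOTE_CODE"] = "true"
    return env

print(build_llm_env("tiiuae/falcon-40b-instruct", 4, True))
```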
but it doesn't seem to have any effect. A workaround would be to use the transformers library directly, without this container, but I would love to keep this easy way of deploying models!
Any way to fix this? Thank you!
Hello @krokoko,
The LLM container is unrelated to the huggingface-inference-toolkit. You could deploy the model by creating a custom inference.py which enables trust_remote_code and uses device_map="auto" from accelerate to parallelize the model. But you most likely need a p4 instance for that.
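A minimal sketch of such an inference.py, assuming the standard SageMaker Hugging Face toolkit entry points (`model_fn`/`predict_fn`) and that transformers and accelerate are available in the container; the `merge_generation_kwargs` helper is just for illustration:

```python
# inference.py -- sketch of a custom handler that enables trust_remote_code
# and shards the model across GPUs via accelerate's device_map="auto".

def merge_generation_kwargs(user_params=None):
    """Merge request-level generation parameters over conservative defaults."""
    defaults = {"max_new_tokens": 256, "do_sample": True, "temperature": 0.7}
    return {**defaults, **(user_params or {})}

def model_fn(model_dir):
    # Imported lazily so the module can be loaded without transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "tiiuae/falcon-40b-instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,   # execute the repo's custom modeling code
        device_map="auto",        # let accelerate shard layers across GPUs
        torch_dtype=torch.bfloat16,
    )
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, **merge_generation_kwargs(data.get("parameters")))
    return {"generated_text": tokenizer.decode(output_ids[0], skip_special_tokens=True)}
```

The script would then be packaged under `code/` in the model.tar.gz, per the usual SageMaker Hugging Face workflow.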
Thanks @philschmid! Closing this issue, as it was also already reported and the fix appears to have been merged here: huggingface/text-generation-inference#394. Waiting for the new version of the container to test.