aws/sagemaker-huggingface-inference-toolkit

trust_remote_code=True in new Hugging Face LLM Inference Container for Amazon SageMaker


Hi team,

This is probably a question specifically for @philschmid :)

I'm going through this blog post to deploy Falcon 40B Instruct on SageMaker using the new Hugging Face LLM Inference Container for Amazon SageMaker.

The deployment fails with the following error:

ValueError: Loading tiiuae/falcon-40b-instruct requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.

It seems we can pass this parameter as part of the `deploy` method:

llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name='falcon-40B-instruct',
    trust_remote_code=True,
    # volume_size=400,  # must be None on instances with local SSD storage (e.g. p4, but not p3)
    container_startup_health_check_timeout=health_check_timeout,  # 10 minutes, to give the model time to load
)

but it doesn't seem to have any effect. A workaround would be to use the transformers library directly, without this container, but I would love to keep this easy way of deploying models!
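For reference, this is roughly what I mean by the transformers workaround (a minimal sketch; the model ID comes from the error above, the prompt and generation parameters are just illustrative, and it needs accelerate installed plus enough GPU memory for Falcon-40B):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # opt in to the custom modeling code in the repo
    device_map="auto",        # shard the weights across all available GPUs
    torch_dtype=torch.bfloat16,
)

inputs = tokenizer("What is Amazon SageMaker?", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))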

Any way to fix this? Thank you!

Hello @krokoko,

The LLM container is unrelated to the huggingface-inference-toolkit. You could deploy the model by creating a custom inference.py which enables trust_remote_code and uses device_map="auto" from accelerate to parallelize the model. But you will most likely need a p4 instance for that.
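Something along these lines (a minimal sketch using the toolkit's model_fn/predict_fn handler conventions; the request/response shape here is illustrative):

# inference.py — custom handler for the SageMaker Hugging Face inference
# toolkit; place it under code/inference.py in the model archive.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def model_fn(model_dir):
    # Called once at container startup to load the model.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        trust_remote_code=True,   # allow the custom Falcon modeling code
        device_map="auto",        # let accelerate shard across all GPUs
        torch_dtype=torch.bfloat16,
    )
    return model, tokenizer


def predict_fn(data, model_and_tokenizer):
    # Called per request; expects {"inputs": "..."} and returns generated text.
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt").to(model.device)
    max_new_tokens = data.get("parameters", {}).get("max_new_tokens", 256)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return {"generated_text": tokenizer.decode(output_ids[0], skip_special_tokens=True)}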

Thanks @philschmid! Closing this issue, as it was already reported and a fix seems to have been merged here: huggingface/text-generation-inference#394. Waiting for the new version of the container to test.