Unable to load local model from the directory for TGI Gaudi 1.2 version
avinashkarani opened this issue · 15 comments
System Info
tgi-gaudi V1.2.
Error: Traceback (most recent call last):
File "/usr/local/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/cli.py", line 149, in download_weights
utils.weight_files(model_id, revision, extension)
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/utils/hub.py", line 96, in weight_files
filenames = weight_hub_files(model_id, revision, extension)
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/utils/hub.py", line 25, in weight_hub_files
info = api.model_info(model_id, revision=revision)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
validate_repo_id(arg_value)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name':
The directory I pointed to contains the files below.
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
Run command: docker run -p 8080:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id /home/ubuntu/model_10e/ --sharded true --num-shard 8 --max-total-tokens 5120
Expected behavior
TGI should have loaded the local model for serving, but it failed to recognize it as a local model.
Looking at your command:
docker run -p 8080:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id /home/ubuntu/model_10e/ --sharded true --num-shard 8 --max-total-tokens 5120
it seems you are passing as model-id the path to your model on your instance (i.e. /home/ubuntu/model_10e/).
The TGI server runs inside a Docker container that doesn't have access to that folder by default. You need to use the -v argument to link a volume on your host to a volume in your container.
Can you try the following command?
docker run -p 8080:80 -v /home/ubuntu:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id /data/model_10e/ --sharded true --num-shard 8 --max-total-tokens 5120
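To make the mapping concrete, here is a minimal sketch of how a host path translates to the path the container sees under a `-v host:container` bind mount. The helper name `host_to_container_path` is hypothetical and is not part of TGI or Docker; it only illustrates the path arithmetic.

```python
from pathlib import PurePosixPath


def host_to_container_path(host_path: str, host_mount: str, container_mount: str) -> str:
    """Translate a host path into the path visible inside the container.

    Assumes a bind mount declared as `-v host_mount:container_mount`.
    Raises ValueError if host_path is not located under host_mount,
    which means the container cannot see it through this mount.
    """
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(host_mount))
    return str(PurePosixPath(container_mount) / relative)


if __name__ == "__main__":
    # With `-v /home/ubuntu:/data`, the model directory is reachable under /data
    print(host_to_container_path("/home/ubuntu/model_10e/", "/home/ubuntu", "/data"))
```

The value passed to --model-id should be the translated path, since the server only ever sees the container's filesystem.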
Thanks for the update. I changed the command to point to the model in the mounted volume, but I am now facing the issue below.
docker run -p 8080:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --sharded true --num-shard 8 --model-id $model
2024-02-07T08:46:59.875416Z INFO text_generation_launcher: Args { model_id: "/data/Local_model/RTL_model_50e/", revision: None, validation_workers: 2, sharded: Some(true), num_shard: Some(8), quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "cdb285643c82", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2024-02-07T08:46:59.875446Z INFO text_generation_launcher: Sharding model on 8 processes
2024-02-07T08:46:59.875559Z INFO download: text_generation_launcher: Starting download process.
2024-02-07T08:47:02.212617Z ERROR download: text_generation_launcher: Download encountered an error: /usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py:252: UserWarning: Device capability of hccl unspecified, assuming cpu and cuda. Please specify it via the devices argument of register_backend.
warnings.warn(
Traceback (most recent call last):
File "/usr/local/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/cli.py", line 178, in download_weights
model_id, revision, trust_remote_code=trust_remote_code
NameError: name 'trust_remote_code' is not defined
Error: DownloadError
I also tried the --trust-remote-code option, but then I got the issue below.
2024-02-07T08:50:32.665803Z ERROR download: text_generation_launcher: Download encountered an error: Usage: text-generation-server download-weights [OPTIONS] MODEL_ID
Try 'text-generation-server download-weights --help' for help.
Error: No such option: --trust-remote-code
Error: DownloadError
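The `NameError` above is the ordinary Python failure when a function body uses a name that is neither a parameter nor a global. A minimal sketch of that failure mode and one way to resolve it follows; both functions and the `_load` stub are hypothetical stand-ins, not the actual `cli.py` code.

```python
def _load(model_id, revision, trust_remote_code):
    """Stub standing in for the real weight-loading helper (hypothetical)."""
    return {
        "model_id": model_id,
        "revision": revision,
        "trust_remote_code": trust_remote_code,
    }


def download_weights_buggy(model_id, revision=None):
    # `trust_remote_code` is not a parameter or a global here,
    # so this lookup raises NameError when the function is called.
    return _load(model_id, revision, trust_remote_code=trust_remote_code)


def download_weights_fixed(model_id, revision=None, trust_remote_code=False):
    # Declaring the parameter (with a conservative default) makes the name resolvable.
    return _load(model_id, revision, trust_remote_code=trust_remote_code)
```

This also suggests why passing --trust-remote-code was rejected: if the command does not declare the parameter, the CLI has no matching option to parse.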
It seems to be a bug indeed. Will propose a fix.
@avinashkarani Can you try it using the branch called fix_trust_remote_code please?
@regisss, I tested the new branch and observed two issues.
1) If we run with a PEFT model (with the adapter config present), the PEFT version seems to be incompatible:
Loading checkpoint shards: 100%|██████████| 2/2 [00:12<00:00, 6.48s/it]
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 29, in <module>
main(args)
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 16, in main
server.serve(
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 213, in serve
asyncio.run(serve_inner(model_id, revision, dtype, sharded))
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 177, in serve_inner
model = get_model(model_id, revision=revision, dtype=data_type)
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 33, in get_model
return CausalLM(model_id, revision, dtype)
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 512, in __init__
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype, **model_kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3380, in from_pretrained
model.load_adapter(
File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/peft.py", line 137, in load_adapter
check_peft_version(min_version=MIN_PEFT_VERSION)
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/peft_utils.py", line 120, in check_peft_version
raise ValueError(
ValueError: The version of PEFT you are using is not compatible, please use a version that is greater than 0.5.0 rank=0
@regisss 2) If I provide a model with the adapters already merged, the code still defaults to the PEFT model path and fails with an "adapter config not found" error.
is_local_model = (Path(model_id).exists() and Path(model_id).is_dir()) or os.getenv(
    "WEIGHTS_CACHE_OVERRIDE", None
) is not None
if not is_local_model:
    # Try to download weights from the hub
    try:
        filenames = utils.weight_hub_files(model_id, revision, extension)
        utils.download_weights(filenames, model_id, revision)
        # Successfully downloaded weights
        return
    # No weights found on the hub with this extension
    except utils.EntryNotFoundError as e:
        # Check if we want to automatically convert to safetensors or if we can use .bin weights instead
        if not extension == ".safetensors" or not auto_convert:
            raise e
else:
    # Try to load as a local PEFT model
    try:
        utils.download_and_unload_peft(
            model_id, revision, trust_remote_code=trust_remote_code
        )
        utils.weight_files(model_id, revision, extension)
        return
    except (utils.LocalEntryNotFoundError, utils.EntryNotFoundError):
        pass
# Try to see if there are local pytorch weights
try:
    # Get weights for a local model, a hub cached model and inside the WEIGHTS_CACHE_OVERRIDE
    local_pt_files = utils.weight_files(model_id, revision, ".bin")
The code above, from cli.py, checks for a PEFT model first and fails in the else branch.
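One way to keep a merged model off the PEFT path is to test for the adapter config before attempting to unload adapters. The sketch below assumes that a PEFT checkpoint is identified by an `adapter_config.json` file in the model directory (the file name PEFT uses when saving adapters). The function name `is_peft_checkpoint` is hypothetical, and this is not necessarily the fix adopted in the repository.

```python
from pathlib import Path


def is_peft_checkpoint(model_dir: str) -> bool:
    """Return True only if model_dir contains an adapter_config.json file.

    A directory holding already-merged weights has no adapter config,
    so it would skip the PEFT branch and be loaded as a regular model.
    """
    return (Path(model_dir) / "adapter_config.json").is_file()
```

A caller could wrap the PEFT branch in `if is_peft_checkpoint(model_id):` so that the absence of an adapter config is handled as "not a PEFT model" rather than as an error.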
@avinashkarani I tried to build the Docker image using the v1.2-release branch but got a dependency conflict. Did you get this issue as well? Thank you.
I tried loosening the package constraints with pip install -U --no-deps, but still got the issue:
323.2 The conflict is caused by:
323.2 text-generation-server 1.2.0 depends on huggingface-hub<0.17.0 and >=0.16.4
323.2 tokenizers 0.14.1 depends on huggingface_hub<0.18 and >=0.16.4
323.2 diffusers 0.26.3 depends on huggingface-hub>=0.20.2
323.2 text-generation-server 1.2.0 depends on huggingface-hub<0.17.0 and >=0.16.4
323.2 tokenizers 0.14.1 depends on huggingface_hub<0.18 and >=0.16.4
323.2 diffusers 0.26.2 depends on huggingface-hub>=0.20.2
323.2 text-generation-server 1.2.0 depends on huggingface-hub<0.17.0 and >=0.16.4
323.2 tokenizers 0.14.1 depends on huggingface_hub<0.18 and >=0.16.4
323.2 diffusers 0.26.1 depends on huggingface-hub>=0.20.2
323.2 text-generation-server 1.2.0 depends on huggingface-hub<0.17.0 and >=0.16.4
323.2 tokenizers 0.14.1 depends on huggingface_hub<0.18 and >=0.16.4
323.2 diffusers 0.26.0 depends on huggingface-hub>=0.20.2
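The resolver output says that no single huggingface-hub release can satisfy all three packages. A minimal sketch of why, using plain tuple comparison instead of a full version parser (an assumption that only holds for simple `X.Y.Z` style versions); the helper names are hypothetical:

```python
def parse(version: str) -> tuple:
    """Turn 'X.Y.Z' into a tuple of ints; pre-release tags are not handled."""
    return tuple(int(part) for part in version.split("."))


def satisfiable(lower_bounds, upper_bounds) -> bool:
    """True if some version v exists with max(lower) <= v < min(upper)."""
    lowest_allowed = max(parse(v) for v in lower_bounds)
    first_excluded = min(parse(v) for v in upper_bounds)
    return lowest_allowed < first_excluded


# Bounds copied from the pip output above.
lower = ["0.16.4", "0.16.4", "0.20.2"]  # text-generation-server, tokenizers, diffusers
upper = ["0.17.0", "0.18"]              # text-generation-server, tokenizers

print(satisfiable(lower, upper))
```

This prints False: diffusers requires at least 0.20.2 while text-generation-server excludes 0.17.0 and later. Every diffusers 0.26.x release listed carries the same lower bound, which is why the resolver backtracking through patch versions does not help.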
@muhammad-asn Can you try this PR and let me know if that works for you?
@avinashkarani I think the issue here is that you're using a checkpoint that was created with an old version of PEFT. Could you try with a more recent checkpoint?
What is the path to your local model? Are you sure the volume where your model is located is well linked to the Docker container?
@regisss let me check first. thank you for your fast response
@muhammad-asn does the problem still exist? Can we close this issue?
Yes, you can close this issue; it was solved by a recent PR.
Thank you