huggingface/tgi-gaudi

Unable to load local model from the directory for TGI Gaudi 1.2 version

avinashkarani opened this issue · 15 comments

System Info

tgi-gaudi V1.2.

Error: Traceback (most recent call last):

File "/usr/local/bin/text-generation-server", line 8, in
sys.exit(app())

File "/usr/local/lib/python3.10/dist-packages/text_generation_server/cli.py", line 149, in download_weights
utils.weight_files(model_id, revision, extension)

File "/usr/local/lib/python3.10/dist-packages/text_generation_server/utils/hub.py", line 96, in weight_files
filenames = weight_hub_files(model_id, revision, extension)

File "/usr/local/lib/python3.10/dist-packages/text_generation_server/utils/hub.py", line 25, in weight_hub_files
info = api.model_info(model_id, revision=revision)

File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
validate_repo_id(arg_value)

File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
raise HFValidationError(

huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name':

The directory pointed to contains the following files:
[screenshot of the model directory contents]

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Run command: docker run -p 8080:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id /home/ubuntu/model_10e/ --sharded true --num-shard 8 --max-total-tokens 5120

Expected behavior

TGI should have loaded the local model for serving, but it failed to recognize the local model path.

Looking at your command:

docker run -p 8080:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id /home/ubuntu/model_10e/ --sharded true --num-shard 8 --max-total-tokens 5120

it seems you are passing as model-id the path to your model on your host instance (i.e. /home/ubuntu/model_10e/).
The TGI server runs inside a Docker container that doesn't have access to that folder by default. You need to use the -v argument to mount a directory on your host into the container.
Can you try the following command?

docker run -p 8080:80 -v /home/ubuntu:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --model-id /data/model_10e/ --sharded true --num-shard 8 --max-total-tokens 5120

Thanks for the update. I made the changes to point to the model in the mounted volume, but I am now facing the issue below.
docker run -p 8080:80 -v $volume:/data --runtime=habana -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi --sharded true --num-shard 8 --model-id $model
2024-02-07T08:46:59.875416Z INFO text_generation_launcher: Args { model_id: "/data/Local_model/RTL_model_50e/", revision: None, validation_workers: 2, sharded: Some(true), num_shard: Some(8), quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "cdb285643c82", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2024-02-07T08:46:59.875446Z INFO text_generation_launcher: Sharding model on 8 processes
2024-02-07T08:46:59.875559Z INFO download: text_generation_launcher: Starting download process.
2024-02-07T08:47:02.212617Z ERROR download: text_generation_launcher: Download encountered an error: /usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py:252: UserWarning: Device capability of hccl unspecified, assuming cpu and cuda. Please specify it via the devices argument of register_backend.
warnings.warn(
Traceback (most recent call last):

File "/usr/local/bin/text-generation-server", line 8, in
sys.exit(app())

File "/usr/local/lib/python3.10/dist-packages/text_generation_server/cli.py", line 178, in download_weights
model_id, revision, trust_remote_code=trust_remote_code

NameError: name 'trust_remote_code' is not defined

Error: DownloadError

I also tried with the option --trust-remote-code, but then I got the issue below:
2024-02-07T08:50:32.665803Z ERROR download: text_generation_launcher: Download encountered an error: Usage: text-generation-server download-weights [OPTIONS] MODEL_ID
Try 'text-generation-server download-weights --help' for help.

Error: No such option: --trust-remote-code

Error: DownloadError

It seems to be a bug indeed. Will propose a fix.
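For context, the NameError means trust_remote_code is simply not defined in the scope of download_weights. A minimal sketch of the kind of change this calls for, assuming the typer-based cli.py shown in the traceback (not the actual patch):

from typing import Optional

import typer

app = typer.Typer()

@app.command()
def download_weights(
    model_id: str,
    revision: Optional[str] = None,
    extension: str = ".safetensors",
    auto_convert: bool = True,
    trust_remote_code: bool = False,  # previously referenced but never defined
):
    # ... existing download logic, which can now pass trust_remote_code through:
    # utils.download_and_unload_peft(
    #     model_id, revision, trust_remote_code=trust_remote_code
    # )
    pass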

@avinashkarani Can you try it using the branch called fix_trust_remote_code please?

@regisss I tested the new branch and observed two issues:

  1. If we run with a PEFT model (with an adapter config present), the PEFT version seems to be incompatible:
    Loading checkpoint shards: 100%|██████████| 2/2 [00:12<00:00, 6.48s/it]
    Traceback (most recent call last):
    File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 29, in
    main(args)
    File "/usr/local/lib/python3.10/dist-packages/text_generation_server/tgi_service.py", line 16, in main
    server.serve(
    File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 213, in serve
    asyncio.run(serve_inner(model_id, revision, dtype, sharded))
    File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
    File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
    File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 177, in serve_inner
    model = get_model(model_id, revision=revision, dtype=data_type)
    File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/init.py", line 33, in get_model
    return CausalLM(model_id, revision, dtype)
    File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/causal_lm.py", line 512, in init
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype, **model_kwargs)
    File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
    return model_class.from_pretrained(
    File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3380, in from_pretrained
    model.load_adapter(
    File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/peft.py", line 137, in load_adapter
    check_peft_version(min_version=MIN_PEFT_VERSION)
    File "/usr/local/lib/python3.10/dist-packages/transformers/utils/peft_utils.py", line 120, in check_peft_version
    raise ValueError(
    ValueError: The version of PEFT you are using is not compatible, please use a version that is greater than 0.5.0 rank=0

@regisss 2) If I provide a model with the adapters already merged, the code still defaults to the PEFT path and fails with an "adapter config not found" error.

is_local_model = (Path(model_id).exists() and Path(model_id).is_dir()) or os.getenv(
    "WEIGHTS_CACHE_OVERRIDE", None
) is not None

if not is_local_model:
    # Try to download weights from the hub
    try:
        filenames = utils.weight_hub_files(model_id, revision, extension)
        utils.download_weights(filenames, model_id, revision)
        # Successfully downloaded weights
        return

    # No weights found on the hub with this extension
    except utils.EntryNotFoundError as e:
        # Check if we want to automatically convert to safetensors or if we can use .bin weights instead
        if not extension == ".safetensors" or not auto_convert:
            raise e

else:
    # Try to load as a local PEFT model
    try:
        utils.download_and_unload_peft(
            model_id, revision, trust_remote_code=trust_remote_code
        )
        utils.weight_files(model_id, revision, extension)
        return
    except (utils.LocalEntryNotFoundError, utils.EntryNotFoundError):
        pass

# Try to see if there are local pytorch weights
try:
    # Get weights for a local model, a hub cached model and inside the WEIGHTS_CACHE_OVERRIDE
    local_pt_files = utils.weight_files(model_id, revision, ".bin")

The code above, from cli.py, first checks for a PEFT model in the else branch and fails there when the local model directory has no adapter config.
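For reference, a minimal sketch of the kind of guard that would let a local path skip the PEFT unloading when no adapter config is present (hypothetical helper, not the actual fix):

from pathlib import Path

def looks_like_peft_checkpoint(model_id: str) -> bool:
    # Treat a local directory as a PEFT checkpoint only if it actually
    # ships an adapter_config.json next to the weights.
    path = Path(model_id)
    return path.is_dir() and (path / "adapter_config.json").exists()

# The else branch above could then guard the call:
# if looks_like_peft_checkpoint(model_id):
#     utils.download_and_unload_peft(
#         model_id, revision, trust_remote_code=trust_remote_code
#     )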

@avinashkarani I tried to build the Docker image using the v1.2-release branch but got a dependency conflict. Did you get this issue as well? Thank you

I have tried loosening the packages with pip install -U --no-deps, but still got the issue:

323.2 The conflict is caused by:
323.2     text-generation-server 1.2.0 depends on huggingface-hub<0.17.0 and >=0.16.4
323.2     tokenizers 0.14.1 depends on huggingface_hub<0.18 and >=0.16.4
323.2     diffusers 0.26.3 depends on huggingface-hub>=0.20.2
323.2     text-generation-server 1.2.0 depends on huggingface-hub<0.17.0 and >=0.16.4
323.2     tokenizers 0.14.1 depends on huggingface_hub<0.18 and >=0.16.4
323.2     diffusers 0.26.2 depends on huggingface-hub>=0.20.2
323.2     text-generation-server 1.2.0 depends on huggingface-hub<0.17.0 and >=0.16.4
323.2     tokenizers 0.14.1 depends on huggingface_hub<0.18 and >=0.16.4
323.2     diffusers 0.26.1 depends on huggingface-hub>=0.20.2
323.2     text-generation-server 1.2.0 depends on huggingface-hub<0.17.0 and >=0.16.4
323.2     tokenizers 0.14.1 depends on huggingface_hub<0.18 and >=0.16.4
323.2     diffusers 0.26.0 depends on huggingface-hub>=0.20.2
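For what it's worth, a small diagnostic snippet (run with the Python interpreter inside an image where the install went through) that prints the versions pip actually resolved, to see which of the pins above ends up winning:

from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the conflict report above.
for pkg in ("huggingface-hub", "tokenizers", "diffusers", "text-generation-server"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")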

@muhammad-asn Can you try this PR and let me know if that works for you?

(Quoting issue 1 reported above: running with a PEFT model fails with "ValueError: The version of PEFT you are using is not compatible, please use a version that is greater than 0.5.0".)

@avinashkarani I think the issue here is that you're using a checkpoint that was created with an old version of PEFT. Could you try with a more recent checkpoint?
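As a possible workaround (a sketch assuming a recent peft install and hypothetical local paths), the adapter can be merged into the base model offline so that TGI only sees plain weights:

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_path = "/home/ubuntu/model_10e"        # hypothetical directory with adapter_config.json
merged_path = "/home/ubuntu/model_10e_merged"  # output directory for the merged, plain weights

model = AutoPeftModelForCausalLM.from_pretrained(adapter_path, torch_dtype=torch.bfloat16)
merged = model.merge_and_unload()              # fold the LoRA weights into the base model
merged.save_pretrained(merged_path, safe_serialization=True)

AutoTokenizer.from_pretrained(adapter_path).save_pretrained(merged_path)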

(Quoting issue 2 reported above: with the adapters already merged, the download logic in cli.py still takes the PEFT branch for local models and fails because no adapter config is found.)

What is the path to your local model? Are you sure the volume where your model is located is correctly mounted into the Docker container?

@regisss Let me check first, thank you for your fast response.

@regisss I filed it as a separate GitHub issue (#54) to avoid widening the context here.

@muhammad-asn Does the problem still exist? Can we close this issue?

Yes, you can close this issue; it is solved by a recent PR.

Thank you