huggingface/tgi-gaudi

Unsupported model type llava_next

Spycsh opened this issue

System Info

Using the official Docker image ghcr.io/huggingface/tgi-gaudi:2.0.1.

As shown in https://github.com/huggingface/tgi-gaudi/blob/habana-main/docs/source/supported_models.md?plain=1, llava-hf/llava-v1.6-mistral-7b-hf should be supported, but loading it fails:

2024-07-16T04:35:51.631996Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
Traceback (most recent call last):
  File "/usr/local/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/cli.py", line 137, in serve
    server.serve(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 223, in serve
    asyncio.run(
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 189, in serve_inner
    model = get_model(
  File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 109, in get_model
    raise ValueError(f"Unsupported model type {model_type}")
ValueError: Unsupported model type llava_next
 rank=0
2024-07-16T04:35:51.728344Z ERROR text_generation_launcher: Shard 0 failed to start
2024-07-16T04:35:51.728370Z  INFO text_generation_launcher: Shutting down shards
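
For context on where this comes from: get_model in text_generation_server/models/__init__.py dispatches on the model_type string read from the model's config.json, and the 2.0.1 image simply has no branch for llava_next. A simplified sketch of that dispatch (the supported set below is illustrative, not the fork's actual list):

```python
# Simplified sketch of the dispatch in text_generation_server/models/__init__.py.
# The "supported" set is hypothetical; only the fall-through ValueError
# mirrors the real code.
def get_model(model_id: str, model_type: str):
    supported = {"llama", "mistral", "bloom"}  # illustrative subset
    if model_type in supported:
        return f"<model loader for {model_id}>"  # stands in for the real class
    # tgi-gaudi 2.0.1 has no branch for "llava_next", so we land here:
    raise ValueError(f"Unsupported model type {model_type}")


get_model("llava-hf/llava-v1.6-mistral-7b-hf", "llava_next")
# ValueError: Unsupported model type llava_next
```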

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Both of the following attempts fail with the same "Unsupported model type" error shown above:

# Attempt 1: official image
export model=llava-hf/llava-v1.6-mistral-7b-hf
export volume=$PWD/data

docker run -p 8080:80 -v $volume:/data --runtime=habana -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.1 --model-id $model --max-input-tokens 1024 --max-total-tokens 2048

# Attempt 2: image built from source
git clone https://github.com/huggingface/tgi-gaudi.git
cd tgi-gaudi
docker build --build-arg http_proxy=${http_proxy} --build-arg https_proxy=${https_proxy} -t tgi_gaudi_llava .

docker run -p 8080:80 -v $volume:/data --runtime=habana -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi_llava --model-id $model --max-input-tokens 1024 --max-total-tokens 2048
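
You can verify that it is the declared architecture, not the download, that trips the server: the model_type field in the model's config.json is exactly the string get_model fails to match. A quick check (assuming transformers is installed locally):

```python
from transformers import AutoConfig

# The server reads this same field from config.json and passes it to
# get_model; for the v1.6 checkpoints it is "llava_next".
config = AutoConfig.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
print(config.model_type)  # -> llava_next
```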

Expected behavior

Since the supported-models doc claims that llava_next is supported, users should be able to run that model.

It seems the doc was simply forked from upstream TGI, so the supported-models list does not fully apply to TGI Gaudi.

@Spycsh you are right; the configurations that are officially supported in this fork are listed in the main README.

@Spycsh @kdamaszk Thanks for bringing up this issue. We have hit it as a blocker too: we need to run llava in TGI-Gaudi. We currently serve various LLMs in our product through TGI-Gaudi, but for llava we have to fall back to a workaround because of this error. Given that Optimum Habana now officially supports llava_next, it would be great to get that support into this TGI fork. See, for example: https://github.com/search?q=repo%3Ahuggingface%2Foptimum-habana+llava&type=pullrequests

Is there any active work ongoing to support llava in this TGI fork? If so, great. If not, maybe I could drum up some interest in a contribution (either within our company or with our partners).
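
In case it helps anyone else who is blocked, the workaround amounts to serving the model outside TGI with plain transformers. A minimal sketch, assuming a Gaudi container with habana_frameworks installed (the HPU lines and generation settings here are illustrative, not our exact production code):

```python
import requests
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)

# Importing habana_frameworks.torch.core registers the "hpu" device
# (assumes SynapseAI / habana_frameworks is installed, as in Gaudi images).
import habana_frameworks.torch.core  # noqa: F401

model = model.to("hpu")

url = "https://github.com/haotian-liu/LLaVA/blob/main/images/llava_v1_5_radar.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)
# Prompt format for the v1.6 Mistral checkpoint, per the model card.
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"

inputs = processor(text=prompt, images=image, return_tensors="pt").to("hpu")
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```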

Ah, I'm just seeing this: #193. Not sure how I missed that.

Llava-next support was added in #187, closing this issue.