Unsupported model type llava_next
Spycsh opened this issue · 5 comments
System Info
Using the official ghcr.io/huggingface/tgi-gaudi:2.0.1 Docker image.
According to https://github.com/huggingface/tgi-gaudi/blob/habana-main/docs/source/supported_models.md?plain=1, llava-hf/llava-v1.6-mistral-7b-hf should be supported, but it is not.
2024-07-16T04:35:51.631996Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
Traceback (most recent call last):
File "/usr/local/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/cli.py", line 137, in serve
server.serve(
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 223, in serve
asyncio.run(
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/server.py", line 189, in serve_inner
model = get_model(
File "/usr/local/lib/python3.10/dist-packages/text_generation_server/models/__init__.py", line 109, in get_model
raise ValueError(f"Unsupported model type {model_type}")
ValueError: Unsupported model type llava_next
rank=0
2024-07-16T04:35:51.728344Z ERROR text_generation_launcher: Shard 0 failed to start
2024-07-16T04:35:51.728370Z INFO text_generation_launcher: Shutting down shards
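For reference, the error is raised by the model-type dispatch in text_generation_server/models/__init__.py (line 109 in the traceback above): the checkpoint's config reports model_type "llava_next", and get_model() in this image has no branch for it. A quick sanity check of the reported model type, outside of TGI (a minimal sketch using plain transformers; it assumes a transformers version recent enough to know the llava_next config, roughly 4.39+):
# Minimal sketch: print the model_type value that get_model() rejects.
# Assumes transformers >= ~4.39 so that the llava_next config class is available.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
print(config.model_type)  # expected output: "llava_next"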
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
Both of the following attempts (running the prebuilt image, and building the image from source) fail with the same "Unsupported model type llava_next" error shown above:
export model=llava-hf/llava-v1.6-mistral-7b-hf
export volume=$PWD/data
docker run -p 8080:80 -v $volume:/data --runtime=habana -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.1 --model-id $model --max-input-tokens 1024 --max-total-tokens 2048
git clone https://github.com/huggingface/tgi-gaudi.git
cd tgi-gaudi
docker build --build-arg http_proxy=${http_proxy} --build-arg https_proxy=${https_proxy} -t tgi_gaudi_llava .
docker run -p 8080:80 -v $volume:/data --runtime=habana -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host tgi_gaudi_llava --model-id $model --max-input-tokens 1024 --max-total-tokens 2048
Expected behavior
Since the supported-models documentation claims that llava_next is supported, users should be able to run llava-hf/llava-v1.6-mistral-7b-hf with TGI-Gaudi.
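Concretely, once llava_next is handled, a multimodal request against the running server should work along these lines. This is only a sketch: it assumes the upstream TGI convention of embedding the image as a markdown image tag in the prompt, the standard /generate endpoint on port 8080 as mapped in the commands above, and an example image URL.
# Sketch of the expected usage once llava_next is supported; assumes the
# upstream TGI convention of passing the image via markdown image syntax
# in the prompt, and the /generate endpoint mapped to localhost:8080 above.
import requests

prompt = (
    "![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png)"
    "What is shown in this image?"
)
resp = requests.post(
    "http://localhost:8080/generate",
    json={"inputs": prompt, "parameters": {"max_new_tokens": 64}},
)
print(resp.json()["generated_text"])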
It seems the docs were simply carried over from upstream TGI, so the README does not fully apply to TGI-Gaudi.
@Spycsh @kdamaszk Thanks for bringing up this issue. We have also hit this blocker and need to run llava in TGI-Gaudi. Right now we have TGI-Gaudi serving various LLMs in our product, but we have to use a workaround for llava because of this issue. Given that Optimum Habana now officially supports llava_next, it would be great to get it into this TGI fork. See, for example: https://github.com/search?q=repo%3Ahuggingface%2Foptimum-habana+llava&type=pullrequests
Is there any active work ongoing to support llava in this TGI fork? If so, great. If not, maybe I could drum up some interest in a contribution (either within our company or with our partners).
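In the meantime, for anyone hitting the same blocker, one possible workaround is to run the model through Optimum Habana directly, outside of TGI. Below is a rough sketch only, assuming an optimum-habana release that already supports llava_next and a working Gaudi/HPU software stack; the prompt format follows the llava-v1.6-mistral model card, and exact APIs or performance knobs may differ between releases.
# Rough sketch of running llava_next outside TGI via Optimum Habana.
# Assumes an optimum-habana release with llava_next support and a working
# Gaudi/HPU stack; details may vary between versions.
import requests
from PIL import Image
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()  # patch transformers model classes for HPU

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(model_id).to("hpu")

image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"
image = Image.open(requests.get(image_url, stream=True).raw)
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"

inputs = processor(images=image, text=prompt, return_tensors="pt").to("hpu")
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))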