Issues
GPTQ uint4 quantization broken
#207 opened by endomorphosis - 0
Integration of llama3.1 fixes
#197 opened by Feelas - 1
llama3.1-70B-instruct: 422 error "Template error: unknown test: test iterable is unknown (in <string>:99)"
#218 opened by minmin-intel - 2
When running llama2 7b, concurrent inference on some 2k-length prompts crashes the TGI service
#216 opened by yao531441 - 5
Unsupported model type llava_next
#186 opened by Spycsh - 3
LLaVA support
#149 opened by JoeyTPChou - 1
example/run_generation.py fails with unexpected argument for TextGenerationStreamingResponse
#189 opened by gpapilion - 0
Misleading documentation
#174 opened by 12010486 - 0
https://github.com/huggingface/tgi-gaudi/pull/176 causes a performance regression in benchmarks
#184 opened by mandy-li - 3
ValueError: Unsupported model type t5
#172 opened by JunxiChhen - 1
Low throughput when using TGI-Gaudi with bigcode/starcoderbase-3b on Gaudi2
#166 opened by vishnumadhu365 - 4
Integrate critical PR from TGI upstream
#155 opened by luoyiroy - 3
Clarification on past_key_values type for Starcoder
#116 opened by vidyasiv - 3
v2.0.0-release: 8 extra tokens appended to the input tokens trigger huggingface_hub.errors.ValidationError: Input validation error: `inputs` tokens + `max_new_tokens` must be <= 32512. Given: 32008 `inputs` tokens and 512 `max_new_tokens`; no such issue in v1.2.2-release
#146 opened by IT-Forrest - 4
How to use the FP8 feature in TGI-Gaudi
#95 opened by lvliang-intel - 4
Update the base image from 1.14 to 1.15
#127 opened by yafshar - 6
HPUGraph destructor issue when installing dill
#130 opened by yafshar - 9
Docker build issue
#97 opened by akarX23 - 32
Issue running meta-llama/Llama-2-13b-chat-hf
#54 opened by muhammad-asn - 1
Cargo error: clap
#47 opened by muhammad-asn