Issues
- [Bug]: llama 405B fp8 fails (#140, opened by endomorphosis, 2 comments)
- [Bug]: Using tensor parallel during offline inference causes the process to hang (#220, opened by xinsu626, 1 comment)
- [Doc]: Broken link in Gaudi-Installation Readme. (#165, opened by MohitIntel, 0 comments)
- [Bug]: Habana_NEXT failed on Lazy Model + Tensor Parallel - [Quick Fix Provided] (#173, opened by xuechendi, 1 comment)
- [Usage]: vllm can't run qwen 32B inference (#193, opened by kunger97, 2 comments)
- [Bug]: Habana-Main failed with 'tools' + 'tool_choices' keyword while vllm upstream is working fine (#198, opened by xuechendi, 1 comment)
- [Feature]: Add Dockerfile.HPU (#199, opened by xuechendi, 1 comment)
- [Feature]: Compile warmup take too long (#201, opened by Zjq9409, 0 comments)
- [Usage]: tensor-parallel-size=2 second token latency is higher than tensor_parallel_size=1 (#204, opened by Zjq9409, 0 comments)
- [Usage]: tensor-parallel-size=2 is very slow (#203, opened by Zjq9409, 1 comment)
- [Misc]: test issue (#127, opened by kzawora-intel)