Issues
- [Bug]: llama 405B fp8 fails (#140, opened by endomorphosis, 2 comments)
- [Bug]: Using tensor parallel during offline inference causes the process to hang (#220, opened by xinsu626, 1 comment)
- [Doc]: Broken link in Gaudi-Installation Readme. (#165, opened by MohitIntel, 0 comments)
- [Bug]: Habana_NEXT failed on Lazy Model + Tensor Parallel - [Quick Fix Provided] (#173, opened by xuechendi, 1 comment)
- [Usage]: vllm can't run qwen 32B inference (#193, opened by kunger97, 2 comments)
- [Bug]: Habana-Main failed with 'tools' + 'tool_choices' keyword while vllm upstream is working fine (#198, opened by xuechendi, 1 comment)
- [Feature]: Add Dockerfile.HPU (#199, opened by xuechendi, 1 comment)
- [Feature]: Compile warmup take too long (#201, opened by Zjq9409, 0 comments)
- [Usage]: tensor-parallel-size=2 second token latency is higher than tensor_parallel_size=1 (#204, opened by Zjq9409, 0 comments)
- [Usage]: tensor-parallel-size=2 is very slow (#203, opened by Zjq9409, 1 comment)
- [Misc]: test issue (#127, opened by kzawora-intel)