Issues
- After converting the Llama model to ONNX, can the input be input_embeds? (#30 opened by OswaldoBornemann, 0 comments)
- Error in converting llama (#29 opened by purejomo, 0 comments)
- llama => onnx => tensorrt (#28 opened by tp-nan, 1 comment)
- A way to improve the next_token computation (#27 opened by luchangli03, 5 comments; see the next_token sketch after this list)
- How to convert the llama model to onnx in segments? (#26 opened by jxcomeon, 3 comments)
- GPU Inference (#25 opened by tpoisonooo, 2 comments)
- onnx model inference (#21 opened by KaiyuHu2001, 1 comment)
- How to support batched inference? (#22 opened by VincentJYZhang, 0 comments; see the batching sketch after this list)
- Error converting fp32 to fp16 (#18 opened by Ted8000, 0 comments; see the fp16 conversion sketch after this list)
- The 7B onnx model (float16) uses more than 32 GB of GPU memory (#19 opened by iamhere1, 4 comments)
- Inference with GPU took too much GPU RAM (#16 opened by DungMinhDao, 1 comment)
- Alternative RWKV onnx converter (#17 opened by harrisonvanderbyl, 4 comments)
- Inference super slow (#15 opened by SinanAkkoyun, 17 comments)
- demo_llama.py: No module named public (#14 opened by SinanAkkoyun, 11 comments)
- Problem converting to ONNX (#12 opened by xcxhy, 3 comments)
- some questions about llama.onnx (#3 opened by dvc94ch)
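
Issues #27 and #15 both touch on how next_token is computed from the model output. The issue bodies are not reproduced here, so the following is only a generic sketch of the usual optimization: post-process just the final position of the logits tensor instead of the whole sequence. The [batch, seq_len, vocab_size] layout is an assumption about the export, not something confirmed by this list.

```python
import numpy as np

def next_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Pick the next token id from decoder logits of shape [batch, seq, vocab]."""
    last = logits[0, -1, :].astype(np.float64)  # only the final position matters
    if temperature == 0.0:
        return int(np.argmax(last))             # greedy decoding
    last = last / temperature
    last -= last.max()                          # numerically stable softmax
    probs = np.exp(last)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```

Slicing before the softmax keeps the per-step cost proportional to vocab_size rather than seq_len * vocab_size.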
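
For the batching question in #22, ONNX Runtime itself needs no special API: padded sequences are stacked along the first axis, provided the graph was exported with a dynamic batch dimension. A minimal sketch; the file name `decoder.onnx` and the input name `input_ids` are placeholders rather than this repo's actual export.

```python
import numpy as np
import onnxruntime as ort

# File and tensor names are illustrative; they depend on how the model was exported.
sess = ort.InferenceSession("decoder.onnx", providers=["CPUExecutionProvider"])

# Two prompts padded to the same length, stacked along axis 0.
# This only works if the graph has a dynamic batch dimension.
input_ids = np.array([[1, 15043, 29892],
                      [1, 12148, 29991]], dtype=np.int64)

outputs = sess.run(None, {"input_ids": input_ids})
print(outputs[0].shape)  # e.g. (2, 3, vocab_size) for a logits output
```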
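
For the fp32-to-fp16 conversion error in #18 (and the memory footprints reported in #19 and #16), one widely used conversion path is the float16 helper from onnxconverter-common. This is a generic sketch, not necessarily the steps those issues describe; file names are placeholders.

```python
import onnx
from onnxconverter_common import float16

model = onnx.load("decoder_fp32.onnx")  # placeholder file name

# keep_io_types=True leaves graph inputs/outputs in float32, so the
# surrounding pre/post-processing code needs no dtype changes.
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)

onnx.save(model_fp16, "decoder_fp16.onnx")
```

Note that a 7B checkpoint exceeds protobuf's 2 GB serialization limit, so loading and saving such a model generally requires ONNX's external-data support (for example `onnx.save_model(model_fp16, path, save_as_external_data=True)` in recent onnx releases); the conversion call itself is unchanged.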