huggingface/optimum-benchmark
A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.
Python · Apache-2.0
Issues
Onnxruntime Seq2Seq doesn't work
#180 opened by Knzaytsev - 0
More tests
#95 opened by IlyasMoutawwakil - 5
CUDA_VISIBLE_DEVICES isn't working
#176 opened by sashavor - 15
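The two CUDA_VISIBLE_DEVICES reports in this list (#176 above and #27 further down) typically come from the same CUDA behavior: the driver reads the variable once, when the process first creates a CUDA context, so setting it after torch has initialized CUDA is silently ignored. A minimal sketch of the safe ordering (`select_gpus` is a hypothetical helper, not part of optimum-benchmark):

```python
import os
import sys

def select_gpus(devices: str) -> None:
    """Pin CUDA_VISIBLE_DEVICES before any CUDA context exists.

    The CUDA driver reads this variable only once, at context creation,
    so changing it after torch has initialized CUDA has no effect --
    a likely cause of "CUDA_VISIBLE_DEVICES isn't working" reports.
    """
    if "torch" in sys.modules:
        # By this point a CUDA context may already exist and the
        # setting would be silently ignored.
        raise RuntimeError("set CUDA_VISIBLE_DEVICES before importing torch")
    os.environ["CUDA_VISIBLE_DEVICES"] = devices

select_gpus("0")
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Launching the benchmark process with the variable already in its environment (e.g. `CUDA_VISIBLE_DEVICES=0 optimum-benchmark ...` from the shell) avoids the ordering problem entirely.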
bnb.4bits error: "ValueError: Blockwise quantization only supports 16/32-bit floats, but got torch.uint8"
#175 opened by lifelongeeek - 0
Regression testing API
#166 opened by IlyasMoutawwakil - 2
TensorRT-LLM - how to add support for new model?
#164 opened by pfk-beta - 9
Warning on loading quantized model
#158 opened by andxalex - 2
CLI tests of the CPU training benchmark with PyTorch use the GPU if it's available
#159 opened by aliabdelkader - 2
Is the `test` data generated from random tokens?
#153 opened by rui-ren - 5
Moving model to one device
#148 opened by pfk-beta - 9
TensorRT-LLM support question
#133 opened by lemon-little - 1
(Question) When I use the memory-tracking feature on the GPU, my VRAM is reported as 0. Is this normal, and what might be causing it?
#136 opened by WCSY-YG - 1
How to set TensorRT-LLM backend parameters
#138 opened by Yuchen-Cao - 4
How can I test my local model?
#119 opened by smile2game - 3
Testing Qwen-7B. >>> AttributeError: 'NoneType' object has no attribute 'to_dict'
#120 opened by smile2game - 1
Remove `cuda` synchronizations
#121 opened by IlyasMoutawwakil - 0
Is there any way to load a GGUF-format model and benchmark it? Thanks!
#115 opened by Confetti-lxy - 1
Question about your latency graph
#112 opened by dzenilee - 4
What can I do about a ConnectionError when I want to use my local Llama weights?
#104 opened by cason0126 - 0
Timm support
#52 opened by IlyasMoutawwakil - 1
CPU core isolation/targeting checks
#40 opened by IlyasMoutawwakil - 4
How to evaluate a model that already exists locally and hasn't been uploaded yet: what should `model=` be?
#102 opened by WCSY-YG - 3
Need a detailed definition of forward latency
#101 opened by leocnj - 5
Evaluators for specific tasks
#34 opened by IlyasMoutawwakil - 4
py3nvml measures reserved and not used memory
#31 opened by fxmarty - 1
TP and DP support for inference
#86 opened by IlyasMoutawwakil - 1
RuntimeError: microsoft/deberta-large
#65 opened by karthickai - 4
CUDA_VISIBLE_DEVICES is not captured by torch
#27 opened by fxmarty - 0
TGI support
#49 opened by IlyasMoutawwakil - 0
Simulate GPTQ quantization
#44 opened by IlyasMoutawwakil - 3
DDP throughput
#29 opened by IlyasMoutawwakil