Is there a particular reason to not support batch processing?
ViktorooReps opened this issue · 1 comment
ViktorooReps commented
```
Every 2.0s: nvidia-smi                                   d26b4303cee2: Tue Jul 16 21:03:36 2024

Tue Jul 16 21:03:36 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB         Off | 00000000:00:04.0 Off |                    0 |
| N/A   43C    P0             149W / 400W |  20439MiB / 40960MiB |     52%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
```
I'm running LLaMA 2 on an A100 GPU (on Google Colab, so the environment may not be ideal) and only getting around 50% utilization.
I can try to implement batching myself, but I'd need some advice on what pitfalls to avoid and what not to break.
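For reference, a minimal sketch of how batching is typically layered over a Hugging Face `generate` call (the `generate_batched` wrapper and its parameters are illustrative, not this repo's API; the key detail for decoder-only models like LLaMA is left padding, so that generated tokens line up at the end of each padded sequence):

```python
from typing import Iterator, List

def chunked(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most batch_size prompts."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Hypothetical wrapper around a HF model/tokenizer pair.
def generate_batched(model, tokenizer, prompts: List[str], batch_size: int = 8) -> List[str]:
    tokenizer.padding_side = "left"                # required for causal LMs
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # LLaMA defines no pad token
    outputs: List[str] = []
    for batch in chunked(prompts, batch_size):
        enc = tokenizer(batch, return_tensors="pt", padding=True).to(model.device)
        out = model.generate(**enc, max_new_tokens=128)
        # Strip the (padded) prompt tokens before decoding the completions.
        new_tokens = out[:, enc["input_ids"].shape[1]:]
        outputs.extend(tokenizer.batch_decode(new_tokens, skip_special_tokens=True))
    return outputs
```

With left padding and a pad token set, the per-batch results should match one-by-one generation (modulo sampling); forgetting either is the usual way batching silently breaks.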
ViktorooReps commented
Batching is now supported for the HF wrapper.