This repo includes scripts for measuring emulation cost in the HW lockin paper.
- Python 3.11
- Install dependencies using
pip install -r ./requirements.txt
The following two scripts measure the latency and throughput of quantization-aware and pruning-aware GEMMs and LLM inference.
- Quantization emulation cost:
python profile-q-int.py
- Pruning emulation cost:
python profile-p.py
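
For readers unfamiliar with what "emulation cost" means here, the following is a minimal, standard-library-only sketch of the idea: time a plain GEMM, then time the same GEMM with an emulated quantization step (quantize, then dequantize) or an emulated pruning step (magnitude masking) in front of it. It is not the code in profile-q-int.py or profile-p.py. Every name in it (`matmul`, `fake_quantize`, `magnitude_prune`, `measure`, `run_demo`), the symmetric 8-bit scheme, the 50% sparsity, and the matrix size are assumptions chosen for illustration.

```python
import random
import time


def matmul(a, b):
    """Naive GEMM on nested lists: (m x k) @ (k x n)."""
    b_cols = list(zip(*b))  # transpose b so each column is iterable
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def fake_quantize(mat, bits=8):
    """Emulate symmetric integer quantization: quantize, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for row in mat for v in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [
        [max(-qmax, min(qmax, round(v / scale))) * scale for v in row]
        for row in mat
    ]


def magnitude_prune(mat, sparsity=0.5):
    """Emulate unstructured pruning: zero the smallest-magnitude entries."""
    magnitudes = sorted(abs(v) for row in mat for v in row)
    n_drop = int(len(magnitudes) * sparsity)
    if n_drop == 0:
        return [list(row) for row in mat]
    threshold = magnitudes[n_drop - 1]
    # Ties at the threshold are also zeroed, so sparsity can slightly exceed the target.
    return [[0.0 if abs(v) <= threshold else v for v in row] for row in mat]


def measure(fn, repeats=5):
    """Return mean latency in seconds over `repeats` calls, after one warm-up call."""
    fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def run_demo(n=32, seed=0):
    """Return {case name: (latency in s, throughput in FLOP/s)} for n x n GEMMs."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    b = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    flops = 2 * n ** 3  # multiply-adds of the GEMM itself, excluding emulation work
    cases = {
        "baseline": lambda: matmul(a, b),
        "quant-emulated": lambda: matmul(fake_quantize(a), fake_quantize(b)),
        "prune-emulated": lambda: matmul(magnitude_prune(a), b),
    }
    results = {}
    for name, fn in cases.items():
        latency = measure(fn)
        results[name] = (latency, flops / latency)
    return results


if __name__ == "__main__":
    for name, (latency, throughput) in run_demo().items():
        print(f"{name:>15}: {latency * 1e3:8.3f} ms  {throughput / 1e6:8.2f} MFLOP/s")
```

The gap between the baseline and the emulated cases is the emulation overhead. Absolute numbers from this pure-Python sketch say nothing about GPU behaviour; only the measurement pattern (warm-up, repeated timing, throughput derived from a fixed FLOP count) is meant to carry over.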