INT-FP-QSim is a simulator that supports flexible evaluation of large language models (LLMs) and Vision Transformers for different numerical precisions, formats (integer or floating point) and their combinations. Please see https://arxiv.org/abs/2307.03712 for further details on the simulator and the results obtained for different models. INT-FP-QSim is intended for research purposes only.
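To give a feel for the kind of numerical simulation involved, below is a minimal, self-contained sketch of rounding a value to the FP8 E4M3 format (4 exponent bits, 3 mantissa bits, bias 7, saturating at ±448 per the OCP FP8 convention). This is only an illustration of the format itself, not the simulator's actual implementation:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round a finite float to the nearest FP8 E4M3 value.

    E4M3 has 3 explicit mantissa bits and saturates at +/-448
    (it has no infinity). Subnormals bottom out at exponent -6.
    """
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    if mag > 448.0:          # saturate instead of overflowing
        return sign * 448.0
    e = math.floor(math.log2(mag))
    e = max(e, -6)           # subnormal range: exponent floored at -6
    step = 2.0 ** (e - 3)    # spacing given 3 mantissa bits
    return sign * round(mag / step) * step
```

For example, 0.3 falls between the representable values 0.28125 and 0.3125 and rounds to the nearer one, 0.3125, which is exactly the kind of rounding error a quantization simulator lets you measure end to end.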
INT-FP-QSim requires PyTorch. As there are multiple installation options for PyTorch, we advise following their installation directions. We also advise using the `--index-url` option when installing PyTorch so that the particular CUDA or CPU version is installed. We have tested the repo with PyTorch 1.13.0, using the CUDA 11.6 and CPU versions. This can be installed with:

```
pip install torch==1.13.0+cu116 torchvision==0.14.0+cu116 -f https://download.pytorch.org/whl/torch_stable.html
```
INT-FP-QSim can be installed with `pip install -e .` from within the directory of the cloned repo.
```python
import torch
import torchvision

from int_fp_qsim.replace import replace_layers
from int_fp_qsim.formats import E4M3, BF16

model = torchvision.models.resnet50()

# Quantize inputs and weights to E4M3 and outputs to BF16.
# The following call replaces the layers of the model to support
# quantization. Note that replace_layers modifies the model in-place.
replace_layers(
    model,
    input_quantizer=E4M3,
    weight_quantizer=E4M3,
    output_quantizer=BF16
)
print(model)  # The quantizer objects are now attached to the layers

# Continue to evaluation
inputs = torch.randn(1, 3, 225, 225)
model.eval()
with torch.no_grad():
    model(inputs)
```
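In the example above, layer outputs are kept in BF16, which preserves float32's 8 exponent bits but keeps only 7 explicit mantissa bits. As a rough illustration of what that costs in precision (again, a sketch of the format, not the simulator's code), bfloat16 rounding amounts to round-to-nearest-even on the upper 16 bits of the float32 encoding:

```python
import struct

def to_bf16(x: float) -> float:
    """Round a finite float32 value to bfloat16.

    bfloat16 is the top 16 bits of the float32 bit pattern, with
    round-to-nearest-even applied to the discarded low 16 bits.
    """
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    lower = bits & 0xFFFF
    upper = bits >> 16
    # Round up on values past the halfway point, and on exact ties
    # when the retained pattern is odd (ties-to-even).
    if lower > 0x8000 or (lower == 0x8000 and (upper & 1)):
        upper += 1
    return struct.unpack('<f', struct.pack('<I', upper << 16))[0]
```

For instance, 1 + 2**-7 is exactly representable in bfloat16, while 1 + 2**-8 sits exactly halfway between 1.0 and the next value and rounds back down to 1.0 under ties-to-even.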
See `requirements.txt` for a full list of dependencies, including those needed to run the provided examples. Example scripts for performing evaluation with different models are provided in the `examples` folder; see the corresponding `*.sh` files for the full commands.
Licensed under Apache 2.0. Please see `LICENSE`.
This is not an officially supported Lightmatter product.
If you find this repository useful, please consider giving it a star and a citation:
Nair, Lakshmi, et al. INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers. arXiv, 7 July 2023. arXiv.org, https://doi.org/10.48550/arXiv.2307.03712.