Implementation of the paper "BitNet: Scaling 1-bit Transformers for Large Language Models".

BitLinear = tensor -> LayerNorm -> binarize -> absmax quantization
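The two quantization steps named above can be sketched as standalone functions. This is a minimal illustration based on the formulation in the paper (sign binarization around the weight mean, and absmax quantization of activations into a b-bit range); the function names are illustrative and are not the `bitnet` package API.

```python
import torch

def binarize_weights(w: torch.Tensor) -> torch.Tensor:
    # Center the weights around their mean, then take the sign,
    # so every entry becomes +1 or -1 (0 only if an entry equals the mean).
    alpha = w.mean()
    return torch.sign(w - alpha)

def absmax_quantize(x: torch.Tensor, bits: int = 8):
    # Scale activations by their absolute maximum into the b-bit
    # integer range [-Q_b, Q_b - 1], then round and clip.
    qb = 2 ** (bits - 1)
    gamma = x.abs().max()
    x_q = torch.clamp((x * qb / gamma).round(), -qb, qb - 1)
    return x_q, gamma  # gamma is kept so the output can be dequantized

# Quantize a random activation tensor and reconstruct it approximately.
x = torch.randn(10, 512)
x_q, gamma = absmax_quantize(x)
x_dequant = x_q * gamma / (2 ** 7)
```

The reconstruction error of `x_dequant` is bounded by roughly `gamma / 128` per entry for 8-bit quantization, which is why the scale `gamma` must travel with the quantized tensor.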
```sh
pip install bitnet
```
- Example of the BitLinear layer, which is the main innovation of the paper:
```python
import torch
from bitnet import BitLinear

# Random input: a batch of 10 vectors of dimension 512
x = torch.randn(10, 512)

# BitLinear layer with input dimension 512
layer = BitLinear(512)

# Forward pass returns the output and its dequantized counterpart
y, dequant = layer(x)

print(y, dequant)
```
License: MIT
```bibtex
@misc{2310.11453,
  author = {Hongyu Wang and Shuming Ma and Li Dong and Shaohan Huang and Huaijie Wang and Lingxiao Ma and Fan Yang and Ruiping Wang and Yi Wu and Furu Wei},
  title  = {BitNet: Scaling 1-bit Transformers for Large Language Models},
  year   = {2023},
  eprint = {arXiv:2310.11453},
}
```
- Fix the error in the transformer forward pass
- Split up the q, k, and v projections in one line