astramind-ai/BitMat

Training without full precision

Closed this issue · 1 comments

Is it possible to train in quantized mode without full precision? Could maybe use 2-bit quant to hold the ternary values? Thanks

During training, when torch's train methods are called, the `BitLinear` class is applied as a subclass of `nn.Module` that registers the ternary weights as buffers rather than instantiating them as parameters; this is necessary because `int8` tensors cannot be used with torch's `nn.Parameter`.
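A minimal sketch of what this buffer-based registration looks like (class and attribute names here are illustrative, not BitMat's actual implementation): `nn.Parameter` requires a floating-point dtype because parameters carry gradients, so ternary `int8` weights are stored via `register_buffer` instead, with a full-precision scale kept as the only trainable parameter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinearSketch(nn.Module):
    """Hypothetical BitLinear-style layer: ternary weights live in an
    int8 buffer, since int8 cannot be wrapped in nn.Parameter."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # int8 buffer holding ternary values {-1, 0, +1};
        # buffers move with .to()/.cuda() but receive no gradients
        self.register_buffer(
            "weight",
            torch.randint(-1, 2, (out_features, in_features), dtype=torch.int8),
        )
        # full-precision scale remains a trainable parameter
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # dequantize on the fly for the matmul
        return F.linear(x, self.weight.float() * self.scale)


# int8 genuinely cannot be a parameter:
try:
    nn.Parameter(torch.zeros(2, dtype=torch.int8))
except RuntimeError:
    print("int8 tensors cannot be nn.Parameter")

layer = BitLinearSketch(4, 3)
print(layer.weight.dtype)                      # torch.int8
print([n for n, _ in layer.named_parameters()])  # only 'scale'
```

Because the buffer is excluded from `parameters()`, the optimizer never sees the quantized weights; only the scale is updated, which is why BitLinear-style layers still rely on some full-precision state during training.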