Does this include the GPTQ quantization tricks?
vedantroy opened this issue · 0 comments
The GPTQ readme has the following:
> which demonstrates two new tricks: `--act-order` (quantizing columns in order of decreasing activation size) and `--true-sequential` (performing sequential quantization even within a single Transformer block). Those fix GPTQ's strangely bad performance on the 7B model (from 7.15 to 6.09 Wiki2 PPL) and lead to slight improvements on most models/settings in general.
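For context, the `--act-order` trick amounts to choosing a column permutation before quantization. A minimal sketch of that ordering step (names and values here are hypothetical, not from the GPTQ codebase; the diagonal of the Hessian H = XX^T serves as the per-column activation size):

```python
import numpy as np

def act_order_permutation(hessian_diag):
    # Quantize columns with the largest activation second moments first,
    # so their rounding error can be compensated by the remaining,
    # less important columns later in the sequential GPTQ pass.
    return np.argsort(-np.asarray(hessian_diag))

# Illustrative values (hypothetical):
hessian_diag = [0.1, 3.0, 0.5, 2.0]
perm = act_order_permutation(hessian_diag)
# perm lists column indices in order of decreasing activation size: [1, 3, 2, 0]
```

The weight matrix's columns would then be permuted by `perm` before the usual sequential quantization loop, and the inverse permutation applied afterward.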
Does this repository use these tricks?