AutoGPTQ/AutoGPTQ

The Path to v1.0.0

PanQiWei opened this issue · 2 comments

Hi everyone, long time no see! Starting this week, I will spend about 4 weeks gradually pushing AutoGPTQ to v1.0.0. In the meantime, there will be 2~3 minor versions released as optimization or feature previews, so that you can experience those updates as soon as they are finished and I can hear more community voices and get more feedback.

My vision is that, by the time v1.0.0 is released, AutoGPTQ can serve as an automatic, extendable, and flexible quantization backend for all language models written in PyTorch.

I opened this issue to list all the things that will be done (optimizations, new features, bug fixes, etc.) and to record the development progress. (The contents below will be updated frequently.)

Feel free to comment in the thread to give your opinions and suggestions!

Optimizations

  • refactor the code framework for future extensions while maintaining the important interfaces.
    • separate the quantization logic into a standalone module that serves as a mixin.
    • design an automatic structure recognition strategy to better support different models (hopefully even multi-modal and diffusion models).
  • speed up model packing after quantization.
  • support kernel fusion for more models to further speed up inference.
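The "quantization logic as a mixin" idea above can be sketched roughly as follows. This is an illustrative sketch only; the names (`QuantizeMixin`, `BaseModelWrapper`, `quantize_weights`) are hypothetical and not AutoGPTQ's actual API, and the quantizer is a naive round-to-nearest stand-in for GPTQ:

```python
# Illustrative sketch of "quantization logic as a mixin".
# Class and method names are hypothetical, not AutoGPTQ's API.

class QuantizeMixin:
    """Holds quantization logic, independent of any concrete model class."""

    def quantize_weights(self, weights, bits=4):
        # Naive symmetric round-to-nearest quantization, for illustration only
        # (real GPTQ uses error-compensated, Hessian-aware quantization).
        qmax = 2 ** (bits - 1) - 1
        scale = max(abs(w) for w in weights) / qmax or 1.0
        return [round(w / scale) for w in weights], scale


class BaseModelWrapper:
    def __init__(self, weights):
        self.weights = weights


# Any model wrapper gains quantization support just by mixing the class in,
# without the base class knowing anything about quantization.
class QuantizableModel(QuantizeMixin, BaseModelWrapper):
    pass


model = QuantizableModel([0.5, -1.0, 0.25])
q, scale = model.quantize_weights(model.weights, bits=4)
print(q)  # quantized integer weights
```

The benefit of the mixin design is that new model wrappers pick up quantization support by inheritance instead of duplicating the logic per model family.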

New Features

  • model sharding: split a model checkpoint into multiple files and load from multiple files. #364
  • tensor parallelism for all kinds of QuantLinear supported by AutoGPTQ.
  • CLI: run common commands such as quantization and benchmark directly.
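The model-sharding feature in the first bullet can be sketched as below. This is a minimal illustration under assumed conventions: the helper names (`save_sharded`, `load_sharded`) are hypothetical, `pickle` stands in for a real tensor serializer, and the index file loosely mimics the common `*.index.json` weight-map layout rather than any format AutoGPTQ has committed to:

```python
# Illustrative sketch of checkpoint sharding: split a state dict across
# several files, plus an index mapping each tensor name to its shard.
# Helper names and the index layout are hypothetical assumptions.
import json
import os
import pickle
import tempfile


def save_sharded(state_dict, save_dir, max_entries_per_shard=2):
    index, shard, shard_id = {}, {}, 0
    items = list(state_dict.items())
    for i, (name, tensor) in enumerate(items):
        shard[name] = tensor
        index[name] = f"shard-{shard_id}.bin"
        # Flush the shard when it is full or we reached the last tensor.
        if len(shard) == max_entries_per_shard or i == len(items) - 1:
            with open(os.path.join(save_dir, f"shard-{shard_id}.bin"), "wb") as f:
                pickle.dump(shard, f)
            shard, shard_id = {}, shard_id + 1
    with open(os.path.join(save_dir, "model.index.json"), "w") as f:
        json.dump(index, f)


def load_sharded(save_dir):
    with open(os.path.join(save_dir, "model.index.json")) as f:
        index = json.load(f)
    state_dict = {}
    # Load each shard file once and merge its tensors back together.
    for shard_file in sorted(set(index.values())):
        with open(os.path.join(save_dir, shard_file), "rb") as f:
            state_dict.update(pickle.load(f))
    return state_dict


with tempfile.TemporaryDirectory() as d:
    sd = {"layer1.weight": [1, 2], "layer1.bias": [3], "layer2.weight": [4]}
    save_sharded(sd, d)
    print(load_sharded(d) == sd)  # round-trip check
```

The index file is what lets a loader fetch only the shards it needs, which matters for large quantized checkpoints that do not fit comfortably in a single file.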

Bug Fixes

Hi @PanQiWei, any updates regarding version 1.0.0?

@PanQiWei Can you rejoin @fxmarty and be more active in code reviews? It feels like the project needs at least two active maintainers to keep it up to speed and not overload any single person.