cli99/quant-matmul

C++ · Apache-2.0

Quantized matmul in CUDA, with a PyTorch interface

Original code from FasterTransformer / TensorRT-LLM: https://github.com/NVIDIA/TensorRT-LLM/tree/main/cpp/tensorrt_llm/kernels

Adapted to support a different quantization scheme.
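
For readers new to the idea, here is a minimal pure-Python sketch of what a weight-only quantized matmul computes. It is a generic illustration, not this repository's kernel or API: the quantization scheme this repository uses is not described here, so the sketch assumes symmetric per-output-channel int8 quantization, and the function names `quantize_weights` and `quant_matmul` are hypothetical.

```python
# Generic illustration only. Assumes symmetric per-output-channel int8
# weight quantization; this is NOT the repository's scheme or interface.

def quantize_weights(w):
    """Quantize a K x N float matrix (list of rows) column by column.

    Returns (q, scales): q holds integers in [-127, 127], and scales holds
    one float per output column, so that w[i][j] ~= q[i][j] * scales[j].
    """
    k, n = len(w), len(w[0])
    scales = []
    for j in range(n):
        max_abs = max(abs(w[i][j]) for i in range(k))
        # Avoid dividing by zero for an all-zero column.
        scales.append(max_abs / 127.0 if max_abs > 0 else 1.0)
    q = [
        [max(-127, min(127, round(w[i][j] / scales[j]))) for j in range(n)]
        for i in range(k)
    ]
    return q, scales


def quant_matmul(x, q, scales):
    """Compute x @ (q * scales) for an M x K float matrix x.

    The per-column scale is applied once after accumulation, which is
    valid because it is constant along the reduction dimension.
    """
    m, k, n = len(x), len(q), len(scales)
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for t in range(k):
                acc += x[i][t] * q[t][j]
            out[i][j] = acc * scales[j]
    return out


# Example: compare against the unquantized product.
w = [[0.5, -1.0], [0.25, 2.0]]
x = [[1.0, 2.0]]
q, scales = quantize_weights(w)
y = quant_matmul(x, q, scales)  # close to [[1.0, 3.0]], up to rounding error
```

A GPU kernel would typically fuse the dequantization into the matmul rather than loop over elements like this; the sketch only shows the arithmetic being approximated.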