Original code from FasterTransformer / TensorRT-LLM: https://github.com/NVIDIA/TensorRT-LLM/tree/main/cpp/tensorrt_llm/kernels
Adapted to support a different quantization scheme.
Original code from FasterTransformer / TensorRT-LLM: https://github.com/NVIDIA/TensorRT-LLM/tree/main/cpp/tensorrt_llm/kernels
Adapted to support a different quantization scheme.