Pinned Repositories
GPTQ-triton
GPTQ inference Triton kernel
exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
resposity
xv6
wanghz18's Repositories
wanghz18/exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
wanghz18/GPTQ-triton
GPTQ inference Triton kernel
wanghz18/resposity
wanghz18/xv6