Pinned Repositories
adaserve
alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
basic_verilog
Must-have Verilog/SystemVerilog modules
blog
blog
BLU-Net
cutlass
CUDA Templates for Linear Algebra Subroutines
FABNet
The code and artifacts associated with our MICRO'22 paper, "Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design"
llm-mixed-q
Mixed-precision quantization for LLMs
lqer
LQER: Low-Rank Quantization Error Reconstruction for LLMs
ChengZhang-98's Repositories
ChengZhang-98/llm-mixed-q
Mixed-precision quantization for LLMs
ChengZhang-98/lqer
LQER: Low-Rank Quantization Error Reconstruction for LLMs
ChengZhang-98/adaserve
ChengZhang-98/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
ChengZhang-98/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
ChengZhang-98/basic_verilog
Must-have Verilog/SystemVerilog modules
ChengZhang-98/blog
blog
ChengZhang-98/BLU-Net
ChengZhang-98/cutlass
CUDA Templates for Linear Algebra Subroutines
ChengZhang-98/FABNet
The code and artifacts associated with our MICRO'22 paper, "Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design"
ChengZhang-98/lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
ChengZhang-98/paper_reading
A shared paper reading repository for people in the group
ChengZhang-98/SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
ChengZhang-98/fp6_llm
Efficient GPU support for LLM inference with 6-bit quantization (FP6).
ChengZhang-98/gemmini
Berkeley's Spatial Array Generator
ChengZhang-98/hw-lockin-emulation-cost
ChengZhang-98/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
ChengZhang-98/mase-docker
Dockerfile for the MASE container
ChengZhang-98/MX-for-FPGA
Implementation of Microscaling data formats in SystemVerilog.
ChengZhang-98/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
ChengZhang-98/Parallel-Computing-Cuda-C
CUDA learning guide