Pinned Repositories
adaserve
alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
basic_verilog
Must-have Verilog/SystemVerilog modules
blog
blog
BLU-Net
cutlass
CUDA Templates for Linear Algebra Subroutines
FABNet
The code and artifacts associated with our MICRO'22 paper, "Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design"
llm-mixed-q
Mixed-precision quantization for LLMs
lqer
LQER: Low-Rank Quantization Error Reconstruction for LLMs
ChengZhang-98's Repositories
ChengZhang-98/llm-mixed-q
Mixed-precision quantization for LLMs
ChengZhang-98/lqer
LQER: Low-Rank Quantization Error Reconstruction for LLMs
ChengZhang-98/adaserve
ChengZhang-98/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
ChengZhang-98/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
ChengZhang-98/basic_verilog
Must-have Verilog/SystemVerilog modules
ChengZhang-98/blog
blog
ChengZhang-98/BLU-Net
ChengZhang-98/cutlass
CUDA Templates for Linear Algebra Subroutines
ChengZhang-98/FABNet
The code and artifacts associated with our MICRO'22 paper, "Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design"
ChengZhang-98/lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
ChengZhang-98/paper_reading
A shared paper reading repository for people in the group
ChengZhang-98/SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
ChengZhang-98/fp6_llm
Efficient GPU support for LLM inference with 6-bit quantization (FP6).
ChengZhang-98/gemmini
Berkeley's Spatial Array Generator
ChengZhang-98/hw-lockin-emulation-cost
ChengZhang-98/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
ChengZhang-98/mase-docker
Dockerfile for the MASE container
ChengZhang-98/MX-for-FPGA
Implementation of Microscaling data formats in SystemVerilog.
ChengZhang-98/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
ChengZhang-98/Parallel-Computing-Cuda-C
CUDA learning guide