peilin-chen
Ph.D. student at the University of Virginia. His research interests include digital/mixed-signal IC design, AI chips, and computer architecture.
Charlottesville, VA, USA
Pinned Repositories
3d-photo-inpainting
[CVPR 2020] 3D Photography using Context-aware Layered Depth Inpainting
ardupilot
ArduPlane, ArduCopter, ArduRover source
cnn_accelerator
[Beginner project] A convolutional-neural-network hardware accelerator for handwritten-digit recognition, implemented on the PYNQ-Z2
COP-820
Eyeriss Hardware Accelerator for Machine Learning
CPU
A single-cycle MIPS32 CPU implementing 8 instructions
FaceRecognition-tensorflow
A face-recognition neural network trained with TensorFlow
fast-depth
ICRA 2019 "FastDepth: Fast Monocular Depth Estimation on Embedded Systems"
gemmini
Berkeley's Spatial Array Generator
hls_for_cnn_mnist
[Beginner project] HLS code for hardware acceleration of a handwritten-digit-recognition CNN on a Xilinx FPGA
Zhulong-RISCV-CPU
CPU design based on the RISC-V ISA
peilin-chen's Repositories
peilin-chen/Zhulong-RISCV-CPU
CPU design based on the RISC-V ISA
peilin-chen/gemmini
Berkeley's Spatial Array Generator
peilin-chen/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
peilin-chen/AISystem
AISystem covers AI systems, including AI chips, AI compilers, AI inference and training frameworks, and other full-stack, low-level AI technologies
peilin-chen/AutoSmoothQuant
An easy-to-use package for implementing SmoothQuant for LLMs
peilin-chen/basejump_stl
BaseJump STL: A Standard Template Library for SystemVerilog
peilin-chen/flash-attention
Fast and memory-efficient exact attention
peilin-chen/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
peilin-chen/gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as Tensor Cores and CUDA Dynamic Parallelism, as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.
peilin-chen/H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
peilin-chen/hardware-accelerator-for-LLM
Major project: a Kannada LLM for farmers
peilin-chen/KIVI
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
peilin-chen/KVQuant
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
peilin-chen/llama
Inference code for Llama models
peilin-chen/llama3
The official Meta Llama 3 GitHub site
peilin-chen/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
peilin-chen/LLMsPracticalGuide
A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)
peilin-chen/metaseq
Repo for external large-scale work
peilin-chen/ml-retreat
Machine Learning Journal for Intermediate to Advanced Topics.
peilin-chen/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
peilin-chen/peilin-chen.github.io
peilin-chen/qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
peilin-chen/siliwiz
Silicon Layout Wizard
peilin-chen/smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
peilin-chen/spatten-llm
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
peilin-chen/TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
peilin-chen/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
peilin-chen/ventus-gpgpu
GPGPU processor supporting the RISC-V Vector (RVV) extension, developed with the Chisel HDL
peilin-chen/ventus-gpgpu-verilog
GPGPU supporting the RISC-V Vector (RVV) extension, developed in Verilog HDL
peilin-chen/vortex