pdh930105's Stars
ImagineAILab/ai-by-hand-excel
hemingkx/SpeculativeDecodingPapers
📰 Must-read papers and blogs on Speculative Decoding ⚡️
Haiyang-W/GiT
[ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
gstoica27/ZipIt
A framework for merging models solving different tasks with different initializations into one multi-task model without any additional training
trevorpogue/algebraic-nnhw
Algebraic enhancements for deep learning accelerator architectures
feizc/DiT-MoE
Scaling Diffusion Transformers with Mixture of Experts
jxiw/MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
CASE-Lab-UMD/LLM-Drop
The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
ShoufaChen/Awesome-Diffusion-Transformers
https://www.shoufachen.com/Awesome-Diffusion-Transformers/
ChenMnZ/PrefixQuant
An algorithm for static activation quantization of LLMs
thunlp/Ouroboros
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
ChangyuanWang17/QVLM
[NeurIPS'24] Efficient and accurate memory-saving method towards W4A4 large multi-modal models.
NVlabs/COAT
smart-lty/ParallelSpeculativeDecoding
The official code for the paper "Parallel Speculative Decoding with Adaptive Draft Length."
Zhaoshixin-sky/CIM-MLC
[ASPLOS 2024] CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators
Intelligent-Computing-Lab-Yale/TesseraQ
PiggyJerry/DC-Net
The code for the paper "DC-Net: Divide-and-Conquer for Salient Object Detection"
thunlp/Seq1F1B
Sequence-level 1F1B schedule for LLMs.
ebby-s/MX-for-FPGA
Implementation of Microscaling data formats in SystemVerilog.
snu-comparch/Tender
Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24)
BidyutSaha/TinyTNAS
TinyTNAS is a hardware-aware, multi-objective, time-bound Neural Architecture Search (NAS) tool designed for TinyML time series classification. Unlike GPU-based NAS methods, it runs efficiently on CPUs.
Cheliosoops/BitQ
abdelfattah-lab/shadow_llm
lynn2089/SmartLite
b-faye/OneEncoder
ershang2/SlowTrack
pingxue-hfut/DWR
Fast and Accurate Binary Neural Networks based on Depth-Width Reshaping
yamilvindas/gdec
dongwonjo/BinaryMoS
naufalso/adversarial-manhole
Official code for "Adversarial Manholes: Challenging Monocular Depth Estimation and Semantic Segmentation with Physical Attacks"