Yonghao-Tan's Stars
usyd-fsalab/fp6_llm
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
xijiu9/Train_Transformers_with_INT4
nbasyl/LLM-FP4
The official implementation of the EMNLP 2023 paper LLM-FP4
maestro-project/maestro
An analytical cost model evaluating DNN mappings (dataflows and tiling).
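Analytical cost models of this kind estimate metrics such as off-chip traffic directly from the tiling parameters, without simulation. As a hedged illustration only (a toy model, not MAESTRO's actual cost functions or API), a memory-traffic estimate for a tiled matmul might look like:

```python
import math

def tiled_matmul_traffic(M, N, K, Ti, Tj, Tk):
    """Toy DRAM-traffic model (in elements) for an MxK * KxN matmul
    tiled with tile sizes Ti x Tj x Tk. Assumes outputs accumulate
    on chip and each input tile is refetched once per reuse pass.
    This is an illustrative sketch, not MAESTRO's model."""
    ti = math.ceil(M / Ti)   # number of tiles along M
    tj = math.ceil(N / Tj)   # number of tiles along N
    a_traffic = M * K * tj   # A is reloaded once per j-tile
    b_traffic = K * N * ti   # B is reloaded once per i-tile
    c_traffic = M * N        # C written once (on-chip accumulation)
    return a_traffic + b_traffic + c_traffic
```

Sweeping the tile sizes with a model like this is how such tools compare candidate mappings before committing to one.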
kotarot/rectangle-packing-solver
A solver for the 2D rectangle packing problem using simulated annealing (SA) optimization.
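The SA approach here searches over placements by perturbing a candidate and accepting worse solutions with a temperature-dependent probability. A minimal sketch under stated assumptions (a toy shelf-packing decoder over a permutation, not the repo's sequence-pair encoding) might look like:

```python
import math
import random

def shelf_height(order, rects, bin_width):
    """Decode a rectangle ordering into shelves; return total height."""
    x = shelf_h = total_h = 0
    for i in order:
        w, h = rects[i]
        if x + w > bin_width:          # current shelf full: open a new one
            total_h += shelf_h
            x, shelf_h = 0, 0
        x += w
        shelf_h = max(shelf_h, h)
    return total_h + shelf_h

def anneal(rects, bin_width, steps=20000, t0=10.0, cooling=0.9995):
    """Simulated annealing over orderings: swap two rectangles,
    accept uphill moves with probability exp(-delta / temperature)."""
    order = list(range(len(rects)))
    cur = best = shelf_height(order, rects, bin_width)
    best_order, t = order[:], t0
    for _ in range(steps):
        i, j = random.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]      # propose a swap
        new = shelf_height(order, rects, bin_width)
        if new <= cur or random.random() < math.exp((cur - new) / t):
            cur = new                                 # accept
            if cur < best:
                best, best_order = cur, order[:]
        else:
            order[i], order[j] = order[j], order[i]  # revert
        t *= cooling                                  # cool down
    return best, best_order
```

The acceptance rule is the essence of SA: early on, high temperature lets the search escape local minima; as `t` decays it converges toward greedy improvement.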
sangyc10/CUDA-code
GATECH-EIC/ShiftAddViT
[NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
GeoMSK/FiducciaMattheyses
Bruces1998/FM_algorithm
facebookresearch/deit
Official DeiT repository
google-research/albert
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
HuangOwen/Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
PingchengDong/GQA-LUT
The official implementation of the DAC 2024 paper GQA-LUT
state-spaces/mamba
Mamba SSM architecture
kyegomez/VisionMamba
Implementation of Vision Mamba from the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". It is 2.8x faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on high-resolution images.
LiheYoung/Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
zslwyuan/Basic-SIMD-Processor-Verilog-Tutorial
Implementation of a simple SIMD processor in Verilog, built around a 16-bit SIMD ALU that performs 2's-complement arithmetic. Each ALU operation takes two clock cycles: the first loads the operands into registers, the second performs the operation. Functions are selected by 6-bit opcodes, and the full instruction word, including the opcode, is 18 bits wide.
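The lane behavior described above (16-bit 2's-complement arithmetic applied element-wise, with an opcode selecting the function) can be modeled in software. A hedged sketch with hypothetical opcode values (the actual Verilog opcode assignments are not given here):

```python
MASK = 0xFFFF  # 16-bit lanes

def to_signed(x):
    """Interpret a 16-bit pattern as a 2's-complement signed value."""
    return x - 0x10000 if x & 0x8000 else x

def simd_alu(opcode, a_lanes, b_lanes):
    """Apply one ALU operation element-wise across SIMD lanes.
    Opcode values are illustrative placeholders, not the repo's encoding."""
    ops = {
        0b000000: lambda a, b: a + b,   # ADD
        0b000001: lambda a, b: a - b,   # SUB
        0b000010: lambda a, b: a & b,   # AND
        0b000011: lambda a, b: a | b,   # OR
    }
    op = ops[opcode]
    # Masking to 16 bits gives the same wraparound as the hardware ALU.
    return [op(a, b) & MASK for a, b in zip(a_lanes, b_lanes)]
```

Masking with `0xFFFF` is what makes the arithmetic 2's-complement: `0 - 1` wraps to `0xFFFF`, which `to_signed` reads back as `-1`.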
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
graphdeco-inria/gaussian-splatting
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
princeton-vl/DROID-SLAM
cvg/nice-slam
[CVPR'22] NICE-SLAM: Neural Implicit Scalable Encoding for SLAM
yenchenlin/nerf-pytorch
A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.
facebookresearch/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
dlaptev/RobustPCA
Robust PCA implementation and examples (MATLAB)
Buck008/Transformer-Accelerator-Based-on-FPGA
Runs on the PYNQ-Z1. The repository contains the relevant Verilog code, the Vivado configuration, and C code for SDK testing. The size of the systolic array is configurable; it is currently 16x16.
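A systolic array like this computes a matrix product tile by tile, with each output tile held stationary while operands stream through. A minimal functional sketch of that tiling (a software model of the dataflow, not the repo's Verilog) might look like:

```python
def systolic_matmul(A, B, size=16):
    """Model an output-stationary size x size systolic array:
    each (i0, j0) output tile is mapped onto the array, and one
    k-slice of A and B is streamed in per step. Illustrative only."""
    M, K = len(A), len(A[0])
    N = len(B[0])
    C = [[0] * N for _ in range(M)]
    for i0 in range(0, M, size):              # output tiles along rows
        for j0 in range(0, N, size):          # output tiles along columns
            for k in range(K):                # stream one k-slice per step
                for i in range(i0, min(i0 + size, M)):
                    a = A[i][k]
                    for j in range(j0, min(j0 + size, N)):
                        C[i][j] += a * B[k][j]   # each PE accumulates in place
    return C
```

Making `size` a parameter mirrors the configurable array dimension: the hardware trade-off is area and clock rate versus the number of multiply-accumulates per cycle.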
chihhuiho/yoro
VainF/DeepLabV3Plus-Pytorch
Pretrained DeepLabv3 and DeepLabv3+ for Pascal VOC & Cityscapes
chiragsakhuja/spotlight
GATECH-EIC/ViTCoD
[HPCA 2023] ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
bmartini/zynq-axis
Hardware, Linux Driver and Library for the Zynq AXI DMA interface