Yonghao-Tan's Stars
usyd-fsalab/fp6_llm
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
xijiu9/Train_Transformers_with_INT4
nbasyl/LLM-FP4
The official implementation of the EMNLP 2023 paper LLM-FP4
maestro-project/maestro
An analytical cost model evaluating DNN mappings (dataflows and tiling).
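Analytical cost models of this kind estimate metrics such as off-chip traffic directly from the tiling parameters, without simulation. As a hedged illustration only (a toy model, not MAESTRO's actual cost functions or API), a memory-traffic estimate for a tiled matmul might look like:

```python
import math

def tiled_matmul_traffic(M, N, K, Ti, Tj, Tk):
    """Toy DRAM-traffic model (in elements) for an MxK * KxN matmul
    tiled with tile sizes Ti x Tj x Tk. Assumes outputs accumulate
    on chip and each input tile is refetched once per reuse pass.
    This is an illustrative sketch, not MAESTRO's model."""
    ti = math.ceil(M / Ti)   # number of tiles along M
    tj = math.ceil(N / Tj)   # number of tiles along N
    a_traffic = M * K * tj   # A is reloaded once per j-tile
    b_traffic = K * N * ti   # B is reloaded once per i-tile
    c_traffic = M * N        # C written once (on-chip accumulation)
    return a_traffic + b_traffic + c_traffic
```

Sweeping the tile sizes with a model like this is how such tools compare candidate mappings before committing to one.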
kotarot/rectangle-packing-solver
A solver for the 2D rectangle packing problem using simulated annealing (SA) optimization.
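The SA approach here searches over placements by perturbing a candidate and accepting worse solutions with a temperature-dependent probability. A minimal sketch under stated assumptions (a toy shelf-packing decoder over a permutation, not the repo's sequence-pair encoding) might look like:

```python
import math
import random

def shelf_height(order, rects, bin_width):
    """Decode a rectangle ordering into shelves; return total height."""
    x = shelf_h = total_h = 0
    for i in order:
        w, h = rects[i]
        if x + w > bin_width:          # current shelf full: open a new one
            total_h += shelf_h
            x, shelf_h = 0, 0
        x += w
        shelf_h = max(shelf_h, h)
    return total_h + shelf_h

def anneal(rects, bin_width, steps=20000, t0=10.0, cooling=0.9995):
    """Simulated annealing over orderings: swap two rectangles,
    accept uphill moves with probability exp(-delta / temperature)."""
    order = list(range(len(rects)))
    cur = best = shelf_height(order, rects, bin_width)
    best_order, t = order[:], t0
    for _ in range(steps):
        i, j = random.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]      # propose a swap
        new = shelf_height(order, rects, bin_width)
        if new <= cur or random.random() < math.exp((cur - new) / t):
            cur = new                                 # accept
            if cur < best:
                best, best_order = cur, order[:]
        else:
            order[i], order[j] = order[j], order[i]  # revert
        t *= cooling                                  # cool down
    return best, best_order
```

The acceptance rule is the essence of SA: early on, high temperature lets the search escape local minima; as `t` decays it converges toward greedy improvement.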
sangyc10/CUDA-code
GATECH-EIC/ShiftAddViT
[NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
GeoMSK/FiducciaMattheyses
Bruces1998/FM_algorithm
facebookresearch/deit
Official DeiT repository
google-research/albert
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
HuangOwen/Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
PingchengDong/GQA-LUT
The official implementation of the DAC 2024 paper GQA-LUT
state-spaces/mamba
Mamba SSM architecture
kyegomez/VisionMamba
Implementation of Vision Mamba from the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". It is 2.8x faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on high-resolution images.
LiheYoung/Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
zslwyuan/Basic-SIMD-Processor-Verilog-Tutorial
Implementation of a simple SIMD processor in Verilog, built around a 16-bit SIMD ALU that performs 2's-complement arithmetic. Each ALU operation takes two clock cycles: the first loads the operands into registers, the second performs the operation. Functions are selected by 6-bit opcodes, and the full instruction word, including the opcode, is 18 bits wide.
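The lane behavior described above (16-bit 2's-complement arithmetic applied element-wise, with an opcode selecting the function) can be modeled in software. A hedged sketch with hypothetical opcode values (the actual Verilog opcode assignments are not given here):

```python
MASK = 0xFFFF  # 16-bit lanes

def to_signed(x):
    """Interpret a 16-bit pattern as a 2's-complement signed value."""
    return x - 0x10000 if x & 0x8000 else x

def simd_alu(opcode, a_lanes, b_lanes):
    """Apply one ALU operation element-wise across SIMD lanes.
    Opcode values are illustrative placeholders, not the repo's encoding."""
    ops = {
        0b000000: lambda a, b: a + b,   # ADD
        0b000001: lambda a, b: a - b,   # SUB
        0b000010: lambda a, b: a & b,   # AND
        0b000011: lambda a, b: a | b,   # OR
    }
    op = ops[opcode]
    # Masking to 16 bits gives the same wraparound as the hardware ALU.
    return [op(a, b) & MASK for a, b in zip(a_lanes, b_lanes)]
```

Masking with `0xFFFF` is what makes the arithmetic 2's-complement: `0 - 1` wraps to `0xFFFF`, which `to_signed` reads back as `-1`.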
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
graphdeco-inria/gaussian-splatting
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
princeton-vl/DROID-SLAM
cvg/nice-slam
[CVPR'22] NICE-SLAM: Neural Implicit Scalable Encoding for SLAM
yenchenlin/nerf-pytorch
A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.
facebookresearch/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
dlaptev/RobustPCA
Robust PCA implementation and examples (MATLAB)
Buck008/Transformer-Accelerator-Based-on-FPGA
Runs on the PYNQ-Z1. The repository contains the relevant Verilog code, the Vivado configuration, and C code for SDK testing. The size of the systolic array is configurable; it is currently 16x16.
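A systolic array like this computes a matrix product tile by tile, with each output tile held stationary while operands stream through. A minimal functional sketch of that tiling (a software model of the dataflow, not the repo's Verilog) might look like:

```python
def systolic_matmul(A, B, size=16):
    """Model an output-stationary size x size systolic array:
    each (i0, j0) output tile is mapped onto the array, and one
    k-slice of A and B is streamed in per step. Illustrative only."""
    M, K = len(A), len(A[0])
    N = len(B[0])
    C = [[0] * N for _ in range(M)]
    for i0 in range(0, M, size):              # output tiles along rows
        for j0 in range(0, N, size):          # output tiles along columns
            for k in range(K):                # stream one k-slice per step
                for i in range(i0, min(i0 + size, M)):
                    a = A[i][k]
                    for j in range(j0, min(j0 + size, N)):
                        C[i][j] += a * B[k][j]   # each PE accumulates in place
    return C
```

Making `size` a parameter mirrors the configurable array dimension: the hardware trade-off is area and clock rate versus the number of multiply-accumulates per cycle.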
chihhuiho/yoro
VainF/DeepLabV3Plus-Pytorch
Pretrained DeepLabv3 and DeepLabv3+ for Pascal VOC & Cityscapes
chiragsakhuja/spotlight
GATECH-EIC/ViTCoD
[HPCA 2023] ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
bmartini/zynq-axis
Hardware, Linux Driver and Library for the Zynq AXI DMA interface