Yonghao-Tan

Yonghao-Tan's Stars

google-research/albert
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Language: Python, 3.2k stars, 568 forks
HuangOwen/Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
1.1k stars, 64 forks
PingchengDong/GQA-LUT
The official implementation of the DAC 2024 paper GQA-LUT
Language:Python10
state-spaces/mamba
Mamba SSM architecture
Language: Python, 12.7k stars, 1.1k forks
kyegomez/VisionMamba
Implementation of Vision Mamba from the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". It is 2.8x faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on high-res images.
Language:Python36319
LiheYoung/Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
Language: Python, 6.8k stars, 523 forks
zslwyuan/Basic-SIMD-Processor-Verilog-Tutorial
Implementation of a simple SIMD processor in Verilog, the core of which is a 16-bit SIMD ALU. Two's complement calculations are implemented in this ALU. Each ALU operation takes two clock cycles: the first loads values into the registers, and the second performs the operation. 6-bit opcodes select the functions. The instruction code, including the opcode, is 18 bits.
Language:Verilog12033
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Language: Python, 2.7k stars, 243 forks
graphdeco-inria/gaussian-splatting
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
Language: Python, 13.8k stars, 1.8k forks
princeton-vl/DROID-SLAM
Language: Python, 1.8k stars, 295 forks
cvg/nice-slam
[CVPR'22] NICE-SLAM: Neural Implicit Scalable Encoding for SLAM
Language: Python, 1.4k stars, 193 forks
yenchenlin/nerf-pytorch
A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.
Language: Python, 5.4k stars, 1.1k forks
facebookresearch/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Language: Python, 30.2k stars, 6.4k forks
dlaptev/RobustPCA
Robust PCA implementation and examples (Matlab)
Language:Matlab19573
Buck008/Transformer-Accelerator-Based-on-FPGA
Runs on the PYNQ-Z1. The repository contains the relevant Verilog code, Vivado configuration, and C code for SDK testing. The size of the systolic array is configurable; it is currently 16x16.
Language:Verilog988
chihhuiho/yoro
Language:Python16
VainF/DeepLabV3Plus-Pytorch
Pretrained DeepLabv3 and DeepLabv3+ for Pascal VOC & Cityscapes
Language: Python, 1.9k stars, 437 forks
chiragsakhuja/spotlight
Language:MATLAB112
GATECH-EIC/ViTCoD
[HPCA 2023] ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
Language:Python9010
bmartini/zynq-axis
Hardware, Linux Driver and Library for the Zynq AXI DMA interface
Language:Verilog9838
kssteven418/I-BERT
[ICML'21 Oral] I-BERT: Integer-only BERT Quantization
Language:Python22332
Reconfigurable-Computing/Hardware-friendly-PACT-Quantization
Language:Python4
666DZY666/micronet
micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference), low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, regular, and group convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op-adapt (upsample), dynamic_shape.
Language: Python, 2.2k stars, 478 forks
zhutmost/lsq-net
Unofficial implementation of LSQ-Net, a neural network quantization framework
Language:Python27140
microsoft/nni
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
Language: Python, 14k stars, 1.8k forks
WangXuan95/Xilinx-FPGA-PCIe-XDMA-Tutorial
A step-by-step beginner's tutorial for Xilinx FPGA PCIe, based on the PCIe XDMA IP core
Language:Batchfile42085
yijingru/Vertebra-Landmark-Detection
[ISBI 2020] Vertebra-Focused Landmark Detection for Scoliosis Assessment
Language:Python9226
gpgpu-sim/gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism, as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.
Language: C++, 1.1k stars, 505 forks
jbush001/NyuziProcessor
GPGPU microprocessor architecture
Language: C, 2k stars, 351 forks
OpenXiangShan/XiangShan
Open-source high-performance RISC-V processor
Language: Scala, 4.7k stars, 646 forks