Yonghao-Tan's Stars
google-research/albert
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
HuangOwen/Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
PingchengDong/GQA-LUT
The official implementation of the DAC 2024 paper GQA-LUT
state-spaces/mamba
Mamba SSM architecture
kyegomez/VisionMamba
Implementation of Vision Mamba from the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". It is 2.8x faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features from high-resolution images.
LiheYoung/Depth-Anything
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
zslwyuan/Basic-SIMD-Processor-Verilog-Tutorial
Implementation of a simple SIMD processor in Verilog, the core of which is a 16-bit SIMD ALU that performs two's complement arithmetic. Each ALU operation takes two clock cycles: the first loads values into the registers, and the second performs the operation. 6-bit opcodes select the function, and the full instruction, including the opcode, is 18 bits wide.
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
graphdeco-inria/gaussian-splatting
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
princeton-vl/DROID-SLAM
cvg/nice-slam
[CVPR'22] NICE-SLAM: Neural Implicit Scalable Encoding for SLAM
yenchenlin/nerf-pytorch
A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.
facebookresearch/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
dlaptev/RobustPCA
Robust PCA implementation and examples (Matlab)
Buck008/Transformer-Accelerator-Based-on-FPGA
Runs on the PYNQ-Z1 board. The repository contains the Verilog code, the Vivado configuration, and C code for SDK testing. The systolic array size is configurable; it is currently 16x16.
chihhuiho/yoro
VainF/DeepLabV3Plus-Pytorch
Pretrained DeepLabv3 and DeepLabv3+ for Pascal VOC & Cityscapes
chiragsakhuja/spotlight
GATECH-EIC/ViTCoD
[HPCA 2023] ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
bmartini/zynq-axis
Hardware, Linux Driver and Library for the Zynq AXI DMA interface
kssteven418/I-BERT
[ICML'21 Oral] I-BERT: Integer-only BERT Quantization
Reconfigurable-Computing/Hardware-friendly-PACT-Quantization
666DZY666/micronet
micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa; Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference), low-bit (≤2b) ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, regular, and group convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic_shape.
zhutmost/lsq-net
Unofficial implementation of LSQ-Net, a neural network quantization framework
microsoft/nni
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.
WangXuan95/Xilinx-FPGA-PCIe-XDMA-Tutorial
A detailed, step-by-step Xilinx FPGA PCIe tutorial based on the PCIe XDMA IP core
yijingru/Vertebra-Landmark-Detection
[ISBI 2020] Vertebra-Focused Landmark Detection for Scoliosis Assessment
gpgpu-sim/gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It supports features such as Tensor Cores and CUDA Dynamic Parallelism, and includes a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.
jbush001/NyuziProcessor
GPGPU microprocessor architecture
OpenXiangShan/XiangShan
Open-source high-performance RISC-V processor