qhfan's Stars
FoundationVision/VAR
[NeurIPS 2024 Oral] [GPT beats diffusion 🔥] [Scaling laws in visual generation 📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
dk-liang/Awesome-Visual-Transformer
A curated collection of papers on transformers for computer vision. Awesome Transformer with Computer Vision (CV)
hustvl/Vim
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
johnma2006/mamba-minimal
A simple, minimal implementation of the Mamba SSM in a single file of PyTorch.
state-spaces/s4
Structured state space sequence models
MzeroMiko/VMamba
VMamba: Visual State Space Models; code is based on Mamba
yuweihao/MambaOut
MambaOut: Do We Really Need Mamba for Vision?
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Pointcept/Pointcept
Pointcept: a codebase for point cloud perception research. Latest works: PTv3 (CVPR'24 Oral), PPT (CVPR'24), OA-CNNs (CVPR'24), MSC (CVPR'23)
facebookresearch/ConvNeXt-V2
Code release for ConvNeXt V2 model
czczup/ViT-Adapter
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
AlfredXiangWu/LightCNN
A Light CNN for Deep Face Representation with Noisy Labels, TIFS 2018
AILab-CVC/UniRepLKNet
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
NVlabs/FasterViT
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention
datamllab/LongLM
[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
OpenRobotLab/EmbodiedScan
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
shenyunhang/APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
Vchitect/Vlogger
[CVPR 2024] Make Your Dream A Vlog
ZiyaoLi/fast-kan
FastKAN: Very Fast Implementation of Kolmogorov-Arnold Networks (KAN)
qhfan/RMT
[CVPR 2024] RMT: Retentive Networks Meet Vision Transformer
luogen1996/LLaVA-HR
LLaVA-HR: High-Resolution Large Language-Vision Assistant
transformer-vq/transformer_vq
dyhBUPT/iKUN
[CVPR 2024] iKUN: Speak to Trackers without Retraining
duchenzhuang/FSQ-pytorch
A PyTorch implementation of Finite Scalar Quantization (FSQ)
ssyang2020/ZeroSmooth
Haochen-Wang409/DropPos
[NeurIPS'23] DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
leofansq/Reinforcement_Learning_Curling
A curling game example based on reinforcement learning (RL); gradient-descent Sarsa(lambda) + non-uniform radial basis feature representation
russellllaputa/MIRL
[NeurIPS 2023] Masked Image Residual Learning for Scaling Deeper Vision Transformers
qhfan/SecViT
Official code for "Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer"
qhfan/SSViT
Official code for "Vision Transformer with Sparse Scan Prior"