qhfan's Stars
FoundationVision/VAR
[NeurIPS 2024 Oral] [GPT beats diffusion 🔥] [Scaling laws in visual generation 📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
dk-liang/Awesome-Visual-Transformer
A curated collection of papers on transformers for computer vision. Awesome Transformer with Computer Vision (CV)
hustvl/Vim
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
johnma2006/mamba-minimal
A simple, minimal implementation of the Mamba SSM in a single file of PyTorch.
state-spaces/s4
Structured state space sequence models
MzeroMiko/VMamba
VMamba: Visual State Space Models; code is based on Mamba
yuweihao/MambaOut
MambaOut: Do We Really Need Mamba for Vision?
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Pointcept/Pointcept
Pointcept: a codebase for point cloud perception research. Latest works: PTv3 (CVPR'24 Oral), PPT (CVPR'24), OA-CNNs (CVPR'24), MSC (CVPR'23)
facebookresearch/ConvNeXt-V2
Code release for ConvNeXt V2 model
czczup/ViT-Adapter
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
AlfredXiangWu/LightCNN
A Light CNN for Deep Face Representation with Noisy Labels, TIFS 2018
AILab-CVC/UniRepLKNet
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
NVlabs/FasterViT
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention
datamllab/LongLM
[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
OpenRobotLab/EmbodiedScan
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
shenyunhang/APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
Vchitect/Vlogger
[CVPR 2024] Make Your Dream A Vlog
ZiyaoLi/fast-kan
FastKAN: Very Fast Implementation of Kolmogorov-Arnold Networks (KAN)
qhfan/RMT
[CVPR 2024] RMT: Retentive Networks Meet Vision Transformer
luogen1996/LLaVA-HR
LLaVA-HR: High-Resolution Large Language-Vision Assistant
transformer-vq/transformer_vq
dyhBUPT/iKUN
[CVPR 2024] iKUN: Speak to Trackers without Retraining
duchenzhuang/FSQ-pytorch
A PyTorch implementation of Finite Scalar Quantization (FSQ)
ssyang2020/ZeroSmooth
Haochen-Wang409/DropPos
[NeurIPS'23] DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
leofansq/Reinforcement_Learning_Curling
A curling game example based on reinforcement learning (RL); gradient-descent Sarsa(lambda) + non-uniform radial basis feature representation
russellllaputa/MIRL
[NeurIPS 2023] Masked Image Residual Learning for Scaling Deeper Vision Transformers
qhfan/SecViT
Official code for "Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer"
qhfan/SSViT
Official code for "Vision Transformer with Sparse Scan Prior"