zhouyuan888888's Stars
ml-research/ledits_pp
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
AILab-CVC/UniRepLKNet
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Ucas-HaoranWei/Vary-toy
Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)
DLYuanGod/TinyGPT-V
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Yuliang-Liu/Monkey
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
fangwei123456/spikingjelly
SpikingJelly is an open-source deep learning framework for Spiking Neural Network (SNN) based on PyTorch.
sustcsonglin/flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
l0o0/jasminum
A Zotero add-on to retrieve CNKI metadata. A simple Zotero plugin for recognizing Chinese metadata.
windingwind/zotero-plugin-template
A plugin template for Zotero.
windingwind/zotero-better-notes
Everything about note management. All in Zotero.
mit-han-lab/efficientvit
EfficientViT is a new family of vision models for efficient high-resolution vision.
SHI-Labs/NATTEN
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
XuezheMax/fairseq-apollo
FairSeq repo with Apollo optimizer
cvpr-org/author-kit
KwaiVGI/LivePortrait
Bring portraits to life!
LeapLabTHU/MLLA
Official repository of MLLA
hustvl/ViG
Event-AHU/Mamba_State_Space_Model_Paper_List
[Mamba-Survey-2024] Paper list for State-Space-Model/Mamba and its Applications
NVlabs/MambaVision
Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
YoYo000/MVSNet
MVSNet (ECCV2018) & R-MVSNet (CVPR2019)
ewrfcas/MVSFormer
Code for MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth (TMLR2023)
doubleZ0108/GeoMVSNet
[CVPR 23'] GeoMVSNet: Learning Multi-View Stereo with Geometry Perception
cvlab-columbia/zero123
Zero-1-to-3: Zero-shot One Image to 3D Object (ICCV 2023)
HengLan/CGSTVG
[CVPR 2024] Context-Guided Spatio-Temporal Video Grounding
Audio-WestlakeU/ATST-SED
This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".
frednam93/FDY-SED
THU-LYJ-Lab/T3Bench
T3Bench: Benchmarking Current Progress in Text-to-3D Generation
aimagelab/multimodal-garment-designer
This is the official repository for the paper "Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing". ICCV 2023