yanglixiaoshen
PhD student, EE, BUAA; Member of MC2 Lab; Working on CV, MM and touching fish.
Beihang University, Beijing, China
yanglixiaoshen's Stars
facebookresearch/Mask2Former
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
Wu-ZJ/DSGNN
dragonlee258079/Saliency-Ranking
Code release for the TPAMI 2021 paper "Instance-Level Relative Saliency Ranking with Graph Reasoning" by Nian Liu, Long Li, Wangbo Zhao, Junwei Han, and Ling Shao.
MinglangQiao/awesome-salient-object-ranking
A curated list of awesome resources for salient object ranking (SOR)
Luo-Z13/SkySenseGPT
A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding
milvus-io/milvus
A cloud-native vector database, storage for next generation AI applications
hukenovs/easyportrait
EasyPortrait - Face Parsing and Portrait Segmentation Dataset
suoych/KEDs
Implementation of the paper Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval (CVPR 2024)
thuyngch/Human-Segmentation-PyTorch
Human segmentation models, training/inference code, and trained weights, implemented in PyTorch
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
guanhuankang/ECCV24PoseSOR
ECCV24-PoseSOR: Human Pose Can Guide Our Attention
EricDengbowen/QAGNet
Official repository for CVPR 2024 paper "Advancing Saliency Ranking with Human Fixations: Dataset, Models and Benchmarks".
guanhuankang/SeqRank
Code for the paper "SeqRank: Sequential Ranking of Salient Objects", accepted at AAAI-24.
franciszzj/VLPrompt
VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation
ChocoWu/Awesome-Scene-Graph-Generation
This is a repository for listing papers on scene graph generation and application.
Q-Future/Q-Align
[ICML2024] [IQA, IAA, VQA] All-in-one foundation model for visual scoring. Can be efficiently fine-tuned on downstream datasets.
NeeluMadan/ViFM_Survey
Foundation Models for Video Understanding: A Survey
NExT-ChatV/NExT-Chat
The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".
yunlong10/Awesome-LLMs-for-Video-Understanding
🔥🔥🔥 Latest papers, code, and datasets on Vid-LLMs.
SkyworkAI/Vitron
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
lxtGH/OMG-Seg
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles:Latest Advances on Multimodal Large Language Models
jingyi0000/VLM_survey
Collection of AWESOME vision-language models for vision tasks
OpenGVLab/unmasked_teacher
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Rubics-Xuan/MRES
This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation", accepted by CVPR 2024.
feiyanhu/tinyHD
yuhangzang/ContextDET
Contextual Object Detection with Multimodal Large Language Models
microsoft/LLaVA-Med
Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.
codec2021/video_codec_learn
Learning materials on video encoding and decoding