huangmozhi9527's Stars
yuweihao/MambaOut
MambaOut: Do We Really Need Mamba for Vision?
OpenDriveLab/DriveAGI
[CVPR 2024 Highlight] GenAD: Generalized Predictive Model for Autonomous Driving & Foundation Models in Autonomous System
magic-research/PLLaVA
Official repository for the paper PLLaVA
OpenDriveLab/OpenLane-V2
[NeurIPS 2023 Track Datasets and Benchmarks] OpenLane-V2: The First Perception and Reasoning Benchmark for Road Driving
rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
OpenDriveLab/PersFormer_3DLane
[ECCV 2022 Oral] Perspective Transformer on 3D Lane Detection
RenShuhuai-Andy/TimeChat
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
OpenDriveLab/TopoNet
Topology Reasoning for Scene Perception in Autonomous Driving
reka-ai/reka-vibe-eval
Multimodal language model benchmark, featuring challenging examples
HJYao00/DenseConnector
[NeurIPS 2024] Dense Connector for MLLMs
jpthu17/DiffusionRet
[ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
ByungKwanLee/Meteor
[NeurIPS 2024] Official PyTorch implementation of Mamba-based traversal of rationale (Meteor), which improves vision-language performance across diverse capabilities.
liguopeng0923/UCVGL
[CVPR 2024🔥] Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization
IMCCretrieval/ProST
[ICCV 2023 Oral] Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval
IMCCretrieval/MomentDiff
[NeurIPS 2023] MomentDiff: Generative Video Moment Retrieval from Random to Real
yangnianzu0515/MoleRec
The official implementation of our paper "MoleRec: Combinatorial Drug Recommendation with Substructure-Aware Molecular Representation Learning" (TheWebConf 2023).
whwu95/FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
TencentARC/TVTS
Turning to Video for Transcript Sorting
mbzuai-oryx/CVRR-Evaluation-Suite
Official repository of the paper "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs".
gimpong/MM23-MISSRec
The code for the paper "MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation" (ACM MM'23).
gengyuanmax/MeVTR
Official GitHub repo for the ICCV 2023 paper 'Multi-event Video-Text Retrieval'
EricLee8/Multi-party-Dialogue-MRC
Code and data for the EMNLP 2021 paper "Self- and Pseudo-self-supervised Prediction of Speaker and Key-utterance for Multi-party Dialogue Reading Comprehension"
EricLee8/BiDeN
The official code of our paper at EMNLP 2022: Back to the Future: Bidirectional Information Decoupling Network for Multi-turn Dialogue Modeling
huangmozhi9527/GMMFormer
[AAAI 2024] GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval
HuiGuanLab/DL-DKD
Dual Learning with Dynamic Knowledge Distillation for Partially Relevant Video Retrieval
duyali2000/MQMC
This repo has the PyTorch implementation and datasets of our WSDM 2023 paper: “Multi-queue Momentum Contrast for Microvideo-Product Retrieval”.
EricLee8/MPD_EMVI
Official implementation of our paper at ACL 2023: Pre-training Multi-party Dialogue Models with Latent Discourse Inference
EricLee8/SPACE
The official code for our paper at COLING 2022: Semantic-Preserving Adversarial Code Comprehension
sangminwoo/AvisC
Official PyTorch implementation of "Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models"
huangmozhi9527/GMMFormer_v2
GMMFormer v2: An Uncertainty-aware Framework for Partially Relevant Video Retrieval