gersys's Stars
frostinassiky/denoiseloc
The official implementation of DenoiseLoc: Boundary Denoising for Video Activity Localization, ICLR 2024
houzhijian/CONE
[2023 ACL] CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding
waybarrios/guidance-based-video-grounding
[ICCV 2023] The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"
liyongqi67/MINDER
RUC-NLPIR/GenIR-Survey
This is the repository for the GenIR survey.
Pter61/context-i2w
Context-I2W: Mapping Images to Context-dependent words for Accurate Zero-Shot Composed Image Retrieval [AAAI 2024 Oral]
yzy-bupt/LDRE
[SIGIR'2024 Best Paper Honorable Mention] Official repository for "LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval"
ninatu/howtocaption
Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024
DavidHuji/CapDec
CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)
whwu95/Cap4Video
【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Seonghoon-Yu/Pseudo-RIS
[ECCV 2024] Official code for "Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation"
ExplainableML/EgoCVR
[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
vl2g/CSTBIR
haokunwen/Awesome-Composed-Image-Retrieval
Collection of Composed Image Retrieval (CIR) papers.
LLaVA-VL/LLaVA-NeXT
jinhyunj/EaTR
Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023)
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Vision-CAIR/MiniGPT-4
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
yunlong10/Awesome-LLMs-for-Video-Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
meta-llama/llama-models
Utilities intended for use with Llama models.
navervision/CompoDiff
Official Pytorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024)
ml-jku/cloob
Yui010206/SeViLA
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
miccunifi/CIRCO
[ICCV 2023] - Composed Image Retrieval on Common Objects in context (CIRCO) dataset
QinYang79/RDE
Noisy-Correspondence Learning for Text-to-Image Person Re-identification (CVPR 2024 Pytorch Code)
ExplainableML/Vision_by_Language
[ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"
youngkyunJang/VDG
Visual Delta Generator with Large Multi-modal Model for Semi-supervised Composed Image Retrieval - CVPR2024
miccunifi/SEARLE
[ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion
google-research/composed_image_retrieval