Muzammal-Naseer's Stars
xai-org/grok-1
Grok open release
apple/ml-ferret
mbzuai-oryx/Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
mbzuai-oryx/MobiLlama
MobiLlama: Small Language Model tailored for edge devices
awaisrauf/Awesome-CV-Foundational-Models
mbzuai-oryx/GeoChat
[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
Haiyang-W/GiT
[ECCV2024 Oral 🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
muzairkhattak/PromptSRC
[ICCV'23 Main Track, WECIA'23 Oral] Official repository of paper titled "Self-regulating Prompts: Foundational Model Adaptation without Forgetting".
TalalWasim/Vita-CLIP
Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023]
fahadshamshad/Clip2Protect
[CVPR 2023] Official repository of paper titled "CLIP2Protect: Protecting Facial Privacy using Text-Guided Makeup via Adversarial Latent Search".
jameelhassan/PromptAlign
[NeurIPS 2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
TalalWasim/Video-FocalNets
Official repository for "Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition" [ICCV 2023]
muzairkhattak/ProText
[CVPRW 2024] Official repository of paper titled "Learning to Prompt with Text Only Supervision for Vision-Language Models".
techmn/satmae_pp
Official repository for "Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery" (CVPR 2024)
hananshafi/llmblueprint
[ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts"
koushiksrivats/FLIP
Official implementation of the paper "FLIP: Cross-domain Face Anti-spoofing with Language Guidance". (ICCV 2023)
uncbiag/SegNext
Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts (CVPR 2024)
rohit901/cooperative-foundational-models
Official code for our paper "Enhancing Novel Object Detection via Cooperative Foundational Models"
asif-hanif/vafa
[MICCAI 2023] Official code repository of paper titled "Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation".
OmkarThawakar/composed-video-retrieval
Composed Video Retrieval
mbzuai-oryx/CVRR-Evaluation-Suite
Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs".
sheng-eatamath/PromptCAL
Official Implementation of paper: PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery (CVPR'23)
kahnchana/clippy
Perceptual Grouping in Contrastive Vision-Language Models (ICCV'23)
Muhammad-Huzaifaa/ObjectCompose
[ACCV 2024] ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes
ShahinaKK/LG_SDG
Language Grounded Single Source Domain Generalization in Medical Image Segmentation [ISBI2024]
sheng-eatamath/S3A
repo for paper titled: Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment (AAAI'24 Oral)
ShahinaKK/LWI-VMS
Learnable Weight Initialization for Volumetric Medical Image Segmentation [Elsevier AIM2024]
Hasindri/HLSS
[MICCAI 2024 🔥] HLSS, the first study to explore hierarchical information inherent in histopathology images and their language descriptions for strong multi-modal representation learning
Muzammal-Naseer/DCViT-AT
Official repository for "Boosting Adversarial Transferability using Dynamic Cues" (ICLR 2023)
hananshafi/MedContext
[MICCAI 2024] Official code for the paper "MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation"