muzairkhattak's Stars
unslothai/unsloth
Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
anthropics/courses
Anthropic's educational courses
JingyunLiang/SwinIR
SwinIR: Image Restoration Using Swin Transformer (official repository)
bowang-lab/MedSAM
Segment Anything in Medical Images
epfml/ML_course
EPFL Machine Learning Course, Fall 2024
StanfordVL/taskonomy
Taskonomy: Disentangling Task Transfer Learning [Best Paper, CVPR 2018]
TencentARC/Open-MAGVIT2
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
open-thought/system-2-research
System 2 Reasoning Link Collection
EPFL-VILAB/MultiMAE
MultiMAE: Multi-modal Multi-task Masked Autoencoders, ECCV 2022
NVlabs/EAGLE
EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
cientgu/VQ-Diffusion
mlfoundations/task_vectors
Editing Models with Task Arithmetic
EvolvingLMMs-Lab/LongVA
Long Context Transfer from Language to Vision
haritheja-e/robot-utility-models
Robot Utility Models are trained on a diverse set of environments and objects, and then can be deployed in novel environments with novel objects without any further data or training.
MMStar-Benchmark/MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"
snap-research/weights2weights
Official Implementation of weights2weights
nv-dvl/segment-anything-lidar
[ECCV 2024] Better Call SAL: Towards Learning to Segment Anything in Lidar
zeyofu/BLINK_Benchmark
[ECCV 2024] This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive" (https://arxiv.org/abs/2404.12390)
liuzhuang13/bias
chs20/RobustVLM
[ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
vinid/safety-tuned-llamas
[ICLR 2024] Paper showing the properties of safety tuning and exaggerated safety.
UCSC-VLAA/vllm-safety-benchmark
[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
ys-zong/VLGuard
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
ExplainableML/fomo_in_flux
Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24]
zycheiheihei/Transferable-Visual-Prompting
[CVPR 2024 Highlight] Official implementation of "Exploring the Transferability of Visual Prompting for Multimodal Large Language Models"
umer-sheikh/bird-whisperer
[Interspeech 2024] Official code repository for the paper "Bird Whisperer: Leveraging Large Pre-trained Acoustic Model for Bird Call Classification"
koushiksrivats/robust-concept-erasing
Official implementation of the paper "STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models"
renytek13/Soft-Prompt-Generation
[ECCV 2024] Soft Prompt Generation for Domain Generalization
akhtarvision/weather-regional
mbzuai-oryx/BiMediX2
Bio-Medical EXpert LMM with English and Arabic Language Capabilities