Survey on data-centric multimodal large language models
List of Sources
Textual-Pretraining Datasets:
MM-Pretraining Datasets:
Common Textual SFT Datasets:
Domain Specific Textual SFT Datasets:
Multimodal SFT Datasets:
- DoReMi: Optimizing data mixtures speeds up language model pretraining - paper
- Data selection for language models via importance resampling - paper
- GLaM: Efficient scaling of language models with mixture-of-experts - paper
- VideoLLM: Modeling video sequence with large language models - paper
- VAST: A vision-audio-subtitle-text omni-modality foundation model and dataset - paper
- MovieChat: From dense token to sparse memory for long video understanding - paper
- InternVid: A large-scale video-text dataset for multimodal understanding and generation - paper
- Youku-mPLUG: A 10 million large-scale Chinese video-language dataset for pre-training and benchmarks - paper
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training - paper
- From scarcity to efficiency: Improving clip training via visual-enriched captions - paper
- VALOR: Vision-audio-language omni-perception pretraining model and dataset - paper
- AutoAD: Movie description in context - paper
- Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding - paper
- VideoChat: Chat-Centric Video Understanding - paper
- MVBench: A comprehensive multi-modal video understanding benchmark - paper
- LLaMA-VID: An image is worth 2 tokens in large language models - paper
- Video-LLaVA: Learning united visual representation by alignment before projection - paper
- Valley: Video assistant with large language model enhanced ability - paper
- Video-LLaMA: An instruction-tuned audio-visual language model for video understanding - paper
- Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration - paper
- Audio-Visual LLM for Video Understanding - paper
- Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models - paper
- DataComp: In search of the next generation of multimodal datasets - paper
- Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters - paper
- CiT: Curation in Training for Effective Vision-Language Data - paper
- Sieve: Multimodal Dataset Pruning Using Image Captioning Models - paper
- Variance Alignment Score: A Simple But Tough-to-Beat Data Selection Method for Multimodal Contrastive Learning - paper
- Unnatural instructions: Tuning language models with (almost) no human labor - paper
- Active Learning for Convolutional Neural Networks: A Core-Set Approach - paper
- Moderate Coreset: A Universal Method of Data Selection for Real-world Data-efficient Deep Learning - paper
- SIMILAR: Submodular information measures based active learning in realistic scenarios - paper
- Practical coreset constructions for machine learning - paper
- Deep learning on a data diet: Finding important examples early in training - paper
- A new active labeling method for deep learning - paper
- Maybe only 0.5% data is needed: A preliminary exploration of low training data instruction tuning - paper
- DEFT: Data Efficient Fine-Tuning for Pre-Trained Language Models via Unsupervised Core-Set Selection - paper
- Beyond neural scaling laws: beating power law scaling via data pruning - paper
- MoDS: Model-oriented data selection for instruction tuning - paper
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention - paper
- AlpaGasus: Training a better alpaca with fewer data - paper
- Rethinking the Instruction Quality: LIFT is What You Need - paper
- What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning - paper
- InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models - paper
- SelectLLM: Can LLMs Select Important Instructions to Annotate? - paper
- Improved Baselines with Visual Instruction Tuning - paper
- NLU on Data Diets: Dynamic Data Subset Selection for NLP Classification Tasks - paper
- LESS: Selecting Influential Data for Targeted Instruction Tuning - paper
- From quantity to quality: Boosting LLM performance with self-guided data selection for instruction tuning - paper
- One shot learning as instruction data prospector for large language models - paper
- Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks - paper
- SelectIT: Selective Instruction Tuning for Large Language Models via Uncertainty-Aware Self-Reflection - paper
- Training language models to follow instructions with human feedback - paper
- Aligning large multimodal models with factually augmented RLHF - paper
- DRESS: Instructing large vision-language models to align and interact with humans via natural language feedback - paper
- GANs trained by a two time-scale update rule converge to a local Nash equilibrium - paper
- Assessing generative models via precision and recall - paper
- Unsupervised Quality Estimation for Neural Machine Translation - paper
- Mixture models for diverse machine translation: Tricks of the trade - paper
- The Vendi Score: A diversity evaluation metric for machine learning - paper
- Cousins Of The Vendi Score: A Family Of Similarity-Based Diversity Metrics For Science And Machine Learning - paper
- Navigating text-to-image customization: From LyCORIS fine-tuning to model evaluation - paper
- TRUE: Re-evaluating factual consistency evaluation - paper
- Object hallucination in image captioning - paper
- FaithScore: Evaluating hallucinations in large vision-language models - paper
- Deep CORAL: Correlation alignment for deep domain adaptation - paper
- Transferability in deep learning: A survey - paper
- MAUVE scores for generative models: Theory and practice - paper
- Translating Videos to Natural Language Using Deep Recurrent Neural Networks - paper