Paper list and datasets for the paper: A Survey on Data Selection for LLM Instruction Tuning
Last updated: February 2, 2024 (UTC).
Labels: venue and year
Links: 📄PDF, 🔗Codes / 🔗Data, 💡Report / 💡Blog
Self-Instruct: Aligning Language Models with Self-Generated Instructions. ACL 2023
📄PDF, 🔗Data
Alpaca: A Strong, Replicable Instruction-Following Model. Report 2023
💡Blog, 🔗Data
WizardLM: Empowering Large Language Models to Follow Complex Instructions. arXiv 2023
📄PDF, 🔗Data
LIMA: Less Is More for Alignment. arXiv 2023
📄PDF, 🔗Data
Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM. Report 2023
💡Blog, 🔗Data
Multitask Prompted Training Enables Zero-Shot Task Generalization. ICLR 2022
📄PDF, 🔗Data
Instruction Mining: High-Quality Instruction Data Selection for Large Language Models. arXiv 2023
📄PDF
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4. arXiv 2023
📄PDF, 🔗Codes
Dataset Quantization. ICCV 2023
📄PDF, 🔗Codes
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning. arXiv 2023
📄PDF, 🔗Codes
Self-Alignment with Instruction Backtranslation. arXiv 2023
📄PDF
One Shot Learning as Instruction Data Prospector for Large Language Models. arXiv 2023
📄PDF, 🔗Codes
Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning. arXiv 2023
📄PDF, 🔗Codes
TeGit: Generating High-Quality Instruction-Tuning Data with Text-Grounded Task Design. ICLR 2024
📄PDF
Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks. EMNLP 2023
📄PDF, 🔗Codes
AlpaGasus: Training A Better Alpaca with Fewer Data. arXiv 2023
📄PDF, 💡Blog
#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models. arXiv 2023
📄PDF, 🔗Codes
Rethinking the Instruction Quality: LIFT is What You Need. arXiv 2023
📄PDF
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning. arXiv 2023
📄PDF, 🔗Codes
A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment. arXiv 2023
📄PDF
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation. arXiv 2023
📄PDF
MoDS: Model-oriented Data Selection for Instruction Tuning. arXiv 2023
📄PDF, 🔗Codes
Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning. arXiv 2023
📄PDF, 🔗Codes