Awesome-Coreset-Selection

A curated list of awesome works on coreset/core-set/subset/sample selection.

Survey + Library

  • DeepCore: A Comprehensive Library for Coreset Selection in Deep Learning (arXiv 2022) PDF
  • Introduction to Core-sets: an Updated Survey (arXiv 2020) PDF
  • Coresets-methods and history: A theoreticians design pattern for approximation and streaming algorithms (KI-Künstliche Intelligenz 2018) PDF
  • Coresets and sketches (arXiv 2016) PDF

Papers

Efficient Model Training (fast & scalable)

2021

  • Face-NMS: A Core-set Selection Approach for Efficient Face Recognition (arXiv 2021) PDF
  • Learning Fast Sample Re-weighting Without Reward Data (arXiv 2021) PDF Code
  • Submodular Mutual Information for Targeted Data Subset Selection (arXiv 2021) PDF
  • PRISM: A Unified Framework of Parameterized Submodular Information Measures for Targeted Data Subset Selection and Summarization (arXiv 2021) PDF
  • Dataset Condensation with Differentiable Siamese Augmentation (ICML 2021) PDF
  • Coresets for Classification -- Simplified and Strengthened (arXiv 2021) PDF
  • GRAD-MATCH: Gradient Matching based Data Subset Selection for Efficient Deep Model Training (ICML 2021) PDF Code
  • GLISTER: Generalization Based Data Subset Selection for Efficient and Robust Learning (AAAI 2021) PDF Code
  • SVP-CF: Selection via Proxy for Collaborative Filtering Data (arXiv 2021) PDF
  • Dataset Condensation with Gradient Matching (ICLR 2021) PDF
  • Deep Learning on a Data Diet: Finding Important Examples Early in Training (arXiv 2021) PDF
  • A Novel Sequential Coreset Method for Gradient Descent Algorithms (ICML 2021) PDF
  • Stochastic Subset Selection for Efficient Training and Inference of Neural Networks (ICLR 2021) PDF

2020

  • Uncovering Coresets for Classification With Multi-Objective Evolutionary Algorithms (arXiv 2020) PDF Code
  • Coresets for Data-efficient Training of Machine Learning Models (ICML 2020) PDF Code
  • Selection via Proxy: Efficient Data Selection for Deep Learning (ICLR 2020) PDF Code

2019

  • Teaching a black-box learner (ICML 2019) PDF
  • An Empirical Study of Example Forgetting during Deep Neural Network Learning (ICLR 2019) PDF Code
  • Learning and Data Selection in Big Datasets (ICML 2019) PDF
  • Preventing Adversarial Use of Datasets through Fair Core-Set Construction (arXiv 2019) PDF

2017

  • Subset Selection and Summarization in Sequential Data (NeurIPS 2017) PDF

2016

  • New Frameworks for Offline and Streaming Coreset Constructions (arXiv 2016) PDF

2014

  • Coresets for k-Segmentation of Streaming Data (NeurIPS 2014) PDF

Continual Learning

2021

  • Online Coreset Selection for Rehearsal-based Continual Learning (arXiv 2021) PDF

2020

  • Optimal Continual Learning has Perfect Memory and is NP-HARD (ICML 2020) PDF
  • Coresets via Bilevel Optimization for Continual Learning and Streaming (NeurIPS 2020) PDF Code

2019

  • Gradient based sample selection for online continual learning (NeurIPS 2019) PDF Code

Active Learning

2022

  • Active Learning is a Strong Baseline for Data Subset Selection (NeurIPS 2022 Workshop) PDF Code

2021

  • Active Learning by Acquiring Contrastive Examples (arXiv 2021) PDF Code
  • SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios (arXiv 2021) PDF

2020

  • Contextual Diversity for Active Learning (ECCV 2020) PDF Code

2019

  • Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision (WACV 2019) PDF
  • Bayesian Batch Active Learning as Sparse Subset Approximation (NeurIPS 2019) PDF Code

2018

  • Active Learning for Convolutional Neural Networks: A Core-Set Approach (ICLR 2018) PDF Code
  • Adversarial Active Learning for Deep Networks: a Margin Based Approach (arXiv 2018) PDF

2017

  • Non-Uniform Subset Selection for Active Learning in Structured Data (CVPR 2017) PDF

2015

  • Submodularity in Data Subset Selection and Active Learning (ICML 2015) PDF

Neural Architecture Search

2021

  • Core-set Sampling for Efficient Neural Architecture Search (arXiv 2021) PDF

Clustering & Distribution Approximation

2021

  • Coresets for constrained k-median and k-means clustering in low dimensional Euclidean space (arXiv 2021) PDF

2020

  • Online Coresets for Clustering with Bregman Divergences (arXiv 2020) PDF
  • Coresets for Clustering in Graphs of Bounded Treewidth (ICML 2020) PDF

2019

  • Coresets for Clustering with Fairness Constraints (NeurIPS 2019) PDF Code
  • Coresets for Ordered Weighted Clustering (ICML 2019) PDF

2018

  • Strong Coresets for k-Median and Subspace Approximation: Goodbye Dimension (Annual IEEE Symposium on Foundations of Computer Science 2018) PDF

2015

  • Coresets for Nonparametric Estimation - the Case of DP-Means (ICML 2015) PDF

2014

  • Distributed Balanced Clustering via Mapping Coresets (NeurIPS 2014) PDF

2012

  • Super-Samples from Kernel Herding (arXiv 2012) PDF Code

2011

  • Scalable Training of Mixture Models via Coresets (NeurIPS 2011) PDF

Semi-supervised Learning

2021

  • Semi-supervised Batch Active Learning via Bilevel Optimization (ICASSP 2021) PDF Code
  • RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised Learning (NeurIPS 2021) PDF

Contrastive Learning

2021

  • Extending Contrastive Learning to Unsupervised Coreset Selection (arXiv 2021) PDF

2020

  • Are all negatives created equal in contrastive instance discrimination? (arXiv 2020) PDF

Robust Learning

2023

  • Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy (NeurIPS 2023) PDF

2021

  • Active label cleaning: Improving dataset quality under resource constraints (arXiv 2021) PDF
  • Just Train Twice: Improving Group Robustness without Training Group Information (ICML 2021) PDF Code

2020

  • Coresets for Robust Training of Deep Neural Networks against Noisy Labels (NeurIPS 2020) PDF Code

GAN

2020

  • Small-GAN: Speeding up GAN Training using Core-Sets (ICML 2020) PDF

Bayesian Inference

2021

  • Bayesian Coresets: Revisiting the Nonconvex Optimization Perspective (AISTATS 2021) PDF Code

2020

  • Bayesian Pseudocoresets (NeurIPS 2020) PDF Code

2019

  • Sparse Variational Inference: Bayesian Coresets from Scratch (NeurIPS 2019) PDF Code

2018

  • Bayesian Coreset Construction via Greedy Iterative Geodesic Ascent (ICML 2018) PDF Code

Regression

2021

  • Training Data Subset Selection for Regression with Controlled Generalization Error (ICML 2021) PDF Code

2020

  • Coresets for Near-Convex Functions (NeurIPS 2020) PDF
  • On Coresets for Regularized Regression (ICML 2020) PDF Code
  • Coresets for Regressions with Panel Data (NeurIPS 2020) PDF Code

2019

  • Fast Parallel Algorithms for Statistical Subset Selection Problems (NeurIPS 2019) PDF

2018

  • On Coresets for Logistic Regression (NeurIPS 2018) PDF

2016

  • Coresets for Scalable Bayesian Logistic Regression (NeurIPS 2016) PDF

Workshops

  • SubSetML: Subset Selection in Machine Learning: From Theory to Practice (Workshop @ ICML 2021) Site