Awesome-Dataset-Distillation


A curated list of awesome papers on dataset distillation and related applications, inspired by awesome-computer-vision.

Dataset distillation is the task of synthesizing a small dataset such that models trained on it achieve high performance on the original large dataset. A dataset distillation algorithm takes as input a large real dataset to be distilled (the training set) and outputs a small synthetic distilled dataset, which is evaluated by training models on it and testing them on a separate real dataset (the validation/test set). A good small distilled dataset is not only useful for dataset understanding but also has various applications (e.g., continual learning, privacy, and neural architecture search). This task was first introduced in the 2018 paper Dataset Distillation [Tongzhou Wang et al., '18], along with a proposed algorithm using backpropagation through optimization steps.
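To make the input/output/evaluation pipeline above concrete, here is a minimal toy sketch, not the algorithm from the paper. It assumes a scalar linear model `y = w * x`, a distilled dataset consisting of a single synthetic point, and exactly one inner gradient step whose derivative is written out by hand with the chain rule. All names (`inner_step`, `distill`, `mse`) and all hyperparameters are hypothetical and chosen only for illustration.

```python
# Toy sketch (hypothetical names throughout): distill a 1-D regression
# dataset into ONE synthetic point by differentiating through one SGD step.

def inner_step(w0, distilled, lr):
    """One gradient step of the scalar model y = w * x on the distilled set."""
    grad = sum(x * (w0 * x - y) for x, y in distilled) / len(distilled)
    return w0 - lr * grad


def mse(w, data):
    """Half mean squared error of the model y = w * x on real data."""
    return sum((w * x - y) ** 2 for x, y in data) / (2 * len(data))


def distill(train, w0=0.0, inner_lr=0.5, outer_lr=0.05, steps=500):
    """Learn one synthetic point (xs, ys) so that a single inner step from w0
    yields a model with low loss on the real training set."""
    xs, ys = 1.0, 1.0  # arbitrary initialization of the synthetic point
    for _ in range(steps):
        w1 = inner_step(w0, [(xs, ys)], inner_lr)
        # Gradient of the real-data loss with respect to the trained weight.
        g = sum(x * (w1 * x - y) for x, y in train) / len(train)
        # Chain rule through w1 = w0 - inner_lr * xs * (w0 * xs - ys).
        dw1_dxs = -inner_lr * (2.0 * w0 * xs - ys)
        dw1_dys = inner_lr * xs
        xs, ys = xs - outer_lr * g * dw1_dxs, ys - outer_lr * g * dw1_dys
    return xs, ys


# Real data follows y = 3x; the test set is disjoint from the training set.
train = [(x, 3.0 * x) for x in (-2.0, -1.0, 1.0, 2.0)]
test = [(x, 3.0 * x) for x in (-3.0, 0.5, 3.0)]

distilled_point = distill(train)
w = inner_step(0.0, [distilled_point], lr=0.5)  # train a fresh model
print("distilled point:", distilled_point)
print("test loss:", mse(w, test))
```

The last three statements mirror the evaluation protocol: a fresh model is trained only on the distilled point and then scored on held-out real data. Practical methods replace the hand-derived gradient with automatic differentiation over many inner steps and much larger models.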

In recent years (2019-now), dataset distillation has gained increasing attention in the research community, across many institutes and labs, and more papers are being published each year. This research has steadily improved dataset distillation and explored its variants and applications.

This project is curated and maintained by Guang Li, Bo Zhao, and Tongzhou Wang.

How to submit a pull request?

  • ๐ŸŒ Project Page
  • :octocat: Code
  • ๐Ÿ“– bibtex

Citing Awesome-Dataset-Distillation

If you find this project useful for your research, please use the following BibTeX entry.

@misc{li2022awesome,
  author={Li, Guang and Zhao, Bo and Wang, Tongzhou},
  title={Awesome-Dataset-Distillation},
  howpublished={\url{https://github.com/Guang000/Awesome-Dataset-Distillation}},
  year={2022}
}

Contents

  • Main
  • Applications
  • Media Coverage
  • Acknowledgments

Main

Early Work

Gradient/Trajectory Matching Surrogate Objective

Distribution/Feature Matching Surrogate Objective

Better Optimization

Distilled Dataset Parametrization

Generative Prior

Label Distillation

Dataset Quantization

Multimodal Distillation

Self-Supervised Distillation

Benchmark

Survey

Applications

Continual Learning

Privacy

Medical

Federated Learning

Graph Neural Network

Neural Architecture Search

Fashion, Art, and Design

Knowledge Distillation

Recommender Systems

Blackbox Optimization

Trustworthy

Retrieval

Text

Tabular

Media Coverage

Acknowledgments

We want to thank Nikolaos Tsilivis, Wei Jin, Yongchao Zhou, Noveen Sachdeva, Can Chen, Guangxiang Zhao, Shiye Lei, Xinchao Wang, Dmitry Medvedev, Seungjae Shin, Jiawei Du, Yidi Jiang, Xindi Wu, Guangyi Liu, Yilun Liu, Kai Wang, Yue Xu and Anjia Cao for their valuable suggestions and contributions.