Designing supervised learning algorithms that can learn from datasets with noisy labels is a problem of great practical importance. In this blog post, I will give a brief introduction to and overview of this research area.
Deep learning has several principal problems:
- it requires large amounts of training data
- it has limited ability to transfer to new tasks or domains
- open-ended inference remains an unsolved problem
- its underlying principles are not transparent

In particular, deep learning relies heavily on high-quality annotated data, which makes it costly in both time and labor; achieving semi-supervised or unsupervised learning is therefore a very important problem.
The learning-with-noisy-labels setting is typically as follows:
- In the initial phase, there is a certain amount of data of unknown annotation quality, e.g. labels obtained through search engines or from public datasets.
- This annotation data may be of low quality, with annotation error rates that can be high or low.
- Continuous manual input is required to gradually improve annotation quality, e.g. through paid crowdsourcing or user feedback.
Before introducing the research surveys, here is a brief overview of the field from my own perspective.
All approaches in this area aim to solve one problem: how to separate noisy samples from clean samples. I classify the approaches into two families, as shown in the following figure: one tries to model the noise distribution in the dataset; the other builds algorithms and models that are robust regardless of the noise distribution.
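To make the first family concrete, a common way to model the noise distribution is a noise transition matrix T, where T[i, j] is the probability that a sample whose true class is i is observed with label j; the model's clean-class posterior is passed through T before computing the loss ("forward correction"). Below is a minimal NumPy sketch; the 3-class matrix and function names are illustrative assumptions, not taken from any specific paper.

```python
import numpy as np

# Hypothetical noise transition matrix for a 3-class problem:
# T[i, j] = P(observed label = j | true label = i). Rows sum to 1.
T = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])

def forward_corrected_nll(clean_probs, noisy_label):
    """Forward correction: map the model's clean-class posterior through T
    to get the distribution over *observed* labels, then take the negative
    log-likelihood of the (possibly noisy) observed label."""
    noisy_probs = clean_probs @ T  # P(observed label | x)
    return -np.log(noisy_probs[noisy_label])

# A model confident in class 0, with observed label 0:
p = np.array([0.9, 0.05, 0.05])
loss = forward_corrected_nll(p, 0)
```

Under this correction, a confident prediction on the true class is penalized less when the observed label could plausibly be a corruption of it, which is what makes the loss statistically consistent when T is known.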
Next, I list some important surveys and papers (a short list rather than an exhaustive one):
- Label Noise Types and Their Effects on Deep Learning [pdf] [code]
- Image Classification with Deep Learning in the Presence of Noisy Labels: A Survey [pdf]
- Learning from Noisy Labels with Deep Neural Networks [pdf]
- Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels (ICML 2020)--MentorMix [Code] [Video] [Blog]
- MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels (ICML 2018) [Code] [Video] [Blog]
- DivideMix: Learning with noisy labels as semi-supervised learning (ICLR 2020) [Code] [Blog]
- Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels (NIPS 2018) [Code]
- Symmetric Cross Entropy for Robust Learning with Noisy Labels (ICCV 2019) [Code]
- Early-Learning Regularization Prevents Memorization of Noisy Labels (NeurIPS 2020) [Code]
- SELFIE: Refurbishing unclean samples for robust deep learning (ICML 2019) [Code]
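As a flavor of the robust-loss family, here is a minimal NumPy sketch of the symmetric cross entropy idea from the ICCV 2019 paper listed above: the usual cross entropy is combined with a "reverse" cross entropy in which prediction and label swap roles, with log 0 in the one-hot label replaced by a finite constant A. This single-sample version and the particular values of alpha, beta, and A are illustrative assumptions; see the paper and its code for the exact formulation.

```python
import numpy as np

A = -4.0  # finite stand-in for log(0) in the reverse term (an assumed value)

def symmetric_cross_entropy(pred_probs, label_onehot, alpha=0.1, beta=1.0):
    """SCE = alpha * CE(label, pred) + beta * RCE(pred, label).
    RCE treats the prediction as the target distribution; since the label
    is one-hot, log(label) is 0 where the label is 1 and A elsewhere."""
    pred = np.clip(pred_probs, 1e-7, 1.0)
    ce = -np.sum(label_onehot * np.log(pred))        # standard cross entropy
    log_label = np.where(label_onehot > 0, 0.0, A)   # log(1) = 0, log(0) -> A
    rce = -np.sum(pred * log_label)                  # reverse cross entropy
    return alpha * ce + beta * rce

# Example: a fairly confident (possibly wrong) prediction vs. a one-hot label.
loss = symmetric_cross_entropy(np.array([0.7, 0.2, 0.1]),
                               np.array([1.0, 0.0, 0.0]))
```

The reverse term is bounded, which damps the gradient contribution of samples whose labels the model strongly disagrees with; this is the mechanism that makes the combined loss more noise-tolerant than plain cross entropy.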