Designing supervised learning algorithms that can learn from datasets with noisy labels is a problem of great practical importance. In this blog post, I will give a brief introduction to and overview of this research area.
Deep learning has several principal problems:
- it requires large amounts of training data
- it has limited ability to transfer to new tasks or domains
- open-ended inference remains an unsolved problem
- its underlying principles are not transparent

In particular, deep learning relies heavily on high-quality annotated data, which makes it costly in both time and labor; achieving semi-supervised or unsupervised learning is therefore a very important problem.
The learning-with-noisy-labels setting is typically as follows:
- In the initial phase, there is a certain amount of data of unknown annotation quality, e.g. labels obtained through search engines or from public datasets.
- This annotation data may be of low quality, with annotation error rates that can be high or low.
- Continuous manual input is required to gradually improve annotation quality, e.g. through paid crowdsourcing or user feedback.
Before introducing the research surveys, here is a brief overview of the field from my own perspective.
All approaches in this area aim to solve one problem: how to separate noisy samples from clean samples. I classify the approaches into two families, as shown in the following figure: one tries to model the noise distribution in the dataset; the other builds algorithms and models that are robust regardless of the noise distribution.
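To make the first family concrete, a common way to model the noise distribution is a noise transition matrix T, where T[i, j] is the probability that a sample whose true class is i is observed with label j; the model's clean-class posterior is passed through T before computing the loss ("forward correction"). Below is a minimal NumPy sketch; the 3-class matrix and function names are illustrative assumptions, not taken from any specific paper.

```python
import numpy as np

# Hypothetical noise transition matrix for a 3-class problem:
# T[i, j] = P(observed label = j | true label = i). Rows sum to 1.
T = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])

def forward_corrected_nll(clean_probs, noisy_label):
    """Forward correction: map the model's clean-class posterior through T
    to get the distribution over *observed* labels, then take the negative
    log-likelihood of the (possibly noisy) observed label."""
    noisy_probs = clean_probs @ T  # P(observed label | x)
    return -np.log(noisy_probs[noisy_label])

# A model confident in class 0, with observed label 0:
p = np.array([0.9, 0.05, 0.05])
loss = forward_corrected_nll(p, 0)
```

Under this correction, a confident prediction on the true class is penalized less when the observed label could plausibly be a corruption of it, which is what makes the loss statistically consistent when T is known.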
Next, I list some important surveys and papers (a short list rather than an exhaustive one):
- Label Noise Types and Their Effects on Deep Learning [pdf] [code]
- Image Classification with Deep Learning in the Presence of Noisy Labels: A Survey [pdf]
- Learning from Noisy Labels with Deep Neural Networks [pdf]
- Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels (ICML 2020)--MentorMix [Code] [Video] [Blog]
- MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels (ICML 2018) [Code] [Video] [Blog]
- DivideMix: Learning with noisy labels as semi-supervised learning (ICLR 2020) [Code] [Blog]
- Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels (NIPS 2018) [Code]
- Symmetric Cross Entropy for Robust Learning with Noisy Labels (ICCV 2019) [Code]
- Early-Learning Regularization Prevents Memorization of Noisy Labels (NeurIPS 2020) [Code]
- SELFIE: Refurbishing unclean samples for robust deep learning (ICML 2019) [Code]
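As a flavor of the robust-loss family, here is a minimal NumPy sketch of the symmetric cross entropy idea from the ICCV 2019 paper listed above: the usual cross entropy is combined with a "reverse" cross entropy in which prediction and label swap roles, with log 0 in the one-hot label replaced by a finite constant A. This single-sample version and the particular values of alpha, beta, and A are illustrative assumptions; see the paper and its code for the exact formulation.

```python
import numpy as np

A = -4.0  # finite stand-in for log(0) in the reverse term (an assumed value)

def symmetric_cross_entropy(pred_probs, label_onehot, alpha=0.1, beta=1.0):
    """SCE = alpha * CE(label, pred) + beta * RCE(pred, label).
    RCE treats the prediction as the target distribution; since the label
    is one-hot, log(label) is 0 where the label is 1 and A elsewhere."""
    pred = np.clip(pred_probs, 1e-7, 1.0)
    ce = -np.sum(label_onehot * np.log(pred))        # standard cross entropy
    log_label = np.where(label_onehot > 0, 0.0, A)   # log(1) = 0, log(0) -> A
    rce = -np.sum(pred * log_label)                  # reverse cross entropy
    return alpha * ce + beta * rce

# Example: a fairly confident (possibly wrong) prediction vs. a one-hot label.
loss = symmetric_cross_entropy(np.array([0.7, 0.2, 0.1]),
                               np.array([1.0, 0.0, 0.0]))
```

The reverse term is bounded, which damps the gradient contribution of samples whose labels the model strongly disagrees with; this is the mechanism that makes the combined loss more noise-tolerant than plain cross entropy.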