A Boosting Algorithm for Positive-Unlabeled Learning

This repository contains the code for reproducing AdaPU, the method proposed in the paper "A Boosting Algorithm for Positive-Unlabeled Learning".

utils.py implements the risk estimator for non-negative PU (nnPU) learning [1].
train.py is an example script that runs the algorithm.
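For reference, the nnPU risk of Kiryo et al. [1] adds a positive-class term to a negative-class term that is estimated from unlabeled data and clipped at zero: R(g) = pi_p * R_p^+(g) + max(0, R_u^-(g) - pi_p * R_p^-(g)), where pi_p is the class prior. The following minimal NumPy sketch of that formula assumes the sigmoid loss and a known prior. It is not the repository's utils.py, and the function names are hypothetical.

```python
import numpy as np


def sigmoid_loss(scores, label):
    """Sigmoid loss l(t, y) = 1 / (1 + exp(t * y)), one of the losses used in [1]."""
    return 1.0 / (1.0 + np.exp(label * scores))


def nnpu_risk(scores_p, scores_u, prior):
    """Non-negative PU risk estimate (hypothetical helper, not the repo's API).

    scores_p: classifier outputs g(x) on labeled positive samples
    scores_u: classifier outputs g(x) on unlabeled samples
    prior:    class prior pi_p = P(y = +1), assumed known
    """
    risk_p_pos = sigmoid_loss(scores_p, +1).mean()  # R_p^+(g): positives treated as positive
    risk_p_neg = sigmoid_loss(scores_p, -1).mean()  # R_p^-(g): positives treated as negative
    risk_u_neg = sigmoid_loss(scores_u, -1).mean()  # R_u^-(g): unlabeled treated as negative
    negative_part = risk_u_neg - prior * risk_p_neg
    # nnPU clips the estimated negative-class risk at zero.
    return prior * risk_p_pos + max(0.0, negative_part)
```

Without the max(0, ...) clipping this reduces to the unbiased PU (uPU) estimator, whose negative-class term can become negative when a flexible model overfits; preventing that is the point of the non-negative correction in [1].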

The four datasets used are:

CIFAR-10 [2] is preprocessed so that artifacts (airplane, automobile, ship, truck) form the positive (P) class and living things (bird, cat, deer, dog, frog, horse) form the negative (N) class.
Epsilon [3] is a binary classification text dataset.
UNSW-NB15 [4] is a binary classification dataset for network intrusion detection.
Breast Cancer [5] is a binary classification dataset.
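To illustrate the CIFAR-10 setup, the following sketch maps the ten CIFAR-10 labels to a binary P/N labeling. It then keeps the labels of only a random subset of positives, which is one common way to build PU training data. The helper names, the +1/-1 encoding, and the single-training-set split (labeled positives drawn from the same pool as the unlabeled data) are assumptions. The repository may construct its PU data differently.

```python
import numpy as np

# Standard CIFAR-10 label indices:
# 0 airplane, 1 automobile, 2 bird, 3 cat, 4 deer,
# 5 dog, 6 frog, 7 horse, 8 ship, 9 truck
ARTIFACT_CLASSES = (0, 1, 8, 9)  # artifacts -> positive (P) class


def binarize_cifar10(labels):
    """Hypothetical helper: map CIFAR-10 labels to +1 (artifacts) / -1 (living things)."""
    labels = np.asarray(labels)
    return np.where(np.isin(labels, ARTIFACT_CLASSES), 1, -1)


def make_pu_split(y, n_labeled, seed=0):
    """Hypothetical helper: return s with s[i] = 1 for n_labeled randomly chosen
    positives (labeled) and s[i] = 0 for everything else (unlabeled)."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    labeled = rng.choice(pos_idx, size=n_labeled, replace=False)
    s = np.zeros_like(y)
    s[labeled] = 1
    return s
```

A learner then sees only the inputs and s. The true labels y are kept aside for computing test accuracy.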

Operating System:

Requirements:

Quick start

Run src/train.py directly to execute the algorithm once with the default settings; the result is printed and saved. You can also pass different parameters on the command line, for example:

python3 src/train.py \
--dataset breastcancer \
--seed 5 \
--num_estimator 100 \
--beta 0.0001 \
--random 1
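To repeat the Quick start run over several seeds, a small Python wrapper can build the same command line. This is a hypothetical helper, not part of the repository. It reuses only the flags shown above, passes --random through unchanged because its meaning is not documented here, and uses sys.executable in place of python3.

```python
import subprocess
import sys


def build_command(dataset, seed, num_estimator, beta, random_flag=1):
    """Hypothetical helper: assemble the src/train.py call from the Quick start example."""
    return [
        sys.executable, "src/train.py",
        "--dataset", dataset,
        "--seed", str(seed),
        "--num_estimator", str(num_estimator),
        "--beta", str(beta),
        "--random", str(random_flag),  # passed through as-is; semantics not documented here
    ]


def run_seeds(dataset, seeds, num_estimator=100, beta=0.0001):
    """Run train.py once per seed; raises CalledProcessError on the first failing run."""
    for seed in seeds:
        subprocess.run(build_command(dataset, seed, num_estimator, beta), check=True)
```

For example, calling run_seeds("breastcancer", range(1, 6)) from the repository root runs the Breast Cancer setting five times with different seeds.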

Reproduce

Dataset	Beta	Accuracy (%)
CIFAR-10	0.1	86.21
Epsilon	0.2	73.05
UNSW-NB15	0.1	76.62
Breast Cancer	0.0001	96.49

Reference

[1] Ryuichi Kiryo, Gang Niu, Marthinus Christoffel du Plessis, and Masashi Sugiyama. "Positive-Unlabeled Learning with Non-Negative Risk Estimator." Advances in Neural Information Processing Systems, 2017.

[2] Alex Krizhevsky and Geoffrey Hinton. "Learning Multiple Layers of Features from Tiny Images." Technical report, University of Toronto, 2009.

[3] Guo-Xun Yuan, Chia-Hua Ho, and Chih-Jen Lin. "An Improved GLMNET for L1-regularized Logistic Regression." Journal of Machine Learning Research, 13, 1999-2030, 2012.

[4] Nour Moustafa and Jill Slay. "UNSW-NB15: A Comprehensive Data Set for Network Intrusion Detection Systems (UNSW-NB15 Network Data Set)." Military Communications and Information Systems Conference (MilCIS), IEEE, 2015.