/Ada-PU

Code for paper: A Boosting Algorithm for Positive-Unlabeled Learning

Primary LanguagePythonMIT LicenseMIT

A Boosting Algorithm for Positive-Unlabeled Learning

This is a reproducing code for AdaPU in the paper "A Boosting Algorithm for Positive-Unlabeled Learning".

  • utils.py has implementations of the risk estimator for non-negative PU (nnPU) learning [1].
  • train.py is an example code of running the algorithm.

The four used datasets are:

  • CIFAR-10 [2] preprocessed in such a way that artifacts form the P class and living things form the N class.
  • Epsilon [3] is a binary classification text dataset.
  • UNSW-NB15 [4] is a binary classification dataset.
  • Breast Cancer [5] is a binary classification dataset.

Operation System:

macOS Badge Linux Badge Ubuntu Badge

Requirements:

Python Cython Scikit-learn NumPy tqdm pandas colorama

Quick start

You can just run the python file: train.py, it will be executed once with the default setting, and the result will be printed and saved. You can also try different parameters before you execute the python file.

python3 src/train.py \
--dataset breastcancer \
--seed 5 \
--num_estimator 100 \
--beta 0.0001 \
--random 1

Reproduce

Dataset Beta Accuracy
CIFAR-10 0.1 86.21
Epsilon 0.2 73.05
UNSW-NB15 0.1 76.62
Breast Cancer 0.0001 96.49

Reference

[1] Ryuichi Kiryo, Gang Niu, Marthinus Christoffel du Plessis, and Masashi Sugiyama. "Positive-Unlabeled Learning with Non-Negative Risk Estimator." Advances in neural information processing systems. 2017.

[2] Krizhevsky, Alex, and Geoffrey Hinton. "Learning multiple layers of features from tiny images." (2009).

[3] Yuan, G. X., Ho, C. H., & Lin, C. J. (2012). An Improved GLMNET for L1-regularized Logistic Regression. Journal of Machine Learning Research, 13, 1999-2030.

[4] Moustafa, Nour, and Jill Slay. "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)." Military Communications and Information Systems Conference (MilCIS), 2015. IEEE, 2015.

[5] W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, volume 1905, pages 861-870, San Jose, CA, 1993.