tf-protoNN

This repository contains a TensorFlow implementation of ProtoNN (a kNN-based algorithm) for large-scale multi-label learning. It also includes a script for training on multiple GPUs.

Note: some modifications have been made to improve run time and performance on large-scale datasets. For more details about ProtoNN, see the paper "ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices". To reproduce the results in the original paper, please use the official code provided by the authors.
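For orientation, here is a minimal NumPy sketch of the scoring rule as described in the ProtoNN paper: inputs are projected to a low-dimensional space, compared to learned prototypes with an RBF kernel, and the prototypes' label vectors are summed with those kernel weights. This is not the code in this repository (which is written in TensorFlow and includes the modifications noted above). The function and argument names (`protonn_scores`, `top_k_labels`, `W`, `B`, `Z`, `gamma`) are illustrative assumptions.

```python
import numpy as np

def protonn_scores(X, W, B, Z, gamma):
    """Label scores: sum_j Z[j] * exp(-gamma^2 * ||W x - B[j]||^2).

    X: (n, d) inputs            W: (d_hat, d) projection matrix
    B: (m, d_hat) prototypes    Z: (m, L) prototype label vectors
    gamma: scalar kernel width parameter
    (All names are illustrative, not taken from this repository.)
    """
    P = X @ W.T                                           # (n, d_hat) projected inputs
    # Squared Euclidean distance from every projected point to every prototype.
    sq_dist = ((P[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)   # (n, m)
    K = np.exp(-(gamma ** 2) * sq_dist)                   # (n, m) RBF similarities
    return K @ Z                                          # (n, L) label scores

def top_k_labels(scores, k):
    """Indices of the k highest-scoring labels per instance, best first."""
    return np.argsort(-scores, axis=1)[:, :k]
```

In a multi-label setting, the top-k labels per instance are typically what gets evaluated, which is why the sketch ranks scores instead of taking a single argmax.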

Extreme multi-label (XML) algorithms

Unlike multi-class or binary classification, extreme multi-label (XML) algorithms tag each data point with a subset of labels (rather than a single label) drawn from an extremely large label set. XML problems typically involve a large number of labels (10³ to 10⁶), high-dimensional features, and many training points.
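To make the "subset of labels" idea concrete, the sketch below builds a binary label matrix of shape (num instances, num labels) from per-instance lists of label ids. With label sets this large, each row is almost entirely zeros, so a sparse matrix is the natural container. The helper name `labels_to_matrix` and the example label ids are made up for illustration, and the sketch assumes each instance lists a label id at most once.

```python
import numpy as np
from scipy.sparse import csr_matrix

def labels_to_matrix(label_lists, num_labels):
    """Build a sparse binary (num instances, num labels) matrix.

    label_lists[i] holds the label ids attached to instance i.
    Assumes ids are unique within each list (duplicates would be summed).
    """
    rows = [i for i, labels in enumerate(label_lists) for _ in labels]
    cols = [label for labels in label_lists for label in labels]
    data = np.ones(len(rows), dtype=np.float32)
    return csr_matrix((data, (rows, cols)), shape=(len(label_lists), num_labels))

# Three instances tagged with 2, 1 and 3 labels out of 100,000 possible labels.
Y = labels_to_matrix([[3, 17], [42], [0, 3, 99_999]], num_labels=100_000)
```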

For datasets, check: XML-repository

Required packages

TensorFlow
FAISS
NumPy
SciPy
Easydict

Usage

See the IPython notebook to run the code on the Eurlex-4k dataset. To change the parameters, modify the config file.

To run on a new dataset:

Create a new folder for the dataset. Place two separate files, train_data.mat and test_data.mat, in that folder. Each file must contain two variables: X with shape (num instances, num features) and Y with shape (num instances, num labels).
Create a config file in the cfgs folder with the required parameters.
For a single GPU, copy eurlex_train.py to train.py and edit it to import the correct config file. For multiple GPUs, do the same starting from eurlex_multigpu_train.py. Then run python train.py.
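The first step can be sketched as follows, assuming the feature and label matrices are already in memory. Only the file names (train_data.mat, test_data.mat) and the variable names (X, Y) come from the instructions above. The helper names `write_split` and `write_dataset` are hypothetical. Whether the training scripts expect dense or sparse matrices is not stated here, so check how the Eurlex-4k files are stored before choosing; `scipy.io.savemat` accepts both NumPy arrays and SciPy sparse matrices.

```python
import os
import numpy as np
from scipy.io import savemat, loadmat

def write_split(path, X, Y):
    """Save one split as a .mat file with variables X and Y."""
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must have the same number of instances")
    savemat(path, {"X": X, "Y": Y})

def write_dataset(root, X_train, Y_train, X_test, Y_test):
    """Write train_data.mat and test_data.mat into the folder `root`."""
    if X_train.shape[1] != X_test.shape[1]:
        raise ValueError("train and test must have the same number of features")
    if Y_train.shape[1] != Y_test.shape[1]:
        raise ValueError("train and test must have the same number of labels")
    os.makedirs(root, exist_ok=True)
    write_split(os.path.join(root, "train_data.mat"), X_train, Y_train)
    write_split(os.path.join(root, "test_data.mat"), X_test, Y_test)
```

After writing, `loadmat(os.path.join(root, "train_data.mat"))["X"].shape` is a quick way to confirm that the stored shapes are (num instances, num features) as required.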

saisrivatsan/tf-protoNN