TensorFlow ProtoNN for multi-label learning (supports both single-GPU and multi-GPU usage)

Primary language: Python. License: MIT.

tf-protoNN


This repository contains a TensorFlow implementation of ProtoNN (a kNN-based algorithm) for large-scale multi-label learning. It also includes a script for training on multiple GPUs.

Note: some modifications have been made to improve run-time and performance on large-scale datasets. For more details about ProtoNN, please refer to the paper "ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices". If you want to reproduce the results in the original paper, please use the official code provided by the authors.
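As a rough illustration of the underlying idea, the sketch below follows the scoring rule described in the ProtoNN paper: inputs are projected to a low-dimensional space, compared to a set of learned prototypes with an RBF kernel, and the prototypes' label vectors are combined with those kernel weights. This is a NumPy sketch only, not the code in this repository (which has been modified, as noted above); the function name, argument names, and the random toy data are all assumptions made for the example.

```python
import numpy as np

def protonn_scores(X, W, B, Z, gamma):
    """Label scores for a batch of points (sketch of the paper's formulation).

    X: (n, D) inputs            W: (D, d) projection matrix
    B: (m, d) prototypes        Z: (m, L) prototype label vectors
    gamma: RBF kernel width parameter
    """
    P = X @ W  # project inputs to d dimensions, shape (n, d)
    # Squared Euclidean distance between every projected point and every prototype.
    sq_dist = ((P[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)  # (n, m)
    K = np.exp(-(gamma ** 2) * sq_dist)  # kernel similarities, (n, m)
    return K @ Z  # similarity-weighted sum of prototype label vectors, (n, L)

# Hypothetical toy sizes: 4 points, 10 features, 3-dim projection, 5 prototypes, 7 labels.
rng = np.random.default_rng(0)
scores = protonn_scores(
    rng.normal(size=(4, 10)),
    rng.normal(size=(10, 3)),
    rng.normal(size=(5, 3)),
    rng.random(size=(5, 7)),
    gamma=0.5,
)
# For multi-label prediction, one common choice is to keep the k highest-scoring labels.
top_k = np.argsort(-scores, axis=1)[:, :3]
```

In the paper, W, B, and Z are all learned jointly during training; here they are random, so the scores are meaningless and only the shapes and the computation are illustrated.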

Extreme multi-label (XML) algorithms

Unlike multi-class or binary classification, extreme multi-label (XML) algorithms tag each data point with a subset of labels (rather than a single label) drawn from an extremely large label set. XML problems usually involve a large number of labels (10^3 to 10^6), as well as a large number of feature dimensions and training points.
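To make the "subset of labels" idea concrete, the sketch below builds a small multi-label target matrix in which each row marks the labels relevant to one data point. Because each point has only a handful of labels out of a very large set, such matrices are typically stored in a sparse format. The toy label lists and variable names are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy data: 3 data points, 6 possible labels.
# Each inner list holds the indices of the labels relevant to that point.
labels_per_point = [[0, 4], [2], [1, 3, 5]]
num_labels = 6

# Row index of every (point, label) pair, and the matching label (column) index.
rows = np.repeat(np.arange(len(labels_per_point)),
                 [len(labels) for labels in labels_per_point])
cols = np.concatenate(labels_per_point)

# Sparse multi-hot label matrix of shape (num instances, num labels).
Y = csr_matrix((np.ones(len(cols), dtype=np.float32), (rows, cols)),
               shape=(len(labels_per_point), num_labels))
```

With 10^6 labels and only a few relevant labels per point, a sparse matrix stores just the nonzero entries, whereas a dense array of the same shape would be almost entirely zeros.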

For datasets, check: XML-repository

Required packages

  1. TensorFlow
  2. FAISS
  3. NumPy
  4. SciPy
  5. Easydict

Usage

See the IPython notebook for an example of running the code on the Eurlex-4k dataset. To change the parameters, modify the config file.

To run on a new dataset:

  1. Create a new folder named after the dataset. Place two separate files, train_data.mat and test_data.mat, in that folder. Each of these files must contain two variables: X with shape (num instances, num features) and Y with shape (num instances, num labels).

  2. Create a config file in cfgs folder with the required parameters.

  3. For a single GPU, adapt eurlex_train.py into train.py, importing the correct config file. For training on multiple GPUs, adapt eurlex_multigpu_train.py into train.py instead. In either case, run python train.py.
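The data layout from step 1 can be sketched as follows. This is a minimal example, assuming SciPy's savemat is used to write the files; the folder name my_dataset, the toy sizes, and the random dense data are placeholders, and whether the training scripts expect dense or sparse matrices is not specified here, so check the loading code before relying on either.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

rng = np.random.default_rng(0)
num_features, num_labels = 20, 8  # hypothetical toy sizes

# Placeholder dataset folder (created in a temp dir so the example is self-contained).
data_dir = os.path.join(tempfile.mkdtemp(), "my_dataset")
os.makedirs(data_dir)

for split, num_instances in [("train", 50), ("test", 10)]:
    # X: (num instances, num features) feature matrix.
    X = rng.random((num_instances, num_features)).astype(np.float32)
    # Y: (num instances, num labels) multi-hot label matrix.
    Y = (rng.random((num_instances, num_labels)) < 0.2).astype(np.float32)
    savemat(os.path.join(data_dir, f"{split}_data.mat"), {"X": X, "Y": Y})

# Sanity check: reload one file and confirm the variable names and shapes.
train = loadmat(os.path.join(data_dir, "train_data.mat"))
print(train["X"].shape, train["Y"].shape)
```

savemat also accepts SciPy sparse matrices, which is usually the more practical choice for large datasets where X and Y are mostly zeros.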