TensorFlow Ranking
TensorFlow Ranking is a library for Learning-to-Rank (LTR) techniques on the TensorFlow platform. It contains the following components:
- Commonly used loss functions including pointwise, pairwise, and listwise losses.
- Commonly used ranking metrics like Mean Reciprocal Rank (MRR) and Normalized Discounted Cumulative Gain (NDCG).
- Multi-item (also known as groupwise) scoring functions.
- LambdaLoss implementation for direct ranking metric optimization.
- Unbiased Learning-to-Rank from biased feedback data.
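To make the three loss families named above concrete, here is a minimal pure-Python sketch for the items of a single query. It is not the library's implementation: the function names, the specific formulations (sigmoid cross-entropy, pairwise logistic, softmax cross-entropy), and the averaging choices are assumptions made for illustration only.

```python
import math


def pointwise_sigmoid_cross_entropy(scores, labels):
    """Pointwise: each item is compared only against its own binary label."""
    # log(1 + exp(s)) - y * s is the sigmoid cross-entropy for a logit s.
    per_item = [math.log1p(math.exp(s)) - y * s for s, y in zip(scores, labels)]
    return sum(per_item) / len(per_item)


def pairwise_logistic(scores, labels):
    """Pairwise: penalize pairs where the more relevant item is not scored higher."""
    per_pair = [
        math.log1p(math.exp(-(s_i - s_j)))
        for s_i, y_i in zip(scores, labels)
        for s_j, y_j in zip(scores, labels)
        if y_i > y_j  # only pairs with a clear preference contribute
    ]
    return sum(per_pair) / len(per_pair) if per_pair else 0.0


def listwise_softmax(scores, labels):
    """Listwise: cross-entropy between softmax(scores) and normalized labels."""
    label_sum = sum(labels)
    if label_sum == 0:
        return 0.0  # no relevant item in the list, nothing to learn from
    max_s = max(scores)  # subtract the max for numerical stability
    log_z = max_s + math.log(sum(math.exp(s - max_s) for s in scores))
    return -sum(y / label_sum * (s - log_z) for s, y in zip(scores, labels))


if __name__ == "__main__":
    labels = [1.0, 0.0, 0.0]
    good, bad = [2.0, 1.0, 0.0], [0.0, 1.0, 2.0]
    for loss in (pointwise_sigmoid_cross_entropy, pairwise_logistic, listwise_softmax):
        print(loss.__name__, loss(good, labels), loss(bad, labels))
```

In each case the list that places the relevant item first receives the lower loss; the families differ in whether they look at items one at a time, in pairs, or as a whole list.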
We envision that this library will provide a convenient open platform for hosting and advancing state-of-the-art ranking models based on deep learning techniques, and thus facilitate both academic research and industrial applications.
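For readers new to the ranking metrics mentioned above, the following plain-Python sketch computes MRR and NDCG from model scores and relevance labels. The gain of 2^label - 1 and the log2(rank + 1) discount are a common convention assumed here, and the function names are illustrative rather than the library's API.

```python
import math


def _labels_by_score(scores, labels):
    """Return the labels reordered by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return [label for _, label in ranked]


def reciprocal_rank(scores, labels):
    """1 / rank of the first relevant item (label > 0), or 0.0 if there is none."""
    for rank, label in enumerate(_labels_by_score(scores, labels), start=1):
        if label > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average reciprocal rank over an iterable of (scores, labels) pairs."""
    values = [reciprocal_rank(scores, labels) for scores, labels in queries]
    return sum(values) / len(values) if values else 0.0


def dcg(labels_in_rank_order, k=None):
    """Discounted cumulative gain over the top k positions (all if k is None)."""
    top = labels_in_rank_order[:k]
    return sum((2 ** y - 1) / math.log2(rank + 1) for rank, y in enumerate(top, start=1))


def ndcg(scores, labels, k=None):
    """DCG of the predicted order divided by the DCG of the ideal order."""
    ideal_dcg = dcg(sorted(labels, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant items for this query
    return dcg(_labels_by_score(scores, labels), k) / ideal_dcg


if __name__ == "__main__":
    scores, labels = [0.1, 0.9, 0.5], [1, 0, 0]
    print(reciprocal_rank(scores, labels), ndcg(scores, labels))
```

Both metrics depend only on the order induced by the scores, not on the score values themselves.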
A quick demo of a ranker on a dummy dataset (no setup required):
For more details on this code and data, look at the section on Example Code.
Linux Installation
Stable Builds
To install the latest version from PyPI, run the following:
# Installing with the `--upgrade` flag ensures you'll get the latest version.
pip install --user --upgrade tensorflow_ranking
To force a Python 3-specific install, replace pip with pip3 in the above commands. For additional installation help, guidance installing prerequisites, and (optionally) setting up virtual environments, see the TensorFlow installation guide.
Note: Since TensorFlow is not included as a dependency of the TensorFlow Ranking package (in setup.py), you must explicitly install the TensorFlow package (tensorflow or tensorflow-gpu). This allows us to maintain one package instead of separate packages for CPU and GPU-enabled TensorFlow.
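Because TensorFlow has to be installed separately, it can help to confirm that both packages are visible to the interpreter before running anything else. The following is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of either package.

```python
import importlib.util


def missing_packages(names=("tensorflow", "tensorflow_ranking")):
    """Return the top-level module names that the current interpreter cannot find."""
    # Both the tensorflow and tensorflow-gpu pip packages are imported under
    # the module name "tensorflow", so one check covers either choice.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("tensorflow and tensorflow_ranking are both importable.")
```

The check only looks up the modules without importing them, so it stays fast even when TensorFlow is present.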
Installing from Source
1. To build TensorFlow Ranking locally, you will need to install:
   - Bazel, an open source build tool.
         $ sudo apt-get update && sudo apt-get install bazel
   - Pip, a Python package manager.
         $ sudo apt-get install python-pip
   - VirtualEnv, a tool to create isolated Python environments.
         $ pip install --user virtualenv
2. Clone the TensorFlow Ranking repository.
       $ git clone https://github.com/tensorflow/ranking.git
3. Build the TensorFlow Ranking wheel file and store it in the /tmp/ranking_pip folder.
       $ cd ranking  # The folder which was cloned in Step 2.
       $ bazel build //tensorflow_ranking/tools/pip_package:build_pip_package
       $ bazel-bin/tensorflow_ranking/tools/pip_package/build_pip_package /tmp/ranking_pip
4. Install the wheel package using pip. Test in a virtualenv to avoid clashes with system dependencies.
       $ ~/.local/bin/virtualenv -p python3 /tmp/tfr
       $ source /tmp/tfr/bin/activate
       (tfr) $ pip install tensorflow  # or tensorflow-gpu, if GPU support is needed.
       (tfr) $ pip install /tmp/ranking_pip/tensorflow_ranking*.whl
5. Run all TensorFlow Ranking tests.
       (tfr) $ bazel test //tensorflow_ranking/...
6. Import the TensorFlow Ranking package in Python (within the virtualenv).
       (tfr) $ python -c "import tensorflow_ranking"
Example Code
The repository includes a runnable example script that trains on a dummy dataset in the LIBSVM format.
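As a rough illustration of what LIBSVM-formatted ranking data looks like, the sketch below parses lines and groups them by query. It assumes the common layout `<relevance> qid:<query_id> <feature_id>:<value> ...` with 1-based feature ids and optional trailing `#` comments; the function names are hypothetical and this is not the parser used by the example script.

```python
from collections import defaultdict


def parse_libsvm_line(line, num_features):
    """Parse one line into (query_id, relevance, dense_feature_list)."""
    tokens = line.split("#", 1)[0].split()  # drop a trailing comment, if any
    relevance = float(tokens[0])
    key, query_id = tokens[1].split(":", 1)
    if key != "qid":
        raise ValueError("expected 'qid:<id>' as the second token, got %r" % tokens[1])
    features = [0.0] * num_features  # features absent from the line stay at 0.0
    for token in tokens[2:]:
        feature_id, value = token.split(":", 1)
        index = int(feature_id) - 1  # assumed 1-based ids
        if not 0 <= index < num_features:
            raise ValueError("feature id %s is outside 1..%d" % (feature_id, num_features))
        features[index] = float(value)
    return query_id, relevance, features


def group_by_query(lines, num_features):
    """Group parsed (relevance, features) pairs by query id."""
    groups = defaultdict(list)
    for line in lines:
        if not line.split("#", 1)[0].strip():
            continue  # skip blank and comment-only lines
        query_id, relevance, features = parse_libsvm_line(line, num_features)
        groups[query_id].append((relevance, features))
    return dict(groups)


if __name__ == "__main__":
    sample = ["2 qid:1 1:0.5 3:1.0", "0 qid:1 2:0.25", "1 qid:2 1:1.0 # note"]
    for query_id, items in group_by_query(sample, num_features=3).items():
        print(query_id, items)
```

Grouping by query id matters because ranking losses and metrics are computed over the list of items that belong to the same query.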
Running Script
1. Set up the data and directory.
       OUTPUT_DIR=/tmp/output && \
       TRAIN=tensorflow_ranking/examples/data/train.txt && \
       VALI=tensorflow_ranking/examples/data/vali.txt && \
       TEST=tensorflow_ranking/examples/data/test.txt
2. Build and run.
       rm -rf $OUTPUT_DIR && \
       bazel build -c opt \
       tensorflow_ranking/examples/tf_ranking_libsvm_py_binary && \
       ./bazel-bin/tensorflow_ranking/examples/tf_ranking_libsvm_py_binary \
       --train_path=$TRAIN \
       --vali_path=$VALI \
       --test_path=$TEST \
       --output_dir=$OUTPUT_DIR \
       --num_features=136 \
       --num_train_steps=100
TensorBoard
The training results, such as loss and metrics, can be visualized using TensorBoard.
1. (Optional) If you are working on a remote server, set up port forwarding with this command.
       $ ssh <remote-server> -L 8888:127.0.0.1:8888
2. Install TensorBoard and invoke it with the following commands. TensorBoard listens on port 6006 by default, so --port 8888 is passed here to match the forwarded port.
       (tfr) $ pip install tensorboard
       (tfr) $ tensorboard --logdir $OUTPUT_DIR --port 8888
Jupyter Notebook
An example Jupyter notebook using the LIBSVM format is available in tensorflow_ranking/examples/tf_ranking_libsvm.ipynb.
1. To run this notebook, first follow the installation steps to set up a virtualenv environment with the tensorflow_ranking package installed.
2. Install Jupyter within the virtualenv.
       (tfr) $ pip install jupyter
3. Start a Jupyter notebook instance (on the remote server, if you use one).
       (tfr) $ jupyter notebook tensorflow_ranking/examples/tf_ranking_libsvm.ipynb \
           --NotebookApp.allow_origin='https://colab.research.google.com' \
           --port=8888
4. (Optional) If you are working on a remote server, set up port forwarding with this command.
       $ ssh <remote-server> -L 8888:127.0.0.1:8888
5. Run the notebook.
   - Open http://localhost:8888/ in a browser on your local machine and browse to the notebook.
   - Alternatively, open the notebook in Colaboratory via colab.research.google.com. Choose the local runtime and connect to port 8888.
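If the browser cannot reach the notebook, a quick way to tell whether the forwarded port is actually accepting connections is a small TCP probe. This is a sketch using only the standard library; the helper name `port_is_open` is hypothetical, and port 8888 is simply the value used in the commands above.

```python
import socket


def port_is_open(host="127.0.0.1", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing is listening at the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on localhost:8888.")
    else:
        print("Nothing reachable on localhost:8888; check the server and the SSH tunnel.")
```

A successful probe only shows that some process accepts connections on that port; it does not confirm that the process is the notebook server.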
References
- Rama Kumar Pasumarthi, Xuanhui Wang, Cheng Li, Sebastian Bruch, Michael Bendersky, Marc Najork, Jan Pfeifer, Nadav Golbandi, Rohan Anil, Stephan Wolf. TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank. CoRR abs/1812.00073 (2018).
- Qingyao Ai, Xuanhui Wang, Nadav Golbandi, Michael Bendersky, Marc Najork. Learning Groupwise Scoring Functions Using Deep Neural Networks. CoRR abs/1811.04415 (2018).
- Xuanhui Wang, Michael Bendersky, Donald Metzler, Marc Najork. Learning to Rank with Selection Bias in Personal Search. SIGIR 2016.
- Xuanhui Wang, Cheng Li, Nadav Golbandi, Michael Bendersky, Marc Najork. The LambdaLoss Framework for Ranking Metric Optimization. CIKM 2018.
Citation
If you use TensorFlow Ranking in your research and would like to cite it, we suggest you use the following citation:
@misc{TensorflowRanking2018,
  author = {Rama Kumar Pasumarthi and Xuanhui Wang and Cheng Li and Sebastian Bruch and Michael Bendersky and Marc Najork and Jan Pfeifer and Nadav Golbandi and Rohan Anil and Stephan Wolf},
  title = {TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank},
  year = {2018},
  eprint = {arXiv:1812.00073},
}