ranking

Learning to Rank in TensorFlow

TensorFlow Ranking

TensorFlow Ranking is a library for Learning-to-Rank (LTR) techniques on the
TensorFlow platform.

We envision that this library will provide a convenient open platform for
hosting and advancing state-of-the-art ranking models based on deep learning
techniques, and thus facilitate both academic research and industrial
applications.

Tutorial Slides

TF-Ranking was presented at two premier Information Retrieval conferences,
SIGIR 2019 and ICTIR 2019. The tutorial slides are available online.

Demos

We provide a demo, with no installation required, to help you get started with
TF-Ranking. The demo, "Using sparse features and embeddings in TF-Ranking",
runs in Google Colab, an interactive Python notebook environment. It
demonstrates how to:

*   Use sparse/embedding features
*   Process data in TFRecord format
*   Integrate TensorBoard in a Colab notebook, for the Estimator API

Also see Running Scripts for executable scripts.

Linux Installation

Stable Builds

To install the latest version from
PyPI, run the following:

# Installing with the `--upgrade` flag ensures you'll get the latest version.
pip install --user --upgrade tensorflow_ranking

To force a Python 3-specific install, replace pip with pip3 in the above
command. For additional installation help, guidance on installing
prerequisites, and (optionally) setting up virtual environments, see the
TensorFlow installation guide.

Note: TensorFlow is now included as a dependency of the TensorFlow Ranking
package (in setup.py). If you wish to use a different version of TensorFlow
(e.g., tensorflow-gpu), you may need to uninstall the existing version and
then install your desired version:

$ pip uninstall tensorflow
$ pip install tensorflow-gpu
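
After swapping TensorFlow versions, it can help to confirm which distributions pip actually left in the environment. The following is a minimal sketch using only the Python standard library (Python 3.8+). The helper name `installed_versions` is hypothetical and not part of TensorFlow Ranking.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration; it only reads pip metadata and
    does not import the packages themselves.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    names = ["tensorflow", "tensorflow-gpu", "tensorflow_ranking"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the same interpreter that pip installed into, so the report reflects the environment you will actually use.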

Installing from Source

  1. To build TensorFlow Ranking locally, you will need to install:

    • Bazel, an open
      source build tool.

      $ sudo apt-get update && sudo apt-get install bazel
      
    • Pip, a Python package manager.

      $ sudo apt-get install python-pip
      
    • VirtualEnv, a tool
      to create isolated Python environments.

      $ pip install --user virtualenv
      
  2. Clone the TensorFlow Ranking repository.

    $ git clone https://github.com/tensorflow/ranking.git
    
  3. Build the TensorFlow Ranking wheel file and store it in the
    /tmp/ranking_pip folder.

    $ cd ranking  # The folder which was cloned in Step 2.
    $ bazel build //tensorflow_ranking/tools/pip_package:build_pip_package
    $ bazel-bin/tensorflow_ranking/tools/pip_package/build_pip_package /tmp/ranking_pip
    
  4. Install the wheel package using pip. Test it in a virtualenv to avoid
    clashes with any system dependencies.

    $ ~/.local/bin/virtualenv -p python3 /tmp/tfr
    $ source /tmp/tfr/bin/activate
    (tfr) $ pip install /tmp/ranking_pip/tensorflow_ranking*.whl
    

    In some cases, you may want to install a specific version of TensorFlow,
    e.g., tensorflow-gpu or tensorflow==2.0.0. To do so, run either

    (tfr) $ pip uninstall tensorflow
    (tfr) $ pip install tensorflow==2.0.0
    

    or

    (tfr) $ pip uninstall tensorflow
    (tfr) $ pip install tensorflow-gpu
    
  5. Run all TensorFlow Ranking tests.

    (tfr) $ bazel test //tensorflow_ranking/...
    
  6. Import the TensorFlow Ranking package in Python (within the virtualenv).

    (tfr) $ python -c "import tensorflow_ranking"
    

Running Scripts

For ease of experimentation, we also provide a TFRecord example and a LIBSVM
example in the form of executable scripts. These are particularly useful for
hyperparameter tuning, where the hyperparameters are supplied as flags to the
script.
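
Because hyperparameters are passed as flags, a sweep amounts to generating one command line per setting. The sketch below builds, but does not execute, such command lines for the LIBSVM script. The binary path, flag names, and data paths are the ones used in the LIBSVM example in this README. The helper names `build_command` and `sweep` and the per-run output directory layout are assumptions made for illustration.

```python
import shlex

# Path of the built binary, as used in the LIBSVM example in this README.
BINARY = "./bazel-bin/tensorflow_ranking/examples/tf_ranking_libsvm_py_binary"


def build_command(output_dir, num_train_steps, num_features=136):
    """Return the argv list for one training run (hypothetical helper)."""
    flags = {
        "train_path": "tensorflow_ranking/examples/data/train.txt",
        "vali_path": "tensorflow_ranking/examples/data/vali.txt",
        "test_path": "tensorflow_ranking/examples/data/test.txt",
        "output_dir": output_dir,
        "num_features": num_features,
        "num_train_steps": num_train_steps,
    }
    return [BINARY] + [f"--{name}={value}" for name, value in flags.items()]


def sweep(step_values, base_dir="/tmp/libsvm"):
    """Build one command per value, each writing to its own output directory."""
    return [
        build_command(f"{base_dir}/steps_{steps}", steps)
        for steps in step_values
    ]


if __name__ == "__main__":
    for command in sweep([100, 500, 1000]):
        print(shlex.join(command))
```

Each printed line can be pasted into a shell, or the argv lists can be passed to `subprocess.run(command, check=True)` once the binary has been built. Keeping a separate output directory per run keeps the runs' results apart.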

TFRecord Example

  1. Set up the data and directory.

    MODEL_DIR=/tmp/tf_record_model && \
    TRAIN=tensorflow_ranking/examples/data/train_elwc.tfrecord && \
    EVAL=tensorflow_ranking/examples/data/eval_elwc.tfrecord && \
    VOCAB=tensorflow_ranking/examples/data/vocab.txt
    
  2. Build and run.

    rm -rf $MODEL_DIR && \
    bazel build -c opt \
    tensorflow_ranking/examples/tf_ranking_tfrecord_py_binary && \
    ./bazel-bin/tensorflow_ranking/examples/tf_ranking_tfrecord_py_binary \
    --train_path=$TRAIN \
    --eval_path=$EVAL \
    --vocab_path=$VOCAB \
    --model_dir=$MODEL_DIR \
    --data_format=example_list_with_context
    

LIBSVM Example

  1. Set up the data and directory.

    OUTPUT_DIR=/tmp/libsvm && \
    TRAIN=tensorflow_ranking/examples/data/train.txt && \
    VALI=tensorflow_ranking/examples/data/vali.txt && \
    TEST=tensorflow_ranking/examples/data/test.txt
    
  2. Build and run.

    rm -rf $OUTPUT_DIR && \
    bazel build -c opt \
    tensorflow_ranking/examples/tf_ranking_libsvm_py_binary && \
    ./bazel-bin/tensorflow_ranking/examples/tf_ranking_libsvm_py_binary \
    --train_path=$TRAIN \
    --vali_path=$VALI \
    --test_path=$TEST \
    --output_dir=$OUTPUT_DIR \
    --num_features=136 \
    --num_train_steps=100
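
For readers unfamiliar with the input files, here is a rough sketch of how LIBSVM-style ranking data can be read. It assumes the common LETOR layout, `<label> qid:<query id> <feature index>:<value> ...`, with an optional trailing `#` comment. The functions `parse_libsvm_line` and `group_by_query` are hypothetical and are not the parser used by the example script.

```python
from collections import defaultdict


def parse_libsvm_line(line):
    """Parse '<label> qid:<id> <idx>:<val> ...' into (label, qid, features)."""
    content = line.split("#", 1)[0].strip()  # drop an optional trailing comment
    tokens = content.split()
    if len(tokens) < 2 or not tokens[1].startswith("qid:"):
        raise ValueError(f"not a LIBSVM ranking line: {line!r}")
    label = float(tokens[0])
    qid = tokens[1].split(":", 1)[1]
    features = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, qid, features


def group_by_query(lines):
    """Group parsed documents by query id, preserving input order."""
    groups = defaultdict(list)
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank and comment-only lines
        label, qid, features = parse_libsvm_line(stripped)
        groups[qid].append((label, features))
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "2 qid:1 1:0.5 3:1.0 # doc A",
        "0 qid:1 2:0.25",
        "1 qid:2 1:0.1",
    ]
    for qid, docs in group_by_query(sample).items():
        print(qid, docs)
```

Grouping by query id matters because ranking losses and metrics are computed over the list of documents that belong to the same query, not over individual rows.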
    

TensorBoard

Training results such as loss and metrics can be visualized using
TensorBoard.

  1. (Optional) If you are working on a remote server, set up port forwarding
    with this command.

    $ ssh <remote-server> -L 8888:127.0.0.1:8888
    
  2. Install TensorBoard and start it on the forwarded port with the following
    commands.

    (tfr) $ pip install tensorboard
    (tfr) $ tensorboard --logdir $OUTPUT_DIR --port 8888
    

Jupyter Notebook

An example Jupyter notebook is available in
third_party/tensorflow_ranking/examples/handling_sparse_features.ipynb.

  1. To run this notebook, first follow the installation steps above to set up
    a virtualenv environment with the tensorflow_ranking package installed.

  2. Install Jupyter within the virtualenv.

    (tfr) $ pip install jupyter
    
  3. Start a Jupyter notebook instance on the remote server.

    (tfr) $ jupyter notebook third_party/tensorflow_ranking/examples/handling_sparse_features.ipynb \
            --NotebookApp.allow_origin='https://colab.research.google.com' \
            --port=8888
    
  4. (Optional) If you are working on a remote server, set up port forwarding
    with this command.

    $ ssh <remote-server> -L 8888:127.0.0.1:8888
    
  5. Run the notebook.

    • Open the Jupyter notebook server on your local machine at
      http://localhost:8888/ and browse to the notebook.

    • Alternatively, open the notebook in the browser via Colaboratory
      (colab.research.google.com), choose a local runtime, and link it to
      port 8888.
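
If the browser cannot reach the notebook, a quick first check is whether anything is listening on the forwarded local port. This is a minimal standard-library sketch. The function name `port_is_open` is hypothetical, and the default port 8888 simply mirrors the port used in the steps above.

```python
import socket


def port_is_open(host="127.0.0.1", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if port_is_open() else "not reachable"
    print(f"localhost:8888 is {state}")
```

A "not reachable" result usually means either the SSH tunnel is not running or the Jupyter server was started on a different port.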

References

  • Rama Kumar Pasumarthi, Sebastian Bruch, Xuanhui Wang, Cheng Li, Michael
    Bendersky, Marc Najork, Jan Pfeifer, Nadav Golbandi, Rohan Anil, Stephan
    Wolf. TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank.
    KDD 2019.

  • Qingyao Ai, Xuanhui Wang, Sebastian Bruch, Nadav Golbandi, Michael
    Bendersky, Marc Najork. Learning Groupwise Scoring Functions Using Deep
    Neural Networks.
    ICTIR 2019.

  • Xuanhui Wang, Michael Bendersky, Donald Metzler, Marc Najork. Learning
    to Rank with Selection Bias in Personal Search.
    SIGIR 2016.

  • Xuanhui Wang, Cheng Li, Nadav Golbandi, Mike Bendersky, Marc Najork. The
    LambdaLoss Framework for Ranking Metric Optimization.
    CIKM 2018.

Citation

If you use TensorFlow Ranking in your research and would like to cite it, we
suggest you use the following citation:

@inproceedings{TensorflowRankingKDD2019,
   author = {Rama Kumar Pasumarthi and Sebastian Bruch and Xuanhui Wang and Cheng Li and Michael Bendersky and Marc Najork and Jan Pfeifer and Nadav Golbandi and Rohan Anil and Stephan Wolf},
   title = {TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank},
   booktitle = {Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
   year = {2019},
   pages = {2970--2978},
   location = {Anchorage, AK}
}

Key Metrics

Overview
Name and owner: tensorflow/ranking
Primary language: Python
Languages: Python (4 languages in total)
License: Apache License 2.0

Owner Activity
Created: 2018-12-03 20:48:57
Last push: 2024-03-18 20:31:57
Last commit: 2024-03-18 13:12:37
Releases: 18
Latest release: v0.5.3
First release: v0.1.3

User Engagement
Stars: 2.8k
Watchers: 94
Forks: 480
Commits: 556
Issues: 319
Open issues: 79
Pull requests: 13
Open pull requests: 12
Closed pull requests: 19