TensorFlow Serving

A flexible, high-performance serving system for machine learning models.

TensorFlow Serving is a flexible, high-performance serving system for
machine learning models, designed for production environments. It deals with
the inference aspect of machine learning, taking models after training and
managing their lifetimes, providing clients with versioned access via
a high-performance, reference-counted lookup table.
TensorFlow Serving provides out-of-the-box integration with TensorFlow models,
but can be easily extended to serve other types of models and data.

Some notable features:

  • Can serve multiple models, or multiple versions of the same model,
    simultaneously (see the model config sketch after this list)
  • Exposes both gRPC and HTTP inference endpoints
  • Allows deployment of new model versions without changing any client code
  • Supports canarying new versions and A/B testing experimental models
  • Adds minimal latency to inference time due to an efficient, low-overhead
    implementation
  • Features a scheduler that groups individual inference requests into batches
    for joint execution on GPU, with configurable latency controls
  • Supports many servables: TensorFlow models, embeddings, vocabularies,
    feature transformations, and even non-TensorFlow-based machine learning
    models
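
As an illustration of the first point, multiple models (or pinned versions of
one model) can be served from a single text-format model config file read at
startup. The model names and base paths below are hypothetical; the file
format is TensorFlow Serving's ModelServerConfig:

# models.config: text-format ModelServerConfig (names and paths are hypothetical)
model_config_list {
  config {
    name: "model_a"
    base_path: "/models/model_a"
    model_platform: "tensorflow"
  }
  config {
    name: "model_b"
    base_path: "/models/model_b"
    model_platform: "tensorflow"
    # Pin two versions side by side, e.g. to canary version 2.
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
  }
}

The file is passed to the server at startup with
--model_config_file=/path/to/models.config.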

Serve a TensorFlow model in 60 seconds

# Download the TensorFlow Serving Docker image and repo
docker pull tensorflow/serving

git clone https://github.com/tensorflow/serving
# Location of demo models
TESTDATA="$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata"

# Start TensorFlow Serving container and open the REST API port
docker run -t --rm -p 8501:8501 \
    -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
    -e MODEL_NAME=half_plus_two \
    tensorflow/serving &

# Query the model using the predict API
curl -d '{"instances": [1.0, 2.0, 5.0]}' \
    -X POST http://localhost:8501/v1/models/half_plus_two:predict

# Returns => { "predictions": [2.5, 3.0, 4.5] }
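
The same container also exposes a gRPC endpoint on port 8500. Below is a
minimal Python client sketch, assuming the container above is started with an
extra -p 8500:8500 mapping and that the grpcio, tensorflow, and
tensorflow-serving-api packages are installed; the demo model's
serving_default signature takes input x and returns output y:

# grpc_client.py: a minimal sketch of the gRPC Predict call
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "half_plus_two"
request.model_spec.signature_name = "serving_default"
request.inputs["x"].CopyFrom(
    tf.make_tensor_proto([1.0, 2.0, 5.0], dtype=tf.float32))

response = stub.Predict(request, 10.0)  # 10-second deadline
print(response.outputs["y"].float_val)  # => [2.5, 3.0, 4.5]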

End-to-End Training & Serving Tutorial

Refer to the official TensorFlow documentation site for a complete tutorial on how to train and serve a TensorFlow model.

Documentation

Set up

The easiest and most straightforward way of using TensorFlow Serving is with
Docker images. We highly recommend this route unless you have specific needs
that are not addressed by running in a container.

Use

Export your TensorFlow model

To serve a TensorFlow model, simply export a SavedModel from your
TensorFlow program. SavedModel is a language-neutral, recoverable, hermetic
serialization format that enables higher-level systems and tools to produce,
consume, and transform TensorFlow models.

Please refer to the TensorFlow documentation
for detailed instructions on how to export SavedModels.
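
As a minimal sketch of the export step, assuming a trained tf.keras model
(the trivial model, the name my_model, and the paths below are placeholders):
TensorFlow Serving discovers versions as numeric subdirectories under a
model's base path.

import tensorflow as tf

# A trivial stand-in for a trained model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(3,)),
    tf.keras.layers.Dense(1),
])

# Write version 1 under the model's base path; by default the server
# loads the highest numeric version directory it finds.
tf.saved_model.save(model, "/tmp/my_model/1")

The base path can then be mounted into the serving container, e.g.
-v /tmp/my_model:/models/my_model -e MODEL_NAME=my_model.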

Configure and Use TensorFlow Serving
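
As one small example of runtime configuration, request batching can be
enabled and tuned at server startup. The flags --enable_batching and
--batching_parameters_file are standard TensorFlow Serving flags; the
parameter values and file paths below are illustrative only.

# batching_parameters.txt: tuning knobs for the request batcher (example values)
max_batch_size { value: 32 }
batch_timeout_micros { value: 1000 }
num_batch_threads { value: 4 }

# Start the server with batching enabled (paths are hypothetical)
docker run -t --rm -p 8501:8501 \
    -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
    -v "$(pwd)/batching_parameters.txt:/config/batching_parameters.txt" \
    -e MODEL_NAME=half_plus_two \
    tensorflow/serving \
    --enable_batching=true \
    --batching_parameters_file=/config/batching_parameters.txt &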

Extend

TensorFlow Serving's architecture is highly modular. You can use some parts
individually (e.g., batch scheduling) and/or extend it to serve new use cases.

Contribute

If you'd like to contribute to TensorFlow Serving, be sure to review the
contribution guidelines.

For more information

Please refer to the official TensorFlow website for
more information.

Key Metrics

Overview

  • Name and owner: tensorflow/serving
  • Main programming language: C++
  • Programming languages: Python (6 languages)
  • Platform:
  • License: Apache License 2.0

Owner activity

  • Created: 2016-01-26 21:48:20
  • Last push: 2025-05-31 06:35:56
  • Last commit: 2025-05-30 23:34:56
  • Releases: 126
  • Latest release: 2.19.0
  • First release: 0.4.0

User engagement

  • Stars: 6.3k
  • Watchers: 229
  • Forks: 2.2k
  • Commits: 9.1k
  • Issues enabled?
  • Issues: 1478
  • Open issues: 57
  • Pull requests: 491
  • Open pull requests: 60
  • Closed pull requests: 238

Project settings

  • Wiki enabled?
  • Archived?
  • Is a fork?
  • Locked?
  • Is a mirror?
  • Is private?