DeepRec Serving

DeepRec Serving is a high-performance serving system for DeepRec, built on TensorFlow Serving. It significantly improves inference performance and CPU/GPU utilization through features such as SessionGroup and CUDA multi-stream.

Key features of DeepRec Serving:

  • SessionGroup: a shared-variable architecture (only variables are shared) that runs multiple sessions within a single serving process (see the launch sketch after this list).
  • CUDA multi-stream: significantly improves QPS and GPU utilization for GPU inference.
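The sketch below illustrates roughly how these features might be enabled when launching the model server built in the Installation section. The SessionGroup and multi-stream flag names are hypothetical placeholders, not confirmed options; consult the DeepRec Serving documentation for the flags your build actually supports.

# Hypothetical launch sketch: --session_num_per_group and --use_multi_stream are
# placeholder flag names (assumptions), not confirmed tensorflow_model_server options.
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
  --model_name=my_model \
  --model_base_path=/models/my_model \
  --session_num_per_group=4 \
  --use_multi_stream=true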

Installation

Prepare for build

CPU Dev Docker

| GCC Version | Python Version | Image                                                     |
| ----------- | -------------- | --------------------------------------------------------- |
| 9.4.0       | 3.8.10         | alideeprec/deeprec-build:deeprec-dev-cpu-py38-ubuntu20.04 |
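To build inside the CPU dev image, start a container from it, for example as below; the container name and the mounted source path are placeholders.

# Start the CPU dev container; /path/to/serving is a placeholder for your local checkout.
docker run -it --name deeprec-serving-dev \
  -v /path/to/serving:/serving \
  alideeprec/deeprec-build:deeprec-dev-cpu-py38-ubuntu20.04 /bin/bash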

GPU (CUDA 11.6) Dev Docker

| GCC Version | Python Version | CUDA Version | Image                                                           |
| ----------- | -------------- | ------------ | --------------------------------------------------------------- |
| 9.4.0       | 3.8.10         | CUDA 11.6.2  | alideeprec/deeprec-build:deeprec-dev-gpu-py38-cu116-ubuntu20.04 |
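For GPU builds, the container additionally needs access to the host GPUs; the command below assumes the NVIDIA Container Toolkit is installed, and the container name and mount path are placeholders.

# Start the GPU dev container with all host GPUs visible.
docker run -it --gpus all --name deeprec-serving-dev-gpu \
  -v /path/to/serving:/serving \
  alideeprec/deeprec-build:deeprec-dev-gpu-py38-cu116-ubuntu20.04 /bin/bash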

Build from source

Development branch: master; latest release branch: deeprec2302.

Build Package Builder-CPU

bazel build -c opt tensorflow_serving/...

Build CPU Package Builder with OneDNN + Eigen Threadpool

bazel build -c opt --config=mkl_threadpool --define build_with_mkl_dnn_v1_only=true tensorflow_serving/...

Build Package Builder-GPU

bazel build -c opt --config=cuda tensorflow_serving/...

Build Package

bazel-bin/tensorflow_serving/tools/pip_package/build_pip_package /tmp/tf_serving_client_whl
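The command above writes the serving client wheel into /tmp/tf_serving_client_whl, which can then be installed with pip. A wildcard is used below because the exact wheel filename depends on the version built.

# Install the built client wheel; the exact filename depends on the built version.
pip install /tmp/tf_serving_client_whl/*.whl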

Server Bin

The server binary is generated at the following path:

bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server
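Once built, the server can be pointed at a SavedModel directory. Since DeepRec Serving is based on TensorFlow Serving, the standard flags below should apply; the model name and base path are placeholders.

# Serve a SavedModel over gRPC (8500) and REST (8501); model name and path are placeholders.
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
  --port=8500 --rest_api_port=8501 \
  --model_name=my_model \
  --model_base_path=/models/my_model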

More details