TensorFlow Serving | Advanced Configuration

2020-11-24  Anoyi

Overview

Serving can be configured along several dimensions, each covered in its own section below: model server configuration (which models and versions to serve), monitoring, request batching, and miscellaneous flags.

For the full set of flags, see the official source code; the commonly used ones are listed below:

--port=8500                         int32   Port to listen on for gRPC API
--grpc_socket_path=""               string  If non-empty, listen to a UNIX socket for gRPC API on the given path. Can be either relative or absolute path.
--rest_api_port=0                   int32   Port to listen on for HTTP/REST API. If set to zero HTTP/REST API will not be exported. This port must be different than the one specified in --port.
--rest_api_num_threads=8            int32   Number of threads for HTTP/REST API processing. If not set, will be auto set based on number of CPUs.
--rest_api_timeout_in_ms=30000      int32   Timeout for HTTP/REST API calls.
--enable_batching=false             bool    enable batching
--allow_version_labels_for_unavailable_models=false bool    If true, allows assigning unused version labels to models that are not available yet.
--batching_parameters_file=""       string  If non-empty, read an ascii BatchingParameters protobuf from the supplied file name and use the contained values instead of the defaults.
--model_config_file=""              string  If non-empty, read an ascii ModelServerConfig protobuf from the supplied file name, and serve the models in that file. This config file can be used to specify multiple models to serve and other advanced parameters including non-default version policy. (If used, --model_name, --model_base_path are ignored.)
--model_config_file_poll_wait_seconds=0 int32   Interval in seconds between each poll of the filesystem for model_config_file. If unset or set to zero, poll will be done exactly once and not periodically. Setting this to negative is reserved for testing purposes only.
--model_name="default"              string  name of model (ignored if --model_config_file flag is set)
--model_base_path=""                string  path to export (ignored if --model_config_file flag is set, otherwise required)
--max_num_load_retries=5            int32   maximum number of times it retries loading a model after the first failure, before giving up. If set to 0, a load is attempted only once. Default: 5
--load_retry_interval_micros=60000000   int64   The interval, in microseconds, between each servable load retry. If set negative, it doesn't wait. Default: 1 minute
--file_system_poll_wait_seconds=1   int32   Interval in seconds between each poll of the filesystem for new model version. If set to zero, poll will be done exactly once and not periodically. Setting this to a negative value will disable polling entirely, causing ModelServer to wait indefinitely for a new model at startup. Negative values are reserved for testing purposes only.
--flush_filesystem_caches=true      bool    If true (the default), filesystem caches will be flushed after the initial load of all servables, and after each subsequent individual servable reload (if the number of load threads is 1). This reduces memory consumption of the model server, at the potential cost of cache misses if model files are accessed after servables are loaded.
--tensorflow_session_parallelism=0  int64   Number of threads to use for running a Tensorflow session. Auto-configured by default. Note that this option is ignored if --platform_config_file is non-empty.
--tensorflow_intra_op_parallelism=0 int64   Number of threads to use to parallelize the execution of an individual op. Auto-configured by default. Note that this option is ignored if --platform_config_file is non-empty.
--tensorflow_inter_op_parallelism=0 int64   Controls the number of operators that can be executed simultaneously. Auto-configured by default. Note that this option is ignored if --platform_config_file is non-empty.
--ssl_config_file=""                string  If non-empty, read an ascii SSLConfig protobuf from the supplied file name and set up a secure gRPC channel
--platform_config_file=""           string  If non-empty, read an ascii PlatformConfigMap protobuf from the supplied file name, and use that platform config instead of the Tensorflow platform. (If used, --enable_batching is ignored.)
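
For reference, the same flags apply when running the standalone tensorflow_model_server binary directly. A minimal sketch, with an illustrative model name and path:

tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=my_model \
  --model_base_path=/models/my_model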

When starting Serving with Docker, simply append the desired flags to the startup command. Example:

# $TESTDATA is assumed to point at a directory containing the SavedModel
docker run -d \
  -p 8501:8080 \
  -v "$TESTDATA/saved_model_half_plus_two_cpu:/models/model" \
  tensorflow/serving \
  --rest_api_port=8080
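
Once the container is up, the model's availability can be verified through the REST status endpoint (host port 8501 per the mapping above; the image's default model name is model):

curl http://localhost:8501/v1/models/model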

Model Server Configuration

For a single model, the simplest approach is to set --model_name and --model_base_path. To serve multiple models, or multiple versions of a model, write a model config file and pass its location via --model_config_file; if the config should be re-read periodically, set the interval with --model_config_file_poll_wait_seconds.
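
As a sketch, serving the multi-model layout described below from Docker might look like this (host paths are illustrative):

docker run -d \
  -p 8501:8501 \
  -v "$(pwd)/models:/models" \
  -v "$(pwd)/models.config:/models/models.config" \
  tensorflow/serving \
  --model_config_file=/models/models.config \
  --model_config_file_poll_wait_seconds=60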

Model File Layout

By default, a model lives at /models/<model name>/<version number>, and each version directory contains the file saved_model.pb plus the directories variables and assets (assets may be absent, as in the tree below):

models
├── model1
│   └── 1
│       ├── saved_model.pb
│       └── variables
│           ├── variables.data-00000-of-00001
│           └── variables.index
├── model2
│   └── 1
│       ├── saved_model.pb
│       └── variables
│           ├── variables.data-00000-of-00001
│           └── variables.index
└── model3
    ├── 1
    │   ├── saved_model.pb
    │   └── variables
    │       ├── variables.data-00000-of-00001
    │       └── variables.index
    └── 2
        ├── saved_model.pb
        └── variables
            ├── variables.data-00000-of-00001
            └── variables.index

Model Config File Format

The full schema is defined in ModelConfig. An example multi-model config file, models.config:

model_config_list: {
  config: {
    name: "model1",
    base_path: "/models/model1",
    model_platform: "tensorflow",
    model_version_policy: {
      all: {}
    }
  },
  config: {
    name: "model2",
    base_path: "/models/model2",
    model_platform: "tensorflow",
    model_version_policy: {
      latest: {
        num_versions: 1
      }
    }
  },
  config: {
    name: "model3",
    base_path: "/models/model3",
    model_platform: "tensorflow",
    model_version_policy: {
      specific: {
        versions: 1
        versions: 2
      }
    }
    version_labels {
      key: 'stable'
      value: 1
    }
    version_labels {
      key: 'canary'
      value: 2
    }
  }
}

Field notes:

- model_version_policy controls which versions under base_path are served: all {} serves every version found, latest { num_versions: N } serves only the N most recent versions, and specific { versions: ... } serves exactly the listed version numbers. The default policy is to serve only the latest version.
- version_labels maps human-friendly labels such as stable and canary to concrete version numbers, so clients can follow a label while versions roll forward. By default a label can only be assigned to a version that is already loaded; set --allow_version_labels_for_unavailable_models=true to assign labels to versions that are not available yet.

REST API Usage

Request by version number:

/v1/models/<model name>/versions/<version number>

Request by version label:

/v1/models/<model name>/labels/<version label>
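
Combined with the models.config above, a :predict call against a specific version or label looks like this (the instances payload is illustrative; the actual shape depends on the model's signature):

curl -X POST http://localhost:8501/v1/models/model3/versions/2:predict \
  -d '{"instances": [[1.0, 2.0]]}'

curl -X POST http://localhost:8501/v1/models/model3/labels/stable:predict \
  -d '{"instances": [[1.0, 2.0]]}'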

Monitoring Configuration

To turn on server monitoring, pass a config file via --monitoring_config_file; an example monitoring.config:

prometheus_config {
  enable: true,
  path: "/monitoring/prometheus/metrics"
}
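
The metrics are exposed on the HTTP/REST port, so --rest_api_port must also be set. With the path configured above, Prometheus (or a quick curl) can scrape them at:

curl http://localhost:8501/monitoring/prometheus/metrics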

Batching Configuration

The Model Server can batch requests in various ways to achieve better throughput. The scheduling for this batching is done globally across all models and versions on the server, to ensure the best possible utilization of the underlying resources no matter how many models or versions the server is currently serving.

Set the --enable_batching flag to turn batching on, and use --batching_parameters_file to point at a parameters file; an example batching.config:

max_batch_size { value: 128 }            # maximum size of a batch
batch_timeout_micros { value: 0 }        # how long to wait before executing a partial batch
max_enqueued_batches { value: 1000000 }  # max batches queued before requests are rejected
num_batch_threads { value: 8 }           # degree of batch-processing parallelism
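
As a sketch, enabling batching in the Docker image might look like this (host paths and model name are illustrative; extra arguments are appended to the image's default entrypoint):

docker run -d -p 8500:8500 -p 8501:8501 \
  -v "$(pwd)/batching.config:/etc/batching.config" \
  -v "$(pwd)/models/model1:/models/model1" \
  -e MODEL_NAME=model1 \
  tensorflow/serving \
  --enable_batching=true \
  --batching_parameters_file=/etc/batching.config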

For more detail, see the TensorFlow Serving Batching Guide.

Misc. Flags

Beyond the sections above, a few flags from the table are worth calling out: --port and --grpc_socket_path configure the gRPC endpoint, --rest_api_timeout_in_ms bounds HTTP/REST call time, --file_system_poll_wait_seconds controls how often the filesystem is polled for new model versions, --max_num_load_retries and --load_retry_interval_micros govern load retries, and --tensorflow_intra_op_parallelism / --tensorflow_inter_op_parallelism tune session threading.
