TensorFlow example: mnist_cnn

2019-04-27

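0 Building the CNN MNIST classifier

Convolutional layer 1: 32 5x5 filters, with ReLU activation
Pooling layer 1: max pooling with a 2x2 filter and a stride of 2
Convolutional layer 2: 64 5x5 filters, with ReLU activation
Pooling layer 2: max pooling with a 2x2 filter and a stride of 2
Dense layer 1: 1,024 neurons, with a dropout rate of 0.4
Dense layer 2 (logits layer, followed by softmax): 10 neurons, one for each digit class 0-9
These layers are built with three functions from the tf.layers module: conv2d(), max_pooling2d(), and dense().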

import numpy as np
import tensorflow as tf  # TensorFlow 1.x (tf.layers and tf.estimator APIs)


def cnn_model_fn(features, labels, mode):
  """Model function for CNN."""
  # Input layer: reshape to [batch_size, 28, 28, 1]
  input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])

  # Convolutional layer 1: output shape [batch_size, 28, 28, 32]
  conv1 = tf.layers.conv2d(
      inputs=input_layer,
      filters=32,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)

  # Pooling layer 1: output shape [batch_size, 14, 14, 32]
  pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

  # Convolutional layer 2 and pooling layer 2: output shape [batch_size, 7, 7, 64]
  conv2 = tf.layers.conv2d(
      inputs=pool1,
      filters=64,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)
  pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

  # Dense layer: flatten to [batch_size, 3136], then 1024 units with dropout
  pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
  dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
  dropout = tf.layers.dropout(
      inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

  # Logits layer: output shape [batch_size, 10]
  logits = tf.layers.dense(inputs=dropout, units=10)

  predictions = {
      # Predicted class for each sample (used in PREDICT and EVAL modes)
      "classes": tf.argmax(input=logits, axis=1),
      # Class probabilities; the name "softmax_tensor" is used by the logging hook
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }

  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

  # Calculate the loss (for both TRAIN and EVAL modes)
  loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

  # Configure the training op (for TRAIN mode)
  if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(
        loss=loss,
        global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

  # Add evaluation metrics (for EVAL mode)
  eval_metric_ops = {
      "accuracy": tf.metrics.accuracy(
          labels=labels, predictions=predictions["classes"])}
  return tf.estimator.EstimatorSpec(
      mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

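0.1 Input layer

By default, the tf.layers functions for two-dimensional image data expect input tensors of shape [batch_size, image_height, image_width, channels], where:

batch_size: the size of the subset of samples used for each gradient descent step during training
image_height: the height of the sample images
image_width: the width of the sample images
channels: the number of color channels in the sample images
data_format: a string that specifies the order of the four dimensions above, either channels_last (the default, as listed here) or channels_first

The reshape operation

We convert our input feature map (features) to this shape: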

input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])

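Here features["x"] holds the input image data. MNIST images are 28x28 pixels with a single color channel, hence the shape [batch_size, 28, 28, 1].
We specify -1 for the batch size, which means this dimension is computed dynamically from the number of input values in features["x"], while all other dimensions keep their fixed sizes. This lets us treat batch_size as a tunable hyperparameter.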
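
To make the role of -1 concrete, here is a minimal sketch of how a reshape can infer the missing dimension. It is plain Python rather than TensorFlow, and the helper name infer_shape is hypothetical, introduced only for illustration.

```python
def infer_shape(num_elements, shape):
    """Resolve a single -1 entry in `shape` so the product equals num_elements.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    if shape.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if -1 not in shape:
        if known != num_elements:
            raise ValueError("shape does not match the element count")
        return list(shape)
    if num_elements % known != 0:
        raise ValueError("element count is not divisible by the known dimensions")
    return [num_elements // known if dim == -1 else dim for dim in shape]


# A batch of 100 flattened 28x28 images holds 100 * 784 values.
print(infer_shape(100 * 784, [-1, 28, 28, 1]))  # [100, 28, 28, 1]
# A different batch size changes only the first dimension.
print(infer_shape(5 * 784, [-1, 28, 28, 1]))    # [5, 28, 28, 1]
```

Only the first dimension changes with the amount of input data, which is why the batch size can be tuned without touching the model code.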

0.2 Convolutional layer 1

conv1 = tf.layers.conv2d(
    inputs=input_layer,
    filters=32,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)

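Where:
inputs specifies the input tensor, which must have the shape [batch_size, image_height, image_width, channels]
filters specifies the number of filters to apply (32 in this tutorial)
kernel_size specifies the filter dimensions as [height, width]
padding takes one of two values, valid (the default) or same; same means the output tensor keeps the height and width of the input tensor, which is achieved by zero-padding the border of the input (two pixels on each side for a 5x5 filter with a stride of 1)
activation specifies the activation function applied to the convolution output
The output tensor produced by conv2d has the shape [batch_size, 28, 28, 32]: the same height and width as the input, but now with 32 channels holding the output of each filter.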
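
The effect of the padding setting on the output size can be checked with a little arithmetic. The sketch below uses the usual output-size rules for same and valid padding; the function names are hypothetical and the code is plain Python, not a TensorFlow call.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="same"):
    """Output size along one spatial dimension for a sliding window."""
    if padding == "same":
        return math.ceil(input_size / stride)
    if padding == "valid":
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")


def same_padding_total(input_size, kernel_size, stride=1):
    """Total number of zeros added along one dimension for padding='same'."""
    output_size = math.ceil(input_size / stride)
    return max((output_size - 1) * stride + kernel_size - input_size, 0)


print(conv_output_size(28, 5, padding="same"))   # 28
print(conv_output_size(28, 5, padding="valid"))  # 24
print(same_padding_total(28, 5))                 # 4, i.e. 2 zeros on each side
```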

0.3 Pooling layer 1

pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

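Parameters already covered above, such as inputs, are not repeated here.
pool_size specifies the size of the max pooling filter as [height, width]
strides specifies the size of the stride
Both parameters accept either a single integer, which applies the same value to the height and width dimensions, or two integers such as [3, 5], which set the height and width values separately
The output tensor produced by max_pooling2d() (pool1) has the shape [batch_size, 14, 14, 32]: the 2x2 filter with a stride of 2 reduces the height and width by 50% each.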
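
As an illustration of what max pooling computes, the following sketch applies a 2x2 window with a stride of 2 to a small 4x4 grid. It is a plain-Python stand-in for a single channel of a single image, and the function name max_pool_2d is hypothetical.

```python
def max_pool_2d(image, pool_size=2, stride=2):
    """Max pooling over a 2-D list of numbers, with no padding."""
    height, width = len(image), len(image[0])
    pooled = []
    for top in range(0, height - pool_size + 1, stride):
        row = []
        for left in range(0, width - pool_size + 1, stride):
            window = [image[top + i][left + j]
                      for i in range(pool_size)
                      for j in range(pool_size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


image = [
    [1, 3, 2, 0],
    [4, 6, 5, 1],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
print(max_pool_2d(image))  # [[6, 5], [7, 9]]
```

Each 2x2 block is replaced by its largest value, so the 4x4 grid shrinks to 2x2, just as the 28x28 feature maps shrink to 14x14.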

0.4 Convolutional layer 2 and pooling layer 2

conv2 = tf.layers.conv2d(
    inputs=pool1,
    filters=64,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)

pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

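Convolutional layer 2 takes the output tensor of the first pooling layer (pool1) as input and produces the output tensor conv2. conv2 has the shape [batch_size, 14, 14, 64]: the same height and width as pool1 (because padding="same"), and 64 channels, one for each of the 64 filters applied.

Pooling layer 2 takes conv2 as input and produces pool2 as output. pool2 has the shape [batch_size, 7, 7, 64] (the height and width of conv2 are each reduced by 50%).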
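
The shapes quoted for the convolutional and pooling layers can be traced with a short sketch. It tracks only (height, width, channels) per sample, assumes stride-1 convolutions with padding="same" and unpadded 2x2 pooling as in the code above, and uses hypothetical helper names.

```python
def conv_same(shape, filters):
    """Shape after a stride-1 convolution with padding="same"."""
    height, width, _channels = shape
    return (height, width, filters)


def max_pool(shape, pool_size=2, stride=2):
    """Shape after max pooling without padding."""
    height, width, channels = shape
    return ((height - pool_size) // stride + 1,
            (width - pool_size) // stride + 1,
            channels)


shapes = {"input": (28, 28, 1)}
shapes["conv1"] = conv_same(shapes["input"], filters=32)
shapes["pool1"] = max_pool(shapes["conv1"])
shapes["conv2"] = conv_same(shapes["pool1"], filters=64)
shapes["pool2"] = max_pool(shapes["conv2"])

for name, shape in shapes.items():
    print(name, shape)
```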

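0.5 Dense layer

Next, we add a dense layer (with 1,024 neurons and ReLU activation) to the CNN to perform classification on the features extracted by the convolution and pooling layers. Before connecting this layer, we flatten the feature map (pool2) to the shape [batch_size, features], so that the tensor has only two dimensions: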

pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])

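In the reshape() operation above, -1 means that the batch_size dimension is computed dynamically from the number of samples in the input data. Each sample has 7 (pool2 height) * 7 (pool2 width) * 64 (pool2 channels) features, so the features dimension should have the value 7 * 7 * 64 (3,136 in total). The output tensor pool2_flat has the shape [batch_size, 3136].
Now we can connect the dense layer using the dense() method in layers: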

dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)

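The activation function used here is ReLU. For a comparison of several activation functions (Sigmoid, Tanh, ReLU, Leaky ReLU, ELU), see:
https://www.cnblogs.com/ya-qiang/p/9258714.html
To improve the model's results, we apply dropout regularization to the dense layer: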

dropout = tf.layers.dropout(
    inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

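rate specifies the dropout rate; in this tutorial we use 0.4, which means 40% of the elements are randomly dropped during training
training takes a boolean that indicates whether the model is currently running in training mode; dropout is performed only when training is True. Here, we check whether the mode passed to the model function cnn_model_fn is TRAIN.
The output tensor dropout has the shape [batch_size, 1024].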
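
The behaviour of the rate and training parameters can be sketched in plain Python. This is an illustrative stand-in, not the TensorFlow implementation: it assumes the common "inverted dropout" scheme, in which kept elements are scaled by 1 / (1 - rate) so that the expected sum stays unchanged, and the function name and rng parameter are hypothetical.

```python
import random


def dropout(values, rate, training, rng=random):
    """Inverted dropout on a list of floats (illustrative sketch)."""
    if not training or rate == 0.0:
        # Outside of training the input passes through unchanged.
        return list(values)
    keep_prob = 1.0 - rate
    # Each element is zeroed with probability `rate`; survivors are scaled up.
    return [v / keep_prob if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0] * 10000
dropped = dropout(activations, rate=0.4, training=True, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
print(zero_fraction)  # close to 0.4; the exact value depends on the seed

print(dropout([1.0, 2.0], rate=0.4, training=False))  # [1.0, 2.0]
```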

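0.6 Logits layer

The final layer in our neural network is the logits layer, which returns the raw values for our predictions. We create a dense layer with 10 neurons (one for each target class 0-9) and linear activation (the default):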

logits = tf.layers.dense(inputs=dropout, units=10)

The final output tensor of the CNN, logits, has the shape [batch_size, 10].
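
As a rough cross-check on the layer sizes, the sketch below counts the trainable parameters of the four layers that have weights. It assumes every layer includes a bias term (the default for tf.layers.conv2d and tf.layers.dense); the helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_units, units):
    """Weights plus one bias per output unit."""
    return in_units * units + units


params = {
    "conv1": conv2d_params(5, 5, 1, 32),
    "conv2": conv2d_params(5, 5, 32, 64),
    "dense": dense_params(7 * 7 * 64, 1024),
    "logits": dense_params(1024, 10),
}
for name, count in params.items():
    print(name, count)
print("total", sum(params.values()))  # total 3274634
```

Almost all of the parameters sit in the first dense layer, which connects 3,136 flattened features to 1,024 neurons.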

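0.7 Generating predictions

The logits layer of the model returns predictions as raw values in a tensor of shape [batch_size, 10]. We convert these raw values into two different formats that the model function can return:

The predicted class for each sample: a digit from 0 to 9.
The probability of each possible target class for each sample: the probability that the sample is a 0, is a 1, is a 2, and so on.
For a given sample, the predicted class is the element with the highest raw value in that sample's row of the logits tensor. We can find the index of this element with the tf.argmax function: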

tf.argmax(input=logits, axis=1)

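input specifies the tensor from which to extract the maximum values, here logits.
axis specifies the axis of the input tensor along which to find the greatest value. Here, we want the greatest value along the dimension with index 1, which corresponds to our predictions (recall that the logits tensor has the shape [batch_size, 10]).
We can derive probabilities from the logits layer by applying softmax activation with tf.nn.softmax: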

tf.nn.softmax(logits, name="softmax_tensor")
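
For a single row of logits, the two conversions can be sketched in plain Python. The logits values below are made up for illustration, and the helpers are simplified stand-ins for tf.argmax and tf.nn.softmax applied to one sample.

```python
import math


def softmax(logits):
    """Softmax for one row of logits, shifted by the max for numerical stability."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical raw outputs for one sample, one value per digit class 0-9.
row = [0.1, 2.0, -1.0, 0.3, 0.0, 0.5, 4.2, 1.1, -0.7, 0.9]
probabilities = softmax(row)

print(argmax(row))            # 6
print(argmax(probabilities))  # 6
```

Softmax preserves the ordering of the logits, so the predicted class is the same whether it is read from the raw values or from the probabilities.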

We compile our predictions into a dict and return an EstimatorSpec object:

predictions = {
    "classes": tf.argmax(input=logits, axis=1),
    "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
}
if mode == tf.estimator.ModeKeys.PREDICT:
  return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

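0.8 Calculating the loss

The loss function measures how closely the model's predictions match the target classes.
For multiclass classification problems such as MNIST, cross entropy is typically used as the loss metric.
The following code calculates cross entropy when the model runs in either TRAIN or EVAL mode: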

loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

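The labels tensor contains the list of true class indices for our samples, for example [1, 9, ...].
logits contains the linear outputs of the last layer.
tf.losses.sparse_softmax_cross_entropy computes the softmax cross entropy (also known as categorical cross entropy or negative log-likelihood) from these two inputs in an efficient, numerically stable way.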
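
To show what "numerically stable" can mean here, the sketch below computes the same quantity in plain Python using the log-sum-exp trick. It is an illustration under stated assumptions, not the TensorFlow implementation: it averages the per-sample losses over the batch, uses three classes instead of ten for brevity, and the example values are made up.

```python
import math


def sparse_softmax_cross_entropy(labels, logits):
    """Mean softmax cross entropy for integer labels (illustrative sketch)."""
    losses = []
    for label, row in zip(labels, logits):
        # log(sum(exp(row))) computed after subtracting the max, so exp() cannot overflow.
        shift = max(row)
        log_sum_exp = shift + math.log(sum(math.exp(v - shift) for v in row))
        # -log(softmax(row)[label]) == log_sum_exp - row[label]
        losses.append(log_sum_exp - row[label])
    return sum(losses) / len(losses)


labels = [0, 2]
logits = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
print(sparse_softmax_cross_entropy(labels, logits))

# Very large logits would overflow a naive exp(), but are handled here.
print(sparse_softmax_cross_entropy([0], [[1000.0, 0.0]]))  # 0.0
```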

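0.9 Configuring the training op

We configure the model to optimize this loss value during training, using a learning rate of 0.001 and stochastic gradient descent as the optimization algorithm: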

if mode == tf.estimator.ModeKeys.TRAIN:
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
  train_op = optimizer.minimize(
      loss=loss,
      global_step=tf.train.get_global_step())
  return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
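
The update that a gradient descent optimizer applies at each step can be sketched on a toy problem. The loss function below is a made-up one-parameter example, not the CNN loss, and the function name sgd_step is hypothetical; the global_step counter plays the role of the step count that the training op increments.

```python
def sgd_step(weights, gradients, learning_rate=0.001):
    """One gradient descent update: w <- w - learning_rate * gradient."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]


# Toy loss f(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
weights = [0.0]
global_step = 0
for _ in range(5000):
    gradients = [2.0 * (weights[0] - 3.0)]
    weights = sgd_step(weights, gradients, learning_rate=0.001)
    global_step += 1

print(global_step)  # 5000
print(weights)      # close to [3.0]
```

With a learning rate as small as 0.001, each step moves the weight only slightly, which is one reason the tutorial trains for many thousands of steps.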

0.10 Adding evaluation metrics

To add an accuracy metric to the model, we define the eval_metric_ops dict in EVAL mode:

eval_metric_ops = {
    "accuracy": tf.metrics.accuracy(
        labels=labels, predictions=predictions["classes"])}
return tf.estimator.EstimatorSpec(
    mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
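
An accuracy metric that is accumulated batch by batch during evaluation can be sketched as follows. This is a simplified plain-Python stand-in for the idea of a streaming metric, with a hypothetical class name and made-up labels; it is not the tf.metrics.accuracy implementation.

```python
class StreamingAccuracy:
    """Running accuracy over several batches (illustrative sketch)."""

    def __init__(self):
        self.correct = 0
        self.count = 0

    def update(self, labels, predicted_classes):
        """Add one batch of true labels and predicted classes."""
        self.correct += sum(1 for y, p in zip(labels, predicted_classes) if y == p)
        self.count += len(labels)

    def result(self):
        """Fraction of samples predicted correctly so far."""
        return self.correct / self.count if self.count else 0.0


metric = StreamingAccuracy()
metric.update(labels=[7, 2, 1], predicted_classes=[7, 2, 1])  # 3 of 3 correct
metric.update(labels=[0, 4], predicted_classes=[0, 9])        # 1 of 2 correct
print(metric.result())  # 0.8
```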

1 Training and evaluating the CNN MNIST classifier

1.1 Loading the training and test data

def main(unused_argv):
  mnist = tf.contrib.learn.datasets.load_dataset("mnist")
  train_data = mnist.train.images # Returns np.array
  train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
  eval_data = mnist.test.images # Returns np.array
  eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

1.2 Creating the Estimator

Next, we create an Estimator (a TensorFlow class for performing high-level model training, evaluation, and inference) for our model:

mnist_classifier = tf.estimator.Estimator(
    model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")

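model_fn specifies the model function to use for training, evaluation, and prediction; we pass it the cnn_model_fn created in section 0.
model_dir specifies the directory where model data (checkpoints) will be saved.

1.3 Setting up logging

Because CNNs can take a while to train, we set up some logging to track progress during training. We can use tf.train.LoggingTensorHook, one of TensorFlow's tf.train.SessionRunHook implementations, to log the probability values from the softmax layer of the CNN: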

tensors_to_log = {"probabilities": "softmax_tensor"}
logging_hook = tf.train.LoggingTensorHook(
    tensors=tensors_to_log, every_n_iter=50)

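We store a dict of the tensors we want to log in tensors_to_log.
Each key is a label of our choice that will be printed in the log output, and the corresponding value is the name of a Tensor in the TensorFlow graph.
Here, probabilities can be found in softmax_tensor, the name we gave the softmax operation earlier when generating the probabilities in cnn_model_fn.
Next, we create the LoggingTensorHook, passing tensors_to_log to the tensors argument. We set every_n_iter=50, which specifies that the probabilities should be logged after every 50 training steps.

1.4 Training the model

Now we are ready to train the model, which we do by creating train_input_fn and calling train() on mnist_classifier: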

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": train_data},
    y=train_labels,
    batch_size=100,
    num_epochs=None,
    shuffle=True)
mnist_classifier.train(
    input_fn=train_input_fn,
    steps=20000,
    hooks=[logging_hook])

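In the numpy_input_fn call, we pass the training feature data and labels to x (as a dict) and y, respectively.
We set batch_size to 100, which means the model trains on a minibatch of 100 samples at each step.
num_epochs=None means the model keeps training until the specified number of steps is reached.
We also set shuffle=True to shuffle the training data.
In the train call, we set steps=20000, which means the model trains for 20,000 steps in total.
To trigger logging_hook during training, we pass it to the hooks argument.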
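
To get a feel for what these settings imply, the arithmetic below relates steps, batch size, and passes over the data. The training set size of 55,000 is an assumption about this MNIST loader and is not stated in the text; adjust it if your copy of the data differs.

```python
batch_size = 100
steps = 20000
train_examples = 55000  # assumption: number of images in mnist.train

examples_seen = batch_size * steps
steps_per_epoch = train_examples // batch_size
epochs = examples_seen / train_examples

print(examples_seen)     # 2000000
print(steps_per_epoch)   # 550
print(round(epochs, 1))  # 36.4
```

Under that assumption, 20,000 steps correspond to roughly 36 passes over the training data, which is why num_epochs=None is needed: the input function must keep supplying batches beyond a single epoch.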

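1.5 Evaluating the model

Once training is complete, we evaluate the model to determine its accuracy on the MNIST test set. We call the evaluate method, which evaluates the metrics we specified in the eval_metric_ops argument in model_fn: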

eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": eval_data},
    y=eval_labels,
    num_epochs=1,
    shuffle=False)
eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
print(eval_results)

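To create eval_input_fn, we set num_epochs=1 so that the model evaluates the metrics over one epoch of data and returns the result.
We also set shuffle=False to iterate through the data sequentially.

1.6 Running the model

We have written the CNN model function, the Estimator, and the training/evaluation logic; now let's look at the results. Run cnn_mnist.py.
The complete code for cnn_mnist.py is available at the following link: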
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/layers/cnn_mnist.py

[Figure: output of running cnn_mnist.py; image not preserved]

2 References

This article is based on: https://www.tensorflow.org/tutorials/estimators/cnn#building_the_cnn_mnist_classifier
Advanced convolutional neural networks: https://www.tensorflow.org/tutorials/images/deep_cnn
Introduction to Estimators and detailed usage: https://www.tensorflow.org/guide/custom_estimators

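3 TensorFlow compared with Keras

Keras: a highly encapsulated, high-level API; officially supported by TensorFlow; supports parallel computation on GPUs;
modular; minimalist; easy to extend; implemented in Python and easy to debug.
The core data structure of Keras is the model, a way of organizing network layers. There are two kinds of model: the Sequential model and the Model model (used with the functional API). A Sequential model is a stack of network layers arranged in order, with a single input and a single output, where each layer connects only to its neighbors; it is the simplest kind of model. The Model model is used to build more complex models.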
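
The idea of a sequential model, layers applied one after another with a single input and a single output, can be illustrated with a toy class. This is explicitly not Keras: the Sequential class and the scale and relu "layers" below are hypothetical plain-Python stand-ins that only mimic the stacking behaviour.

```python
class Sequential:
    """Toy stand-in for the idea of a sequential model; this is not Keras."""

    def __init__(self, layers=None):
        self.layers = list(layers or [])

    def add(self, layer):
        """Append a layer to the end of the stack."""
        self.layers.append(layer)

    def __call__(self, x):
        # Each layer receives only the output of the layer before it.
        for layer in self.layers:
            x = layer(x)
        return x


def scale(factor):
    """A 'layer' that multiplies every value by a constant."""
    return lambda values: [factor * v for v in values]


def relu(values):
    """A 'layer' that clips negative values to zero."""
    return [max(0.0, v) for v in values]


model = Sequential([scale(2.0), relu])
model.add(scale(0.5))
print(model([-1.0, 3.0]))  # [0.0, 3.0]
```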
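TensorFlow's core workflow: define a dataflow graph (the computation graph), then run it.
Basic concepts: tensors, variables, placeholders, and the operations that form the nodes of the graph.
All of these live inside a graph container; one graph represents one computation task, and when the model runs, the graph is launched in a session.
https://www.jiqizhixin.com/articles/052401
Drawbacks cited for TensorFlow: the computation graph is pure Python and therefore slow; graph construction is static, meaning the graph must be "compiled" before it runs. Advantages cited: actively developed and maintained; a large, active community; both low-level and high-level interfaces for network training; a visualization suite designed to track network topology and performance; written in C++ and Python; multi-GPU support; models compile faster than with Theano-based options.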