TensorFlow Moving Average Model

2020-04-25 youyuge

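The moving average does not change the trained parameters themselves. Gradient descent updates them by exactly the same amount as it would without it. The moving-average class only creates and maintains a collection of shadow variables. After each gradient-descent step, the moving-average op is run to update those shadow variables. Shadow variables change more smoothly than the raw variables, so they are meant for the evaluation phase (for example, accuracy tests). The averaged values are never fed back into training.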

The typical scenario for ExponentialMovingAverage is to compute moving
averages of variables during training, and restore the variables from the
computed moving averages during evaluations.
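The following is a minimal pure-Python sketch of that scenario, with no TensorFlow involved. It is meant only to illustrate the two claims above. The toy loss L(w) = (w - 3)^2 and the alternating "noise" term are assumptions made up for illustration. The raw parameter follows exactly the same trajectory as plain gradient descent. The shadow value jitters less, and it is the value one would evaluate with.

```python
# Pure-Python sketch (no TensorFlow). Assumptions: a toy loss
# L(w) = (w - 3)^2 plus a hypothetical alternating noise term that makes
# the raw parameter jitter from step to step.

def noisy_grad(w, step):
    noise = 0.5 if step % 2 == 0 else -0.5
    return 2.0 * (w - 3.0) + noise

def train(steps=200, lr=0.1, decay=0.9):
    w = 0.0
    shadow = w                     # shadow starts at the variable's initial value
    w_hist, shadow_hist = [], []
    for step in range(steps):
        w -= lr * noisy_grad(w, step)               # normal gradient step; the EMA never touches w
        shadow = decay * shadow + (1 - decay) * w   # EMA update, run after the training step
        w_hist.append(w)
        shadow_hist.append(shadow)
    return w_hist, shadow_hist

w_hist, shadow_hist = train()
tail_w, tail_s = w_hist[-20:], shadow_hist[-20:]
print("raw w jitter:   ", max(tail_w) - min(tail_w))
print("shadow jitter:  ", max(tail_s) - min(tail_s))
print("evaluate with:  ", shadow_hist[-1])
```

Keeping `shadow` as a separate value, rather than overwriting `w`, is what lets training proceed unchanged while evaluation uses the smoother estimate.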

Usage example

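    import tensorflow as tf  # TF 1.x API (use tf.compat.v1 under TF 2.x)
    MOVING_AVERAGE_DECAY = 0.99  # example decay rate
    # Define the training-step counter and the moving-average object;
    # global_step is passed as num_updates, which lowers the effective decay early in training
    global_step = tf.Variable(0, trainable=False)
    variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())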

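tf.train.ExponentialMovingAverage instantiates the moving-average class. The constructor only stores its settings (here the decay MOVING_AVERAGE_DECAY and num_updates=global_step) and does not create any variables yet.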
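When num_updates is supplied, the TensorFlow documentation states that the decay actually used is min(decay, (1 + num_updates) / (10 + num_updates)). The sketch below is a pure-Python rendering of that rule and does not call TensorFlow. The helper name `effective_decay` is hypothetical. The rule means the average tracks the variable closely at the start of training and only reaches the configured decay later.

```python
# Pure-Python sketch of the documented decay rule for
# tf.train.ExponentialMovingAverage when num_updates is given.
# `effective_decay` is a hypothetical helper name.

def effective_decay(decay, num_updates=None):
    """Return the decay actually applied at a given update count."""
    if num_updates is None:
        return decay
    return min(decay, (1.0 + num_updates) / (10.0 + num_updates))

for step in (0, 10, 100, 1000, 10000):
    print(step, round(effective_decay(0.99, step), 4))
```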
The key step is variable_averages.apply():

The apply() method adds shadow copies of trained variables and adds ops that
maintain a moving average of the trained variables in their shadow copies.
It is used when building the training model. The ops that maintain moving
averages are typically run after each training step.

    def apply(self, var_list=None)
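This method creates a new shadow variable for every variable in var_list. If var_list is None, it defaults to variables.trainable_variables(). It also adds each original variable to the GraphKeys.MOVING_AVERAGE_VARIABLES collection. The shadow variables are created with trainable=False and added to the GraphKeys.ALL_VARIABLES collection, so they are returned by tf.global_variables(). The method returns a single op that updates all shadow variables; this op is usually run after the training op.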
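- Source implementation: an ExponentialMovingAverage instance keeps a dictionary self._averages = {}. apply() iterates over var_list, creates a new shadow variable avg for each var, and records it with self._averages[var] = avg. It then builds one update op per shadow variable, collects them in a list called updates, and finally returns control_flow_ops.group(*updates).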
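To make that structure concrete, here is a toy pure-Python imitation. The classes `Var` and `ToyEMA` are hypothetical stand-ins, not TensorFlow code. Real TensorFlow builds graph ops, whereas this sketch uses closures. It mirrors the `_averages` dictionary, the per-variable update list, and the final "group" that runs all updates together. The `/ExponentialMovingAverage` name suffix imitates TF's default shadow-variable naming.

```python
# Toy, pure-Python imitation of the apply() structure described above.
# Var and ToyEMA are hypothetical; real TF creates graph variables and ops.

class Var:
    """Stand-in for a tf.Variable: just holds a float value and a name."""
    def __init__(self, value, name):
        self.value = value
        self.name = name

class ToyEMA:
    def __init__(self, decay):
        self._decay = decay       # the constructor only stores settings
        self._averages = {}       # original var -> shadow var

    def apply(self, var_list):
        updates = []
        for var in var_list:
            # shadow copy initialised to the variable's current value
            avg = Var(var.value, var.name + "/ExponentialMovingAverage")
            self._averages[var] = avg

            def update(var=var, avg=avg):
                avg.value = self._decay * avg.value + (1 - self._decay) * var.value
            updates.append(update)

        def group():              # analogue of control_flow_ops.group(*updates)
            for u in updates:
                u()
        return group

    def average(self, var):
        return self._averages[var]

w = Var(1.0, "w")
ema = ToyEMA(decay=0.5)
ema_op = ema.apply([w])
w.value = 3.0   # pretend a training step changed w
ema_op()        # shadow = 0.5 * 1.0 + 0.5 * 3.0
print(w.value, ema.average(w).value)
```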

    # Assumed to be defined earlier, as in the official docs:
    #   opt_op = opt.minimize(my_loss, [var0, var1])
    #   ema = tf.train.ExponentialMovingAverage(decay=0.9999)
    with tf.control_dependencies([opt_op]):
        # Create the shadow variables, and add ops to maintain moving averages
        # of var0 and var1. This also creates an op that will update the moving
        # averages after each training step. This is what we will use in place
        # of the usual training op.
        training_op = ema.apply([var0, var1])

    # ... train the model by running training_op ...

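Next, the op that updates the shadow variables has to be folded into the training op. The snippet above is the approach from the official documentation. The following also works: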

    # Backpropagation updates the parameters, and each parameter's moving average is updated too
    with tf.control_dependencies([train_step, variables_averages_op]):
        train_op = tf.no_op(name='train')

    # ... train the model by running train_op ...

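tf.no_op does nothing by itself. tf.control_dependencies is a context manager: every op created inside the with block runs only after all ops in its argument list have run. Running train_op therefore forces both train_step and variables_averages_op to execute, which is how TensorFlow controls execution flow in the graph. This pattern only guarantees that both ops finish before the no_op; it does not order them relative to each other. The official pattern, by contrast, guarantees that the moving averages are updated after the optimizer step.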
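The toy sketch below shows why the order between the gradient step and the EMA update can matter. It is pure Python, and the numbers (lr, grad, decay) are arbitrary assumptions. `ema_first=False` mirrors `control_dependencies([opt_op])`, where the EMA reads the post-step value. `ema_first=True` is one of the orders that grouping both ops under a `no_op` does not rule out. The parameter itself ends up the same either way; only the shadow value differs.

```python
# Pure-Python illustration (toy numbers, no TF) of update ordering.
# ema_first=False: gradient step, then EMA (EMA sees the post-step w).
# ema_first=True:  EMA first (EMA sees the pre-step w), then gradient step.

def one_step(w, shadow, lr=0.1, grad=1.0, decay=0.9, ema_first=False):
    if ema_first:
        shadow = decay * shadow + (1 - decay) * w
        w = w - lr * grad
    else:
        w = w - lr * grad
        shadow = decay * shadow + (1 - decay) * w
    return w, shadow

print(one_step(1.0, 0.0))                   # EMA after the step
print(one_step(1.0, 0.0, ema_first=True))   # EMA before the step
```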
