The core loss in reinforcement learning

2019-01-10  VanJordan
The core of the objective is tf.contrib.seq2seq.sequence_loss with both averaging options turned off, so it returns the per-token cross-entropy with shape [batch_size, seq_len]; multiplying by the reward then scales the loss by the reward signal (REINFORCE-style weighting):

self.reward_loss = tf.contrib.seq2seq.sequence_loss(
      decoder_outputs_pretrain,        # decoder logits, [batch, seq_len, vocab]
      self._target_batch,              # target token ids, [batch, seq_len]
      self._dec_padding_mask,          # weights: 1.0 for real tokens, 0.0 for padding
      average_across_timesteps=False,
      average_across_batch=False) * self.reward   # unreduced loss scaled by the reward
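
As a reference, here is a minimal, self-contained sketch of how such a reward-weighted loss could be reduced to a scalar and minimized. All names, shapes, and the optimizer below are assumptions for illustration, not the original model's code; it assumes TF 1.x with tf.contrib available.

import tensorflow as tf

batch_size, max_dec_steps, vocab_size = 16, 20, 5000

# Hypothetical placeholders standing in for the model's tensors.
logits = tf.placeholder(tf.float32, [batch_size, max_dec_steps, vocab_size])   # decoder outputs
targets = tf.placeholder(tf.int32, [batch_size, max_dec_steps])                # target token ids
dec_padding_mask = tf.placeholder(tf.float32, [batch_size, max_dec_steps])     # 1.0 for real tokens
reward = tf.placeholder(tf.float32, [batch_size, 1])                           # per-sequence reward

# Per-token negative log-likelihood, kept unreduced so it can be weighted by the reward.
per_token_loss = tf.contrib.seq2seq.sequence_loss(
    logits, targets, dec_padding_mask,
    average_across_timesteps=False,
    average_across_batch=False)          # shape [batch_size, max_dec_steps]

# REINFORCE-style weighting: broadcast the reward over the time dimension,
# then average over the non-padded tokens of each sequence and over the batch.
reward_loss = per_token_loss * reward
per_seq_loss = tf.reduce_sum(reward_loss, axis=1) / tf.reduce_sum(dec_padding_mask, axis=1)
train_loss = tf.reduce_mean(per_seq_loss)

train_op = tf.train.AdamOptimizer(1e-4).minimize(train_loss)

Turning both averaging options off is what allows the reward to be applied per sequence (or per token) before the final reduction; averaging inside sequence_loss first would discard that granularity.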

The padding mask itself is typically built with tf.sequence_mask, which turns a tensor of sequence lengths into a boolean mask:

tf.sequence_mask(
    lengths,
    maxlen=None,
    dtype=tf.bool,
    name=None
)

tf.sequence_mask([1, 3, 2], 5)  # [[True, False, False, False, False],
                                #  [True, True, True, False, False],
                                #  [True, True, False, False, False]]

tf.sequence_mask([[1, 3],[2,0]])  # [[[True, False, False],
                                  #   [True, True, True]],
                                  #  [[True, True, False],
                                  #   [False, False, False]]]
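
Tying the two pieces together, the boolean output of tf.sequence_mask can be cast to float and used as the weights argument (the decoder padding mask) of sequence_loss. A small sketch, assuming TF 1.x and made-up decoder lengths:

import tensorflow as tf

dec_lens = tf.constant([3, 5, 2])                        # true decoder lengths per example
max_dec_steps = 5
dec_padding_mask = tf.cast(
    tf.sequence_mask(dec_lens, maxlen=max_dec_steps),    # [3, 5] boolean mask
    tf.float32)                                          # 1.0 for real tokens, 0.0 for padding

with tf.Session() as sess:
    print(sess.run(dec_padding_mask))
    # [[1. 1. 1. 0. 0.]
    #  [1. 1. 1. 1. 1.]
    #  [1. 1. 0. 0. 0.]]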