RNN Name Generator (Part 2): RNN Basics

2017-02-03, by Cer_ml

This post covers the basics of RNNs.

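For a detailed introduction to RNNs and their implementation, I recommend the wildml blog series: four posts that take you from the fundamentals through a Theano implementation. Update: someone on Zhihu has translated the series into Chinese, which is also worth a look.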

Karpathy's blog (he is the author of char-rnn) also covers much more about RNNs and language models.

Below is my brief summary.

1. RNN

Here, RNN means the recurrent neural network specifically. Consider the figure below:

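Look first at the folded diagram on the left. If the input x went straight to the output o, this would be an ordinary neuron. An RNN adds an intermediate (hidden) state s, and that state is also influenced by the hidden state at the previous time step, that is, at the previous element of the sequence.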

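Unrolling the RNN cell over time gives the diagram on the right. There is only one copy each of U, V, and W: the same parameters are shared across all time steps.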

In mathematical terms, the figure reads:
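
Written as equations, reconstructed from the Theano code below: x_t is treated as the one-hot vector of the t-th word (so U x_t is simply the column U[:, x_t]), and the initial state s_{-1} is the zero vector, matching the initial value passed in outputs_info. This is a reading of the code, not the original figure.

```latex
\begin{aligned}
s_t &= \tanh\left(U x_t + W s_{t-1}\right), \qquad s_{-1} = \mathbf{0} \\
o_t &= \mathrm{softmax}\left(V s_t\right)
\end{aligned}
```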

Here is a Theano implementation:

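import theano
import theano.tensor as T

# 1. Define the computation for a single time step.
#    x_t is a word index, so U[:, x_t] is U times the one-hot vector of x_t.
def forward_prop_step(x_t, s_t_prev, U, V, W):
    s_t = T.tanh(U[:, x_t] + W.dot(s_t_prev))
    o_t = T.nnet.softmax(V.dot(s_t))  # softmax returns a 1 x vocab matrix, hence o_t[0]
    return [o_t[0], s_t]

# 2. For the input sequence x (the word indices of a sentence), compute the
#    hidden-state sequence s and the output sequence o.
#    theano.scan plays the role of a for loop over time steps; the initial state is all zeros.
#    (This code lives inside the model class, hence self.hidden_dim and self.bptt_truncate.)
[o, s], updates = theano.scan(
    forward_prop_step,
    sequences=x,
    outputs_info=[None, dict(initial=T.zeros(self.hidden_dim))],
    non_sequences=[U, V, W],
    truncate_gradient=self.bptt_truncate,
    strict=True)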
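
For readers without Theano installed, the same forward pass can be sketched in NumPy, assuming x is a list of word indices and replacing theano.scan with an explicit Python loop. The names softmax and rnn_forward and the toy sizes are illustrative, not part of the original code; no gradients are computed here, so truncate_gradient has no counterpart.

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

def rnn_forward(x, U, V, W):
    """Plain RNN forward pass over a sequence of word indices x.

    U: (hidden_dim, vocab_size), W: (hidden_dim, hidden_dim), V: (vocab_size, hidden_dim).
    Returns o with shape (T, vocab_size) and s with shape (T, hidden_dim).
    """
    s_prev = np.zeros(W.shape[0])  # zero initial state, as in outputs_info
    o, s = [], []
    for x_t in x:  # the loop that theano.scan expresses symbolically
        s_t = np.tanh(U[:, x_t] + W.dot(s_prev))
        o.append(softmax(V.dot(s_t)))
        s.append(s_t)
        s_prev = s_t
    return np.array(o), np.array(s)

rng = np.random.default_rng(0)
vocab_size, hidden_dim = 8, 4  # hypothetical toy sizes
U = rng.uniform(-0.1, 0.1, (hidden_dim, vocab_size))
V = rng.uniform(-0.1, 0.1, (vocab_size, hidden_dim))
W = rng.uniform(-0.1, 0.1, (hidden_dim, hidden_dim))

o, s = rnn_forward([1, 5, 2], U, V, W)
print(o.shape, s.shape)  # (3, 8) (3, 4)
```

Each row of o is a probability distribution over the vocabulary, and the same U, V, W are reused at every step, which is exactly the parameter sharing shown in the unrolled diagram.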

Prediction and loss function:

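prediction = T.argmax(o, axis=1)  # most probable word at each time step
# Total cross-entropy over the sequence; y holds the target word indices.
o_error = T.sum(T.nnet.categorical_crossentropy(o, y))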
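
When y holds integer word indices, categorical cross-entropy reduces to the negative log-probability assigned to the correct word at each step. A NumPy sketch of both quantities on a made-up two-step, three-word output (predict_and_loss is a hypothetical helper, not from the original code):

```python
import numpy as np

def predict_and_loss(o, y):
    """o: (T, vocab_size) softmax outputs; y: length-T target word indices."""
    prediction = np.argmax(o, axis=1)  # most probable word at each step
    # -log of the probability given to the correct word, summed over all steps
    o_error = -np.sum(np.log(o[np.arange(len(y)), y]))
    return prediction, o_error

o = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.1, 0.8]])
y = [0, 2]
prediction, o_error = predict_and_loss(o, y)
print(prediction)  # [0 2]
print(o_error)     # -(log 0.7 + log 0.8), about 0.58
```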

2. GRU (Gated Recurrent Unit)

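A plain RNN suffers from the vanishing gradient problem: on long input sequences, the gradient propagated back to early time steps shrinks toward zero, so the network has trouble learning long-range dependencies.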
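
A toy illustration of why this happens, assuming a one-dimensional RNN s_t = tanh(w * s_{t-1} + x_t) with a hypothetical recurrent weight w = 0.9 and random inputs. By the chain rule, the gradient of s_T with respect to the initial state is a product of T factors w(1 - s_t^2), each at most 0.9 in magnitude here, so it shrinks at least as fast as 0.9^T:

```python
import numpy as np

def grad_through_time(w, inputs):
    """d s_T / d s_0 for the scalar RNN s_t = tanh(w * s_{t-1} + x_t), with s_0 = 0."""
    s, grad = 0.0, 1.0
    for x_t in inputs:
        s = np.tanh(w * s + x_t)
        grad *= w * (1.0 - s ** 2)  # derivative of this step w.r.t. the previous state
    return grad

rng = np.random.default_rng(0)
xs = rng.normal(size=50)
for steps in (5, 20, 50):
    print(steps, abs(grad_through_time(0.9, xs[:steps])))
```

The printed magnitudes fall off exponentially with the number of steps, which is the effect the gates in GRU and LSTM are designed to counter.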

This led to the GRU (Gated Recurrent Unit) and the LSTM (Long Short-Term Memory).

GRU：
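
In equation form, read off the Theano code below: x_e is the embedding of the current word, the superscripts z, r, h correspond to the indices 0, 1, 2 of U, W, b, σ is the sigmoid (which the code approximates with hard_sigmoid), and ∘ is the element-wise product. z_t is the update gate and r_t the reset gate.

```latex
\begin{aligned}
z_t &= \sigma\left(U^{z} x_e + W^{z} s_{t-1} + b^{z}\right) \\
r_t &= \sigma\left(U^{r} x_e + W^{r} s_{t-1} + b^{r}\right) \\
h_t &= \tanh\left(U^{h} x_e + W^{h} \left(s_{t-1} \circ r_t\right) + b^{h}\right) \\
s_t &= \left(1 - z_t\right) \circ h_t + z_t \circ s_{t-1}
\end{aligned}
```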

Theano implementation:

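import theano.tensor as T

# These lines sit inside the per-step function passed to theano.scan.
# Look up the embedding of the current word (column x_t of the embedding matrix E).
x_e = E[:, x_t]

# GRU. hard_sigmoid is a cheaper piecewise-linear approximation of the sigmoid.
z_t = T.nnet.hard_sigmoid(U[0].dot(x_e) + W[0].dot(s_t_prev) + b[0])  # update gate
r_t = T.nnet.hard_sigmoid(U[1].dot(x_e) + W[1].dot(s_t_prev) + b[1])  # reset gate
h_t = T.tanh(U[2].dot(x_e) + W[2].dot(s_t_prev * r_t) + b[2])  # candidate state
s_t = (T.ones_like(z_t) - z_t) * h_t + z_t * s_t_prev  # mix candidate and previous state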
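
A NumPy sketch of one GRU step that mirrors the Theano lines above, assuming the ordinary logistic sigmoid in place of hard_sigmoid and stacking the three weight sets into arrays indexed 0/1/2 as the Theano code does. gru_step and the toy sizes are hypothetical. The last part shows what the update gate does: with a large positive bias on z_t, the gate saturates at 1 and the previous state passes through unchanged.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_e, s_prev, U, W, b):
    """One GRU step. U: (3, hidden, embed), W: (3, hidden, hidden), b: (3, hidden).

    Index 0 = update gate z, 1 = reset gate r, 2 = candidate state h.
    """
    z_t = sigmoid(U[0].dot(x_e) + W[0].dot(s_prev) + b[0])
    r_t = sigmoid(U[1].dot(x_e) + W[1].dot(s_prev) + b[1])
    h_t = np.tanh(U[2].dot(x_e) + W[2].dot(s_prev * r_t) + b[2])
    return (1.0 - z_t) * h_t + z_t * s_prev

rng = np.random.default_rng(0)
embed_dim, hidden_dim = 5, 3  # hypothetical toy sizes
U = rng.uniform(-0.1, 0.1, (3, hidden_dim, embed_dim))
W = rng.uniform(-0.1, 0.1, (3, hidden_dim, hidden_dim))
b = np.zeros((3, hidden_dim))
x_e = rng.normal(size=embed_dim)
s_prev = rng.uniform(-1, 1, hidden_dim)

s_t = gru_step(x_e, s_prev, U, W, b)

# Saturate the update gate: z_t is 1 and the old state is carried through.
b_keep = b.copy()
b_keep[0] = 50.0
s_keep = gru_step(x_e, s_prev, U, W, b_keep)
print(np.allclose(s_keep, s_prev))  # True
```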

3. LSTM (Long Short-Term Memory)

Next, the LSTM:
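
In equation form, read off the TensorFlow code below. It uses the row-vector convention of tf.matmul(i, ix), where a batch of inputs is multiplied by the weights from the left; W_ix stands for ix and so on, h_{t-1} is the previous output (the argument o of lstm_cell), c_{t-1} is the previous cell state, and o_t here is the output gate. As the code's docstring notes, there are no connections from the cell state to the gates.

```latex
\begin{aligned}
i_t &= \sigma\left(x_t W_{ix} + h_{t-1} W_{im} + b_i\right) \\
f_t &= \sigma\left(x_t W_{fx} + h_{t-1} W_{fm} + b_f\right) \\
o_t &= \sigma\left(x_t W_{ox} + h_{t-1} W_{om} + b_o\right) \\
\tilde{c}_t &= \tanh\left(x_t W_{cx} + h_{t-1} W_{cm} + b_c\right) \\
c_t &= f_t \circ c_{t-1} + i_t \circ \tilde{c}_t \\
h_t &= o_t \circ \tanh\left(c_t\right)
\end{aligned}
```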

TensorFlow implementation:

Parameter definitions:

import tensorflow as tf  # TensorFlow 1.x API (tf.truncated_normal)

# vocabulary_size, num_nodes and batch_size are assumed to be defined earlier.
# Parameters:
# Input gate: input, previous output, and bias.
ix = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))
im = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))
ib = tf.Variable(tf.zeros([1, num_nodes]))
# Forget gate: input, previous output, and bias.
fx = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))
fm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))
fb = tf.Variable(tf.zeros([1, num_nodes]))
# Memory cell: input, state and bias.
cx = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))
cm = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))
cb = tf.Variable(tf.zeros([1, num_nodes]))
# Output gate: input, previous output, and bias.
ox = tf.Variable(tf.truncated_normal([vocabulary_size, num_nodes], -0.1, 0.1))
om = tf.Variable(tf.truncated_normal([num_nodes, num_nodes], -0.1, 0.1))
ob = tf.Variable(tf.zeros([1, num_nodes]))
# Variables saving state across unrollings.
saved_output = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)
saved_state = tf.Variable(tf.zeros([batch_size, num_nodes]), trainable=False)
# Classifier weights and biases.
w = tf.Variable(tf.truncated_normal([num_nodes, vocabulary_size], -0.1, 0.1))
b = tf.Variable(tf.zeros([vocabulary_size]))
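
As a sanity check on these shapes, the trainable parameters can be counted by hand. A small Python sketch, assuming hypothetical values vocabulary_size = 27 and num_nodes = 64 (the snippet above does not show the values actually used):

```python
def lstm_param_count(vocabulary_size, num_nodes):
    """Count the trainable parameters created by the definitions above."""
    # Each of the four blocks (input, forget, output gate and memory cell) has
    # an input matrix, a recurrent matrix and a bias row.
    per_block = vocabulary_size * num_nodes + num_nodes * num_nodes + num_nodes
    # Classifier: w is (num_nodes, vocabulary_size), b is (vocabulary_size,).
    classifier = num_nodes * vocabulary_size + vocabulary_size
    # saved_output and saved_state are trainable=False, so they are not counted.
    return 4 * per_block + classifier

print(lstm_param_count(27, 64))  # 25307
```

The recurrent matrices (num_nodes × num_nodes each) dominate once num_nodes exceeds the vocabulary size.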

Computation for one step:

# Uses the parameters (ix, im, ib, ...) defined above.
def lstm_cell(i, o, state):
    """Create an LSTM cell. See e.g.: http://arxiv.org/pdf/1402.1128v1.pdf
    Note that in this formulation, we omit the various connections between the
    previous state and the gates.

    i: input at this step, o: previous output, state: previous cell state."""
    input_gate = tf.sigmoid(tf.matmul(i, ix) + tf.matmul(o, im) + ib)
    forget_gate = tf.sigmoid(tf.matmul(i, fx) + tf.matmul(o, fm) + fb)
    update = tf.matmul(i, cx) + tf.matmul(o, cm) + cb  # candidate cell value
    state = forget_gate * state + input_gate * tf.tanh(update)  # new cell state
    output_gate = tf.sigmoid(tf.matmul(i, ox) + tf.matmul(o, om) + ob)
    return output_gate * tf.tanh(state), state  # new output and new cell state
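
To see the gating behaviour without building a TensorFlow graph, here is a NumPy transcription of lstm_cell, assuming the weights are passed in a dict keyed by the same names as the TensorFlow variables. lstm_cell_np, init_params and the toy sizes are hypothetical. The last lines open the forget gate fully and close the input gate, so the cell state is carried over unchanged, which is how an LSTM can hold information across many steps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_np(i, o, state, p):
    """i: (batch, vocab) input, o: (batch, nodes) previous output,
    state: (batch, nodes) previous cell state, p: dict of parameters."""
    input_gate = sigmoid(i @ p["ix"] + o @ p["im"] + p["ib"])
    forget_gate = sigmoid(i @ p["fx"] + o @ p["fm"] + p["fb"])
    update = i @ p["cx"] + o @ p["cm"] + p["cb"]
    state = forget_gate * state + input_gate * np.tanh(update)
    output_gate = sigmoid(i @ p["ox"] + o @ p["om"] + p["ob"])
    return output_gate * np.tanh(state), state

def init_params(vocab, nodes, rng):
    # Same shapes as the TensorFlow variables: g in {i, f, c, o}
    p = {}
    for g in "ifco":
        p[g + "x"] = rng.uniform(-0.1, 0.1, (vocab, nodes))
        p[g + "m"] = rng.uniform(-0.1, 0.1, (nodes, nodes))
        p[g + "b"] = np.zeros((1, nodes))
    return p

rng = np.random.default_rng(0)
batch, vocab, nodes = 2, 6, 4  # hypothetical toy sizes
p = init_params(vocab, nodes, rng)
i = np.eye(vocab)[[1, 3]]        # two one-hot inputs
o = np.zeros((batch, nodes))     # previous output
state = rng.uniform(-1, 1, (batch, nodes))

out, new_state = lstm_cell_np(i, o, state, p)
print(out.shape, new_state.shape)  # (2, 4) (2, 4)

# Forget gate fully open, input gate closed: the cell state is kept as-is.
p["fb"][:] = 50.0
p["ib"][:] = -50.0
_, kept = lstm_cell_np(i, o, state, p)
print(np.allclose(kept, state))  # True
```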

4. A few details

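The symbol ∘ denotes the element-wise product of two matrices, also called the Hadamard product.

In both Theano and TensorFlow, * is the element-wise product.

Matrix multiplication, by contrast, is .dot in Theano and tf.matmul in TensorFlow.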
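
NumPy follows the same convention (* is element-wise, .dot or @ is matrix multiplication), so the difference is easy to check on two small 2×2 matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(A * B)     # element-wise (Hadamard) product: [[ 5 12] [21 32]]
print(A.dot(B))  # matrix product, like Theano's .dot or tf.matmul: [[19 22] [43 50]]
```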
