
2018-06-06 | This article has reached 665 readers | 美环花子若野

Let’s say you have a batch of two examples, one of length 13 and the other of length 20. Each time step is a vector of 128 numbers. The length-13 example is 0-padded to length 20, so your RNN input tensor is of shape [2, 20, 128]. The dynamic_rnn function returns a tuple of (outputs, state), where outputs is a tensor of size [2, 20, ...] whose last dimension is the RNN output at each time step. state is the last state for each example: a tensor of size [2, ...] whose last dimension depends on what kind of RNN cell you’re using.
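To make the shapes concrete, here is a minimal NumPy sketch of the padding step. The helper name pad_batch is made up for this illustration and is not part of TensorFlow. It zero-pads each example to the longest length in the batch and records the original lengths, which is the information you would later pass as sequence_length.

```python
import numpy as np

def pad_batch(examples):
    """Zero-pad a list of [length, features] arrays to a common length."""
    lengths = np.array([len(e) for e in examples])
    max_len = lengths.max()
    num_features = examples[0].shape[1]
    batch = np.zeros((len(examples), max_len, num_features), dtype=examples[0].dtype)
    for i, e in enumerate(examples):
        batch[i, :len(e)] = e  # steps past len(e) stay zero
    return batch, lengths

rng = np.random.RandomState(0)
examples = [rng.randn(13, 128), rng.randn(20, 128)]
batch, lengths = pad_batch(examples)
print(batch.shape)  # (2, 20, 128)
print(lengths)      # [13 20]
```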


So, here’s the problem: once you reach time step 13, the first example in the batch is already “done” and you don’t want to perform any additional calculation on it. The second example isn’t, and must go through the RNN until step 20. By passing sequence_length=[13,20] you tell TensorFlow to stop calculations for example 1 at step 13 and simply copy the state from time step 13 to the end. The output will be set to 0 for all time steps past 13. You’ve just saved some computational cost. But more importantly, if you didn’t pass sequence_length you would get incorrect results! Without sequence_length, TensorFlow will continue calculating the state until T=20 instead of simply copying the state from T=13. This means you would calculate the state using the padded elements, which is not what you want.

Source: http://www.wildml.com/2016/08/rnns-in-tensorflow-a-practical-guide-and-undocumented-features/

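import numpy as np
import tensorflow as tf

# Create input data: 2 examples, 10 time steps, 8 features per step
X = np.random.randn(2, 10, 8)
# The second example is of length 6, so zero out its padded steps
X[1, 6:] = 0
X_lengths = [10, 6]

cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)

# dtype must match X, which NumPy creates as float64
outputs, last_states = tf.nn.dynamic_rnn(
    cell=cell,
    dtype=tf.float64,
    sequence_length=X_lengths,
    inputs=X)

# Run the graph once and fetch both tensors
result = tf.contrib.learn.run_n(
    {"outputs": outputs, "last_states": last_states},
    n=1)

assert result[0]["outputs"].shape == (2, 10, 64)
# Outputs for the second example past length 6 should be 0
assert (result[0]["outputs"][1, 7, :] == np.zeros(cell.output_size)).all()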
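If it helps to see these semantics without TensorFlow, here is a rough NumPy sketch of a vanilla tanh RNN that honours per-example lengths: past an example's length, the state is copied through unchanged and the output is zeroed. The function rnn_with_lengths and its weight arguments are hypothetical names for this illustration. It mimics the behaviour described above and is not how dynamic_rnn is implemented.

```python
import numpy as np

def rnn_with_lengths(inputs, lengths, W_x, W_h):
    """Tanh RNN over inputs [B, T, D] that respects per-example lengths."""
    B, T, _ = inputs.shape
    H = W_h.shape[0]
    lengths = np.asarray(lengths)
    state = np.zeros((B, H))
    outputs = np.zeros((B, T, H))
    for t in range(T):
        new_state = np.tanh(inputs[:, t] @ W_x + state @ W_h)
        active = (t < lengths)[:, None]             # True while an example is still running
        state = np.where(active, new_state, state)  # finished examples keep their old state
        outputs[:, t] = np.where(active, new_state, 0.0)
    return outputs, state

rng = np.random.RandomState(0)
X = rng.randn(2, 10, 8)
X[1, 6:] = 0  # second example has length 6
W_x = rng.randn(8, 4) * 0.5
W_h = rng.randn(4, 4) * 0.5

outputs, state = rnn_with_lengths(X, [10, 6], W_x, W_h)
# Pretend every example has full length, i.e. ignore the padding
_, state_no_lengths = rnn_with_lengths(X, [10, 10], W_x, W_h)
```

With lengths, the final state of the second example equals its output at step 5 (its last real step). Without lengths, the RNN keeps updating the state on the zero padding, so the final state differs.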

Bidirectional RNNs

When using a standard RNN to make predictions we are only taking the “past” into account. For certain tasks this makes sense (e.g. predicting the next word), but for some tasks it would be useful to take both the past and the future into account. Think of a tagging task, like part-of-speech tagging, where we want to assign a tag to each word in a sentence. Here we already know the full sequence of words, and for each word we want to take not only the words to the left (past) but also the words to the right (future) into account when making a prediction. Bidirectional RNNs do exactly that. A bidirectional RNN is a combination of two RNNs – one runs forward from “left to right” and one runs backward from “right to left”. These are commonly used for tagging tasks, or when we want to embed a sequence into a fixed-length vector (beyond the scope of this post).

Just like for standard RNNs, TensorFlow has static and dynamic versions of the bidirectional RNN. As of the time of this writing, bidirectional_dynamic_rnn is still undocumented, but it’s preferred over the static bidirectional_rnn.

The key differences of the bidirectional RNN functions are that they take a separate cell argument for each of the forward and backward RNNs, and that they return separate outputs and states for each direction:

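X = np.random.randn(2, 10, 8)
# The second example is of length 6, so zero out every step from 6 onward
X[1, 6:] = 0
X_lengths = [10, 6]

cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)

outputs, states = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=cell,
    cell_bw=cell,
    dtype=tf.float64,
    sequence_length=X_lengths,
    inputs=X)

# Both return values are (forward, backward) pairs
output_fw, output_bw = outputs
states_fw, states_bw = states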
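The idea of running one RNN in each direction over padded data can be sketched in NumPy as follows. All names here (reverse_by_length, run_rnn, bidirectional) are hypothetical, and the tanh cell is a stand-in for an LSTM. The point worth noticing is that the backward pass reverses only the real steps of each example, so padding never ends up at the front of the sequence.

```python
import numpy as np

def reverse_by_length(x, lengths):
    """Reverse the first lengths[i] steps of each example; padding stays in place."""
    out = x.copy()
    for i, n in enumerate(lengths):
        out[i, :n] = x[i, :n][::-1]
    return out

def run_rnn(x, lengths, W_x, W_h):
    """Tanh RNN; state is frozen and output zeroed past each example's length."""
    B, T, _ = x.shape
    lengths = np.asarray(lengths)
    state = np.zeros((B, W_h.shape[0]))
    outputs = np.zeros((B, T, W_h.shape[0]))
    for t in range(T):
        active = (t < lengths)[:, None]
        new_state = np.tanh(x[:, t] @ W_x + state @ W_h)
        state = np.where(active, new_state, state)
        outputs[:, t] = np.where(active, new_state, 0.0)
    return outputs, state

def bidirectional(x, lengths, fw_params, bw_params):
    out_fw, state_fw = run_rnn(x, lengths, *fw_params)
    out_bw_rev, state_bw = run_rnn(reverse_by_length(x, lengths), lengths, *bw_params)
    out_bw = reverse_by_length(out_bw_rev, lengths)  # realign with the forward time axis
    return (out_fw, out_bw), (state_fw, state_bw)

rng = np.random.RandomState(0)
X = rng.randn(2, 10, 8)
X[1, 6:] = 0
lengths = [10, 6]
fw = (rng.randn(8, 4) * 0.5, rng.randn(4, 4) * 0.5)  # separate weights per direction
bw = (rng.randn(8, 4) * 0.5, rng.randn(4, 4) * 0.5)

(out_fw, out_bw), (state_fw, state_bw) = bidirectional(X, lengths, fw, bw)
combined = np.concatenate([out_fw, out_bw], axis=-1)  # shape (2, 10, 8)
```

For tagging, concatenating the two outputs per time step (as in combined) gives each position a view of both its left and right context.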

RNN Cells, Wrappers and Multi-Layer RNNs

Check out the Jupyter Notebook on RNN Cells here!

All TensorFlow RNN functions take a cell argument. LSTMs and GRUs are the most commonly used cells, but there are many others, and not all of them are documented. Currently, the best way to get a sense of what cells are available is to look at rnn_cell.py and contrib/rnn_cell.

As of the time of this writing, the basic RNN cells and wrappers are:

BasicRNNCell – A vanilla RNN cell.

GRUCell – A Gated Recurrent Unit cell.

BasicLSTMCell – An LSTM cell based on Recurrent Neural Network Regularization. No peephole connection or cell clipping.

LSTMCell – A more complex LSTM cell that allows for optional peephole connections and cell clipping.

MultiRNNCell – A wrapper to combine multiple cells into a multi-layer cell.

DropoutWrapper – A wrapper to add dropout to input and/or output connections of a cell.

and the contributed RNN cells and wrappers:

CoupledInputForgetGateLSTMCell – An extended LSTMCell that has coupled input and forget gates based on LSTM: A Search Space Odyssey.

TimeFreqLSTMCell – Time-Frequency LSTM cell based on Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks.

GridLSTMCell – The cell from Grid Long Short-Term Memory.

AttentionCellWrapper – Adds attention to an existing RNN cell, based on Long Short-Term Memory-Networks for Machine Reading.

LSTMBlockCell – A faster version of the basic LSTM cell (note: this one is in lstm_ops.py).

Using these wrappers and cells is simple, e.g.

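# Base LSTM cell with 64 units
cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)
# Apply dropout to the cell's outputs, keeping each unit with probability 0.5
cell = tf.nn.rnn_cell.DropoutWrapper(cell=cell, output_keep_prob=0.5)
# Stack four layers into a single multi-layer cell
cell = tf.nn.rnn_cell.MultiRNNCell(cells=[cell] * 4, state_is_tuple=True)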
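To see why wrappers compose so easily, here is a small NumPy sketch of the pattern. Everything in it (TanhCell, OutputDropout, MultiCell) is a hypothetical stand-in written for this illustration, not TensorFlow's implementation. The only assumption is that a cell is a callable taking (input, state) and returning (output, new_state), so a wrapper can be a cell that holds other cells.

```python
import numpy as np

class TanhCell:
    """Minimal cell: call(x, state) -> (output, new_state)."""
    def __init__(self, input_size, num_units, rng):
        self.W_x = rng.randn(input_size, num_units) * 0.1
        self.W_h = rng.randn(num_units, num_units) * 0.1
        self.num_units = num_units

    def zero_state(self, batch_size):
        return np.zeros((batch_size, self.num_units))

    def __call__(self, x, state):
        new_state = np.tanh(x @ self.W_x + state @ self.W_h)
        return new_state, new_state

class OutputDropout:
    """Wraps a cell and applies inverted dropout to its output only."""
    def __init__(self, cell, keep_prob, rng):
        self.cell, self.keep_prob, self.rng = cell, keep_prob, rng

    def zero_state(self, batch_size):
        return self.cell.zero_state(batch_size)

    def __call__(self, x, state):
        output, new_state = self.cell(x, state)
        keep = self.rng.rand(*output.shape) < self.keep_prob
        # The recurrent state is passed on untouched; only the output is dropped
        return output * keep / self.keep_prob, new_state

class MultiCell:
    """Stacks cells: the output of layer i is the input of layer i + 1."""
    def __init__(self, cells):
        self.cells = cells

    def zero_state(self, batch_size):
        return tuple(c.zero_state(batch_size) for c in self.cells)

    def __call__(self, x, states):
        new_states = []
        for cell, state in zip(self.cells, states):
            x, new_state = cell(x, state)
            new_states.append(new_state)
        return x, tuple(new_states)

rng = np.random.RandomState(0)
# One cell object per layer, so every layer has its own weights
layers = [OutputDropout(TanhCell(8 if i == 0 else 64, 64, rng), keep_prob=0.5, rng=rng)
          for i in range(4)]
stack = MultiCell(layers)
x = rng.randn(2, 8)
output, states = stack(x, stack.zero_state(2))  # one time step for a batch of 2
```

The state of the stacked cell is a tuple with one entry per layer, which is the same shape of thing state_is_tuple=True asks for in the snippet above.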

Calculating sequence loss on padded examples

Check out the Jupyter Notebook on Calculating Loss here!

For sequence prediction tasks we often want to make a prediction at each time step. For example, in language modeling we try to predict the next word for each word in a sentence. If all of your sequences are of the same length you can use TensorFlow’s sequence_loss and sequence_loss_by_example functions (undocumented) to calculate the standard cross-entropy loss.

However, as of the time of this writing sequence_loss does not support variable-length sequences (like the ones you get from a dynamic_rnn). Naively calculating the loss at each time step doesn’t work because that would take into account the padded positions. The solution is to create a weight matrix that “masks out” the losses at padded positions.

The code below also shows why 0-padding can be a problem when you have a genuine “0-class”. If that’s the case you cannot use tf.sign(tf.to_float(y)) to create a mask, because that would mask out the “0-class” as well. You can still create a mask using the sequence length information; it just becomes more complicated.

import numpy as np
import tensorflow as tf

# Batch size
B = 4
# (Maximum) number of time steps in this batch
T = 8
RNN_DIM = 128
# Classes 1-9 plus class 0, which is reserved for padding
NUM_CLASSES = 10

# The *actual* length of the examples
example_len = [1, 2, 3, 8]

# The classes of the examples at each step (between 1 and 9, 0 means padding)
y = np.random.randint(1, 10, [B, T])
for i, length in enumerate(example_len):
    y[i, length:] = 0

# The RNN outputs
rnn_outputs = tf.convert_to_tensor(np.random.randn(B, T, RNN_DIM), dtype=tf.float32)

# Output layer weights (the variable name "W" is an assumption)
W = tf.get_variable(
    name="W",
    shape=[RNN_DIM, NUM_CLASSES])

# Calculate logits and probs
# Reshape so we can calculate them all at once
rnn_outputs_flat = tf.reshape(rnn_outputs, [-1, RNN_DIM])
# Both operands are 2-D here, so a plain matmul is sufficient
logits_flat = tf.matmul(rnn_outputs_flat, W)
probs_flat = tf.nn.softmax(logits_flat)

# Calculate the losses
y_flat = tf.reshape(y, [-1])
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits_flat, labels=y_flat)

# Mask the losses: sign(y) is 0 at padded positions and 1 elsewhere
mask = tf.sign(tf.to_float(y_flat))
masked_losses = mask * losses

# Bring back to [B, T] shape
masked_losses = tf.reshape(masked_losses, tf.shape(y))

# Calculate mean loss
mean_loss_by_example = tf.reduce_sum(masked_losses, reduction_indices=1) / example_len
mean_loss = tf.reduce_mean(mean_loss_by_example)
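For the case where class 0 is a real class, the mask has to come from the sequence lengths instead of from the labels. Here is a NumPy sketch of the same masked mean loss built that way. The function masked_mean_loss is a hypothetical helper for this illustration. Newer TensorFlow releases also ship tf.sequence_mask, which builds this kind of length-based mask for you.

```python
import numpy as np

def masked_mean_loss(logits, labels, lengths):
    """Mean cross-entropy per example over valid steps only.

    logits: [B, T, C] floats, labels: [B, T] ints, lengths: [B] valid steps.
    """
    B, T, _ = logits.shape
    lengths = np.asarray(lengths)
    # mask[b, t] is 1.0 where t < lengths[b]; it never looks at the label value
    mask = (np.arange(T)[None, :] < lengths[:, None]).astype(logits.dtype)
    # Numerically stable log-softmax over the class axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the true class at every [b, t]
    picked = log_probs[np.arange(B)[:, None], np.arange(T)[None, :], labels]
    losses = -picked
    loss_by_example = (losses * mask).sum(axis=1) / lengths
    return loss_by_example.mean(), mask

rng = np.random.RandomState(0)
B, T, C = 4, 8, 10
lengths = [1, 2, 3, 8]
labels = rng.randint(0, C, size=(B, T))  # class 0 is a real class here
logits = rng.randn(B, T, C)
loss, mask = masked_mean_loss(logits, labels, lengths)

# Overwrite the logits at padded positions; the loss should not change
noisy = logits.copy()
noisy[mask == 0] = rng.randn(int((mask == 0).sum()), C) * 10
loss_noisy, _ = masked_mean_loss(noisy, labels, lengths)
```

Because the mask depends only on lengths, a label of 0 inside the valid part of a sequence still contributes to the loss.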



