pack_padded_sequence and pad_packed_sequence
1. Handling variable-length sequences
When we compute over a batch of training examples together, the samples usually have different lengths, so the natural move is padding: extend each shorter sentence with pad tokens until it matches the longest one.
For example, take a short sentence that ends in "Yes" and is padded out to the maximum length. If the pads after "Yes" are fed into the LSTM, it keeps producing hidden states and cell states for those meaningless positions, which degrades the result.
We therefore want the LSTM to stop once it reaches "Yes" and skip the pads.
pack_padded_sequence does exactly this: it packs the padded batch so that the LSTM only processes the real data.
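The padding step itself is usually done with torch.nn.utils.rnn.pad_sequence. A minimal sketch (the token ids are made up for illustration):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# three "sentences" of different lengths, already mapped to token ids
sentences = [torch.tensor([1, 2]), torch.tensor([3]), torch.tensor([4, 5, 6])]

# pad every sentence to the length of the longest one with 0s
padded = pad_sequence(sentences, batch_first=True, padding_value=0)
print(padded)  # tensor([[1, 2, 0], [3, 0, 0], [4, 5, 6]])
```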
2. torch.nn.utils.rnn.pack_padded_sequence()
pack_padded_sequence(input, lengths, batch_first=False, enforce_sorted=True)
type: (Tensor, Tensor, bool, bool) -> PackedSequence
Purpose:
Packs a Tensor containing padded sequences of variable length.
input can be of size T x B x *, where T is the length of the longest sequence and B is the batch size. If batch_first is True, input is expected in B x T x * format.
Arguments:
- input (Tensor) – the padded batch of variable-length sequences.
- lengths (Tensor or list[int]) – the length of each sequence in the batch.
- batch_first (bool, optional) – if True, the input is expected in B x T x * format.
- enforce_sorted (bool, optional) – if True (the default), the sequences must be sorted by length in decreasing order; if False, they are sorted internally.
Returns:
a PackedSequence object.
## Example
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
seq = torch.tensor([[1, 2, 0], [3, 0, 0], [4, 5, 6]])
lens = [2, 1, 3]  # length of each sentence
# with batch_first=True, the input has shape [B, T, *]
packed = pack_padded_sequence(seq, lens, batch_first=True, enforce_sorted=False)
packed
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
PackedSequence(data=tensor([4, 1, 3, 5, 2, 6]), batch_sizes=tensor([3, 2, 1]),
sorted_indices=tensor([2, 0, 1]), unsorted_indices=tensor([1, 2, 0]))
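The fields of the PackedSequence above can be reproduced by hand, which makes them easier to read: sort the sequences by length in decreasing order, then walk through the time steps, at each step collecting one element from every sequence that is still active. A sketch of that logic (not PyTorch's actual implementation):

```python
import torch

seq = torch.tensor([[1, 2, 0], [3, 0, 0], [4, 5, 6]])
lens = [2, 1, 3]

# sort sequence indices by length, longest first -> [2, 0, 1] (sorted_indices)
order = sorted(range(len(lens)), key=lambda i: lens[i], reverse=True)

data, batch_sizes = [], []
for t in range(max(lens)):
    # keep only the sequences that are still "active" at time step t
    step = [int(seq[i, t]) for i in order if lens[i] > t]
    batch_sizes.append(len(step))
    data.extend(step)

print(data)         # [4, 1, 3, 5, 2, 6]
print(batch_sizes)  # [3, 2, 1]
```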
3. torch.nn.utils.rnn.pad_packed_sequence()
The inverse operation of pack_padded_sequence above.
Signature: pad_packed_sequence(sequence, batch_first=False, padding_value=0.0, total_length=None)
It takes a PackedSequence and returns the unpacked (padded) tensor together with a tensor of lengths.
The returned Tensor's data will be of size T x B x *, where T is the length of the longest sequence and B is the batch size. If batch_first is True, the data will be transposed into B x T x * format.
Unpacking the packed sequence [4, 1, 3, 5, 2, 6] from above with pad_packed_sequence:
seq_unpacked, lens_unpacked = pad_packed_sequence(packed, batch_first=True)
seq_unpacked, lens_unpacked
+++++++++++++++++++++++++++++++++++++++++
(tensor([[1, 2, 0],
[3, 0, 0],
[4, 5, 6]]),
tensor([2, 1, 3]))
A practical application:
# packed_word_embeddings: a PackedSequence built from padded word embeddings
# source_encodings (after unpacking): (batch_size, max_sequence_len, hidden_size * 2)
# last_state, last_cell: (num_layers * num_directions, batch_size, hidden_size)
bi_lstm = nn.LSTM(10, hidden_size=20, batch_first=True, bidirectional=True)
source_encodings, (last_state, last_cell) = bi_lstm(packed_word_embeddings)
source_encodings, _ = pad_packed_sequence(source_encodings, batch_first=True)
Because the LSTM receives a PackedSequence, the returned last_state and last_cell are taken at each sequence's true last step rather than at pad positions. Unpacking source_encodings with pad_packed_sequence then yields an ordinary padded tensor, like:
tensor([[1, 2, 0], [3, 0, 0], [4, 5, 6]])
The second return value (discarded as _ here) holds the lengths.
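Since packed_word_embeddings is not defined above, here is a self-contained version of that pipeline; the dimensions 10 and 20 come from the snippet, and random tensors stand in for real word embeddings:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
batch_size, max_len, emb_dim, hidden = 3, 5, 10, 20
lens = torch.tensor([5, 3, 2])                          # true length of each sequence
embeddings = torch.randn(batch_size, max_len, emb_dim)  # stand-in for word embeddings

packed = pack_padded_sequence(embeddings, lens, batch_first=True, enforce_sorted=False)
bi_lstm = nn.LSTM(emb_dim, hidden_size=hidden, batch_first=True, bidirectional=True)
packed_out, (last_state, last_cell) = bi_lstm(packed)
source_encodings, out_lens = pad_packed_sequence(packed_out, batch_first=True)

print(source_encodings.shape)  # (batch_size, max_len, hidden * 2) -> torch.Size([3, 5, 40])
print(last_state.shape)        # (num_directions, batch_size, hidden) -> torch.Size([2, 3, 20])
print(out_lens)                # tensor([5, 3, 2])
# pad positions in the unpacked output stay at padding_value (0.0):
print(float(source_encodings[2, 2:].abs().sum()))  # 0.0
```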
Reference
[1] pack_padded_sequence and pad_packed_sequence for RNNs in PyTorch