PyTorch Learning Notes: Seq2Seq Model Implementation (Seq2Seq Part)

2019-04-25

A brief recap
Looking back at the models from the past three days, we went from a plain LSTM Seq2Seq -> GRU Seq2Seq -> attention-based Seq2Seq.

[Figure: LSTM Seq2Seq]

[Figure: GRU Seq2Seq]

[Figure: Attention-based Seq2Seq]

3. The Seq2Seq part of the three models

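The last part to implement is the Seq2Seq module itself. All three models work on the same principle:

Receive the input/source sentence
Use the encoder to produce the context vector
Use the decoder to produce the predicted output/target sentence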

3.1 LSTM Seq2Seq

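Create an output tensor that will store all of our predictions, $\hat{Y}$.
Feed the input/source sentence $X$ (src) into the encoder and receive its final hidden and cell states.
The first input to the decoder is the start-of-sequence (<sos>) token. Since our trg tensor already has the <sos> token prepended (going all the way back to when we defined init_token in the TRG field), we get $y_1$ by slicing it: trg[0, :].
We know how long our target sentence should be (max_len), so we loop over the remaining positions, t = 1 to max_len - 1. During each iteration of the loop, we:
- pass the input, the previous hidden state and the previous cell state ($y_t, s_{t-1}, c_{t-1}$) into the decoder;
- receive a prediction, the next hidden state and the next cell state ($\hat{y}_{t+1}, s_t, c_t$) from the decoder;
- place our prediction $\hat{y}_{t+1}$ (output) in our tensor of predictions $\hat{Y}$ (outputs);
- decide whether or not to use "teacher forcing":
  - if we do, the next input is the ground-truth next token in the sequence, $y_{t+1}$ (trg[t]);
  - if we don't, the next input is the predicted next token in the sequence, $\hat{y}_{t+1}$ (top1).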
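To make teacher_forcing_ratio concrete, here is a minimal pure-Python sketch of the per-step decision. The helper names next_decoder_input and teacher_forcing_rate are hypothetical, introduced only for illustration; the comparison rng.random() < teacher_forcing_ratio is the same test the forward loop performs with random.random().

```python
import random


def next_decoder_input(ground_truth, prediction, teacher_forcing_ratio, rng):
    """Pick the next decoder input for one time step (hypothetical helper).

    A single draw decides for the whole batch, just like
    `random.random() < teacher_forcing_ratio` in Seq2Seq.forward.
    """
    teacher_force = rng.random() < teacher_forcing_ratio
    return ground_truth if teacher_force else prediction


def teacher_forcing_rate(ratio, steps=10_000, seed=0):
    """Fraction of simulated steps that received the ground-truth token."""
    rng = random.Random(seed)
    forced = sum(
        next_decoder_input("trg[t]", "top1", ratio, rng) == "trg[t]"
        for _ in range(steps)
    )
    return forced / steps


if __name__ == "__main__":
    for ratio in (0.0, 0.5, 0.75, 1.0):
        print(ratio, teacher_forcing_rate(ratio))
```

Because random.random() returns a value in [0, 1), a ratio of 0.0 never forces (the decoder always consumes its own predictions, as at inference time) and 1.0 always forces; values in between give that fraction of steps on average.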

Here is the implementation:

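import random

import torch
import torch.nn as nn


class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, device):
        super().__init__()

        self.encoder = encoder
        self.decoder = decoder
        self.device = device

        # The encoder's final (hidden, cell) initialise the decoder directly,
        # so both must have the same hidden size and number of layers
        assert encoder.hid_dim == decoder.hid_dim, \
            "Hidden dimensions of encoder and decoder must be equal!"
        assert encoder.n_layers == decoder.n_layers, \
            "Encoder and decoder must have equal number of layers!"

    def forward(self, src, trg, teacher_forcing_ratio=0.5):
        # src = [src sent len, batch size]
        # trg = [trg sent len, batch size]
        # teacher_forcing_ratio is the probability of using teacher forcing,
        # e.g. with 0.75 the ground-truth token is the next input 75% of the time
        batch_size = trg.shape[1]
        max_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_dim

        # tensor to store the decoder's predictions; outputs[0] is never filled
        outputs = torch.zeros(max_len, batch_size, trg_vocab_size).to(self.device)

        # the encoder's last hidden and cell states initialise the decoder
        hidden, cell = self.encoder(src)

        # the first decoder input is the batch of <sos> tokens
        input = trg[0, :]

        for t in range(1, max_len):
            # one decoding step: output = [batch size, trg vocab size]
            output, hidden, cell = self.decoder(input, hidden, cell)
            outputs[t] = output
            # one coin flip per time step, shared by the whole batch
            teacher_force = random.random() < teacher_forcing_ratio
            # highest-scoring token for each sentence in the batch
            top1 = output.argmax(1)
            # ground truth if teacher forcing, otherwise the model's own prediction
            input = trg[t] if teacher_force else top1

        return outputs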
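The class above only relies on a few things: the encoder returns (hidden, cell), the decoder maps (input, hidden, cell) to (prediction, hidden, cell), and both expose hid_dim and n_layers (plus output_dim on the decoder). Below is a minimal sketch of modules that satisfy that interface, so the shapes can be checked. The layer choices (an embedding, a multi-layer nn.LSTM, a linear output layer) and all sizes are assumptions for illustration, not necessarily the exact modules built in the earlier parts.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    def __init__(self, input_dim, emb_dim, hid_dim, n_layers, dropout):
        super().__init__()
        self.hid_dim = hid_dim
        self.n_layers = n_layers
        self.embedding = nn.Embedding(input_dim, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, n_layers, dropout=dropout)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src):
        # src = [src len, batch size]
        embedded = self.dropout(self.embedding(src))
        _, (hidden, cell) = self.rnn(embedded)
        # hidden, cell = [n layers, batch size, hid dim]
        return hidden, cell


class Decoder(nn.Module):
    def __init__(self, output_dim, emb_dim, hid_dim, n_layers, dropout):
        super().__init__()
        self.output_dim = output_dim
        self.hid_dim = hid_dim
        self.n_layers = n_layers
        self.embedding = nn.Embedding(output_dim, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, n_layers, dropout=dropout)
        self.fc_out = nn.Linear(hid_dim, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, input, hidden, cell):
        # input = [batch size] -> [1, batch size]: a single time step
        embedded = self.dropout(self.embedding(input.unsqueeze(0)))
        output, (hidden, cell) = self.rnn(embedded, (hidden, cell))
        # output = [1, batch size, hid dim] -> prediction = [batch size, output dim]
        prediction = self.fc_out(output.squeeze(0))
        return prediction, hidden, cell


torch.manual_seed(0)
enc = Encoder(input_dim=10, emb_dim=8, hid_dim=16, n_layers=2, dropout=0.1)
dec = Decoder(output_dim=12, emb_dim=8, hid_dim=16, n_layers=2, dropout=0.1)
src = torch.randint(0, 10, (7, 4))       # [src len = 7, batch size = 4]
hidden, cell = enc(src)
sos = torch.zeros(4, dtype=torch.long)   # assume index 0 stands for <sos>
prediction, hidden, cell = dec(sos, hidden, cell)
print(hidden.shape, prediction.shape)
```

The decoder processes exactly one token per call, which is why Seq2Seq.forward drives it with an explicit Python loop instead of passing the whole target sequence at once.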

3.2 GRU Seq2Seq

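Create an output tensor to hold all the predictions, $\hat{Y}$
The source sequence $X$ is fed into the encoder to receive a context vector $z$
The initial decoder hidden state is set to the context vector, $s_0 = z = h_T$
Use a batch of <sos> tokens as the first input, $y_1$
We then decode within a loop:
- insert the input token $y_t$, the previous hidden state $s_{t-1}$ and the context vector $z$ into the decoder
- receive a prediction $\hat{y}_{t+1}$ and a new hidden state $s_t$
- then decide whether or not to use teacher forcing, and set the next input accordingly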

Here is the implementation:

import random

import torch
import torch.nn as nn


class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, device):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.device = device
        # the context vector z becomes the decoder's initial hidden state s_0
        assert encoder.hid_dim == decoder.hid_dim, \
            "Hidden dimensions of encoder and decoder must be equal!"

    def forward(self, src, trg, teacher_forcing_ratio=0.5):
        # src = [src sent len, batch size]
        # trg = [trg sent len, batch size]
        batch_size = trg.shape[1]
        max_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_dim

        # tensor to store the decoder's predictions
        outputs = torch.zeros(max_len, batch_size, trg_vocab_size).to(self.device)

        # the encoder's final hidden state is the context vector z = h_T
        context = self.encoder(src)

        # s_0 = z
        hidden = context

        # the first decoder input is the batch of <sos> tokens
        input = trg[0, :]

        for t in range(1, max_len):
            # the same context vector is passed to the decoder at every step
            output, hidden = self.decoder(input, hidden, context)
            outputs[t] = output
            teacher_force = random.random() < teacher_forcing_ratio
            top1 = output.argmax(1)
            input = trg[t] if teacher_force else top1

        return outputs
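In this model the decoder receives the context vector $z$ at every step, not only as $s_0$. A minimal sketch of a GRU encoder/decoder pair matching the interface the class above expects (the encoder returns $z$; the decoder takes (input, hidden, context) and returns (prediction, hidden)). Here $z$ is concatenated with the token embedding before the GRU, and again with the embedding and new hidden state before the output layer; that wiring, the class names GRUEncoder and GRUDecoder, and all sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn


class GRUEncoder(nn.Module):
    def __init__(self, input_dim, emb_dim, hid_dim):
        super().__init__()
        self.hid_dim = hid_dim
        self.embedding = nn.Embedding(input_dim, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim)

    def forward(self, src):
        # src = [src len, batch size]
        _, hidden = self.rnn(self.embedding(src))
        # hidden = [1, batch size, hid dim]: the context vector z = h_T
        return hidden


class GRUDecoder(nn.Module):
    def __init__(self, output_dim, emb_dim, hid_dim):
        super().__init__()
        self.output_dim = output_dim
        self.hid_dim = hid_dim
        self.embedding = nn.Embedding(output_dim, emb_dim)
        # the GRU input is the token embedding concatenated with z
        self.rnn = nn.GRU(emb_dim + hid_dim, hid_dim)
        # the output layer sees the embedding, the new hidden state and z
        self.fc_out = nn.Linear(emb_dim + hid_dim * 2, output_dim)

    def forward(self, input, hidden, context):
        # input = [batch size]; hidden, context = [1, batch size, hid dim]
        embedded = self.embedding(input.unsqueeze(0))      # [1, batch, emb dim]
        rnn_input = torch.cat((embedded, context), dim=2)  # [1, batch, emb + hid]
        _, hidden = self.rnn(rnn_input, hidden)            # [1, batch, hid dim]
        features = torch.cat(
            (embedded.squeeze(0), hidden.squeeze(0), context.squeeze(0)), dim=1
        )
        prediction = self.fc_out(features)                 # [batch, output dim]
        return prediction, hidden


torch.manual_seed(0)
enc = GRUEncoder(input_dim=10, emb_dim=8, hid_dim=16)
dec = GRUDecoder(output_dim=12, emb_dim=8, hid_dim=16)
src = torch.randint(0, 10, (7, 4))       # [src len = 7, batch size = 4]
context = enc(src)                       # z, also used as s_0
sos = torch.zeros(4, dtype=torch.long)   # assume index 0 stands for <sos>
prediction, hidden = dec(sos, context, context)
print(context.shape, prediction.shape, hidden.shape)
```

Feeding $z$ into every step means the decoder does not have to carry all source information inside $s_t$ alone, which is the motivation for this variant over the plain LSTM model.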

3.3 Attention Seq2Seq

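Create an output tensor to hold all the predictions, $\hat{Y}$
The source sequence $X$ is fed into the encoder to receive $z$ and $H$
The initial decoder hidden state is set to the context vector, $s_0 = z = h_T$
Use a batch of <sos> tokens as the first input, $y_1$
We then decode within a loop:
- insert the input token $y_t$, the previous hidden state $s_{t-1}$ and all of the encoder outputs $H$ into the decoder
- receive a prediction $\hat{y}_{t+1}$ and a new hidden state $s_t$
- then decide whether or not to use teacher forcing, and set the next input accordingly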
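Passing all of the encoder outputs $H$ into the decoder is where attention happens: the decoder scores every source position against $s_{t-1}$, normalises the scores with a softmax, and takes the weighted sum of $H$. A minimal sketch of additive (Bahdanau-style) attention, assuming a bidirectional encoder so that each row of $H$ has size 2 * enc_hid_dim; the class name Attention and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class Attention(nn.Module):
    """Additive attention over encoder outputs (illustrative sketch)."""

    def __init__(self, enc_hid_dim, dec_hid_dim):
        super().__init__()
        self.attn = nn.Linear(enc_hid_dim * 2 + dec_hid_dim, dec_hid_dim)
        self.v = nn.Linear(dec_hid_dim, 1, bias=False)

    def forward(self, hidden, encoder_outputs):
        # hidden = [batch size, dec hid dim]                        (s_{t-1})
        # encoder_outputs = [src len, batch size, enc hid dim * 2]  (H)
        src_len = encoder_outputs.shape[0]
        hidden = hidden.unsqueeze(1).repeat(1, src_len, 1)   # [batch, src len, dec hid]
        encoder_outputs = encoder_outputs.permute(1, 0, 2)   # [batch, src len, enc hid * 2]
        energy = torch.tanh(self.attn(torch.cat((hidden, encoder_outputs), dim=2)))
        scores = self.v(energy).squeeze(2)                   # [batch, src len]
        return torch.softmax(scores, dim=1)                  # attention weights


torch.manual_seed(0)
attention = Attention(enc_hid_dim=16, dec_hid_dim=20)
s_prev = torch.randn(4, 20)        # previous decoder hidden state, batch of 4
H = torch.randn(7, 4, 32)          # 7 source positions, 2 * 16 features each
a = attention(s_prev, H)           # [4, 7], each row sums to 1
# weighted sum of encoder outputs: [batch, 1, src len] x [batch, src len, 32]
weighted = torch.bmm(a.unsqueeze(1), H.permute(1, 0, 2))
print(a.shape, weighted.shape)
```

The weighted vector (one per sentence in the batch) is what such a decoder would combine with the embedded $y_t$ before its RNN step, so a different summary of the source is used at every time step instead of a single fixed $z$.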

Here is the implementation:

import random

import torch
import torch.nn as nn


class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, device):
        super().__init__()

        self.encoder = encoder
        self.decoder = decoder
        self.device = device

    def forward(self, src, trg, teacher_forcing_ratio=0.5):

        # src = [src sent len, batch size]
        # trg = [trg sent len, batch size]
        # teacher_forcing_ratio is the probability of using teacher forcing,
        # e.g. if teacher_forcing_ratio is 0.75 we use teacher forcing 75% of the time

        batch_size = src.shape[1]
        max_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_dim

        # tensor to store decoder outputs
        outputs = torch.zeros(max_len, batch_size, trg_vocab_size).to(self.device)

        # encoder_outputs holds all hidden states of the input sequence, forwards and backwards
        # hidden is the final forward and backward hidden states, passed through a linear layer
        encoder_outputs, hidden = self.encoder(src)

        # first input to the decoder is the <sos> tokens
        input = trg[0, :]

        for t in range(1, max_len):
            # the decoder attends over all encoder outputs H at every step
            output, hidden = self.decoder(input, hidden, encoder_outputs)
            outputs[t] = output
            teacher_force = random.random() < teacher_forcing_ratio
            top1 = output.argmax(1)
            input = trg[t] if teacher_force else top1

        return outputs
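The comments on encoder_outputs and hidden describe a bidirectional encoder whose final forward and backward states are merged by a linear layer. A minimal sketch of such an encoder, assuming a single-layer bidirectional nn.GRU and a tanh after the linear layer; the class name BiGRUEncoder and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class BiGRUEncoder(nn.Module):
    def __init__(self, input_dim, emb_dim, enc_hid_dim, dec_hid_dim):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, emb_dim)
        self.rnn = nn.GRU(emb_dim, enc_hid_dim, bidirectional=True)
        # merges the last forward and last backward states into s_0 for the decoder
        self.fc = nn.Linear(enc_hid_dim * 2, dec_hid_dim)

    def forward(self, src):
        # src = [src len, batch size]
        embedded = self.embedding(src)
        outputs, hidden = self.rnn(embedded)
        # outputs = [src len, batch size, enc hid dim * 2]: forward and backward
        #           states at every position (this is H)
        # hidden  = [2, batch size, enc hid dim]: hidden[-2] is the final forward
        #           state, hidden[-1] the final backward state
        hidden = torch.tanh(self.fc(torch.cat((hidden[-2], hidden[-1]), dim=1)))
        # hidden = [batch size, dec hid dim]
        return outputs, hidden


torch.manual_seed(0)
encoder = BiGRUEncoder(input_dim=10, emb_dim=8, enc_hid_dim=16, dec_hid_dim=20)
src = torch.randint(0, 10, (7, 4))       # [src len = 7, batch size = 4]
encoder_outputs, hidden = encoder(src)
print(encoder_outputs.shape, hidden.shape)
```

The linear layer is what lets the encoder and decoder use different hidden sizes, which is presumably why this version of Seq2Seq drops the hid_dim assertion the other two versions have.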

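This is the final part that ties the encoder and decoder together. I do follow it; I put the three versions side by side here only for comparison.
The training and evaluation that follow are the same for all three models.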
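One detail shared by all three versions: the loop starts at t = 1, so outputs[0] is never written and stays all zeros, while trg[0] is the <sos> token. Both are therefore usually sliced off before computing the loss. A minimal sketch, assuming cross-entropy with padding ignored through a pad index; the function name seq2seq_loss and the choice of 0 as the pad index are hypothetical.

```python
import math

import torch
import torch.nn as nn


def seq2seq_loss(outputs, trg, pad_idx):
    """Cross-entropy over positions 1..max_len-1 (hypothetical helper)."""
    # outputs = [trg len, batch size, trg vocab size]; outputs[0] is all zeros
    # trg     = [trg len, batch size];                  trg[0] is <sos>
    output_dim = outputs.shape[-1]
    logits = outputs[1:].reshape(-1, output_dim)   # [(trg len - 1) * batch, vocab]
    targets = trg[1:].reshape(-1)                  # [(trg len - 1) * batch]
    criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)
    return criterion(logits, targets)


torch.manual_seed(0)
outputs = torch.zeros(5, 3, 10)        # uniform logits over a 10-token vocabulary
trg = torch.randint(1, 10, (5, 3))     # no padding here (index 0 assumed to be <pad>)
loss = seq2seq_loss(outputs, trg, pad_idx=0)
print(loss.item(), math.log(10))       # uniform logits give a loss of ln(10)
```

Slicing rather than masking keeps the all-zero first row out of the average entirely, and ignore_index makes padded positions contribute nothing to the loss.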
