
PyTorch Learning Notes: Implementing the Seq2Seq Model (Seq2Seq Part)

2019-04-25

A quick recap
Looking back over the past three days' models: from a plain LSTM Seq2Seq, to a GRU Seq2Seq, to an attention-based Seq2Seq.

[Figures: LSTM Seq2Seq, GRU Seq2Seq, and attention-based Seq2Seq architecture diagrams]

3. The Seq2Seq Part of the Three Models

This is the last piece of the implementation, the Seq2Seq wrapper itself; its logic is the same for all three models.

3.1 LSTM Seq2Seq

Let's look at the concrete implementation.

import random

import torch
import torch.nn as nn


class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, device):
        super(Seq2Seq, self).__init__()

        self.encoder = encoder
        self.decoder = decoder
        self.device = device

        assert encoder.hid_dim == decoder.hid_dim, \
            "Hidden dimensions of encoder and decoder must be equal!"
        assert encoder.n_layers == decoder.n_layers, \
            "Encoder and decoder must have equal number of layers!"

    def forward(self, src, trg, teacher_forcing_ratio=0.5):
        # src = [src sent len, batch size]
        # trg = [trg sent len, batch size]
        # teacher_forcing_ratio is the probability of using teacher forcing,
        # e.g. if teacher_forcing_ratio is 0.75 we feed the ground-truth token
        # as the next input 75% of the time
        batch_size = trg.shape[1]
        max_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_dim

        # tensor to store the decoder outputs at every time step
        outputs = torch.zeros(max_len, batch_size, trg_vocab_size).to(self.device)

        # the encoder's final hidden and cell states initialise the decoder
        hidden, cell = self.encoder(src)

        # the first input to the decoder is the <sos> tokens
        input = trg[0, :]
        for t in range(1, max_len):
            output, hidden, cell = self.decoder(input, hidden, cell)
            outputs[t] = output
            # feed back either the ground truth or the model's own prediction
            teacher_force = random.random() < teacher_forcing_ratio
            top1 = output.max(1)[1]
            input = trg[t] if teacher_force else top1
        return outputs
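
For completeness, here is a minimal usage sketch (not from the original tutorial). It assumes the Encoder and Decoder classes from the earlier posts take (vocab size, embedding dim, hidden dim, number of layers, dropout) as constructor arguments; the sizes below are made up for illustration, so adjust everything to match your own definitions.

import torch

# Hypothetical vocabulary sizes and hyperparameters, only for illustration.
INPUT_DIM, OUTPUT_DIM = 7855, 5893
ENC_EMB_DIM = DEC_EMB_DIM = 256
HID_DIM, N_LAYERS, DROPOUT = 512, 2, 0.5

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

enc = Encoder(INPUT_DIM, ENC_EMB_DIM, HID_DIM, N_LAYERS, DROPOUT)
dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, HID_DIM, N_LAYERS, DROPOUT)
model = Seq2Seq(enc, dec, device).to(device)

# Fake batch of token indices shaped [sent len, batch size].
src = torch.randint(0, INPUT_DIM, (23, 128)).to(device)
trg = torch.randint(0, OUTPUT_DIM, (31, 128)).to(device)

outputs = model(src, trg, teacher_forcing_ratio=0.5)
print(outputs.shape)  # torch.Size([31, 128, 5893])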

3.2 GRU Seq2Seq

Let's look at the concrete implementation.

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, device):
        super(Seq2Seq, self).__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.device = device
        assert encoder.hid_dim == decoder.hid_dim, \
            "Hidden dimensions of encoder and decoder must be equal!"

    def forward(self, src, trg, teacher_forcing_ratio=0.5):
        # src = [src sent len, batch size]
        # trg = [trg sent len, batch size]
        batch_size = trg.shape[1]
        max_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_dim

        outputs = torch.zeros(max_len, batch_size, trg_vocab_size).to(self.device)

        # the encoder compresses the whole source sentence into one context vector,
        # which also initialises the decoder's hidden state
        context = self.encoder(src)
        hidden = context

        # the first input to the decoder is the <sos> tokens
        input = trg[0, :]
        for t in range(1, max_len):
            # unlike the LSTM version, the same context vector is passed to the
            # decoder at every time step
            output, hidden = self.decoder(input, hidden, context)
            outputs[t] = output
            teacher_force = random.random() < teacher_forcing_ratio
            top1 = output.max(1)[1]
            input = trg[t] if teacher_force else top1

        return outputs
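
To see why the wrapper hands the context vector to the decoder at every step, here is a sketch of the kind of GRU decoder this interface expects. This is my own reconstruction based on the earlier posts in this series, not code from this one: the context is concatenated with the embedded input token before the GRU, and again with the hidden state before the output layer.

import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, output_dim, emb_dim, hid_dim, dropout):
        super().__init__()
        self.output_dim = output_dim
        self.hid_dim = hid_dim
        self.embedding = nn.Embedding(output_dim, emb_dim)
        # GRU input = embedded token concatenated with the context vector
        self.rnn = nn.GRU(emb_dim + hid_dim, hid_dim)
        # prediction uses the embedded token, the hidden state and the context
        self.fc_out = nn.Linear(emb_dim + hid_dim * 2, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, input, hidden, context):
        input = input.unsqueeze(0)                        # [1, batch size]
        embedded = self.dropout(self.embedding(input))    # [1, batch size, emb dim]
        emb_con = torch.cat((embedded, context), dim=2)   # [1, batch size, emb + hid]
        output, hidden = self.rnn(emb_con, hidden)
        output = torch.cat((embedded.squeeze(0), hidden.squeeze(0),
                            context.squeeze(0)), dim=1)
        prediction = self.fc_out(output)                  # [batch size, output dim]
        return prediction, hidden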

3.3 Attention Seq2Seq

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, device):
        super().__init__()

        self.encoder = encoder
        self.decoder = decoder
        self.device = device

    def forward(self, src, trg, teacher_forcing_ratio=0.5):
        # src = [src sent len, batch size]
        # trg = [trg sent len, batch size]
        # teacher_forcing_ratio is the probability of using teacher forcing,
        # e.g. if teacher_forcing_ratio is 0.75 we use the ground-truth input
        # 75% of the time

        batch_size = src.shape[1]
        max_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_dim

        # tensor to store decoder outputs
        outputs = torch.zeros(max_len, batch_size, trg_vocab_size).to(self.device)

        # encoder_outputs holds all hidden states of the input sequence,
        # forwards and backwards; hidden is the final forward and backward
        # hidden states passed through a linear layer
        encoder_outputs, hidden = self.encoder(src)

        # the first input to the decoder is the <sos> tokens
        input = trg[0, :]

        for t in range(1, max_len):
            # the decoder attends over encoder_outputs at every step
            output, hidden = self.decoder(input, hidden, encoder_outputs)
            outputs[t] = output
            teacher_force = random.random() < teacher_forcing_ratio
            top1 = output.max(1)[1]
            input = trg[t] if teacher_force else top1

        return outputs
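
The attention itself lives inside the decoder, not in this wrapper. As a rough sketch of the module that produces the weights over encoder_outputs (my own reconstruction of the usual Bahdanau-style additive attention, not code from this post), it looks something like this:

import torch
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, enc_hid_dim, dec_hid_dim):
        super().__init__()
        # scores each source position against the current decoder state
        self.attn = nn.Linear(enc_hid_dim * 2 + dec_hid_dim, dec_hid_dim)
        self.v = nn.Linear(dec_hid_dim, 1, bias=False)

    def forward(self, hidden, encoder_outputs):
        # hidden          = [batch size, dec hid dim]
        # encoder_outputs = [src sent len, batch size, enc hid dim * 2]
        src_len = encoder_outputs.shape[0]
        # repeat the decoder state once per source position
        hidden = hidden.unsqueeze(1).repeat(1, src_len, 1)
        encoder_outputs = encoder_outputs.permute(1, 0, 2)
        energy = torch.tanh(self.attn(torch.cat((hidden, encoder_outputs), dim=2)))
        attention = self.v(energy).squeeze(2)   # [batch size, src sent len]
        # normalised weights the decoder uses to average encoder_outputs
        return torch.softmax(attention, dim=1)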

This is the final piece that ties everything together; I'm laying the three versions out side by side here mainly for comparison.
The training and evaluation that follow are the same for all three models; a sketch is given below.
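
As a quick reference, the shared training step can be sketched roughly as follows. This is my own sketch, assuming an iterator yielding batches with .src/.trg fields and a target padding index TRG_PAD_IDX; the names are illustrative, not from the original post.

import torch
import torch.nn as nn

def train_epoch(model, iterator, optimizer, criterion, clip=1.0):
    model.train()
    epoch_loss = 0
    for batch in iterator:
        src, trg = batch.src, batch.trg
        optimizer.zero_grad()
        output = model(src, trg)                        # [trg len, batch, vocab]
        output = output[1:].view(-1, output.shape[-1])  # drop the <sos> position
        trg = trg[1:].view(-1)
        loss = criterion(output, trg)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()
        epoch_loss += loss.item()
    return epoch_loss / len(iterator)

# criterion = nn.CrossEntropyLoss(ignore_index=TRG_PAD_IDX)
# optimizer = torch.optim.Adam(model.parameters())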
