Hung-yi Lee Machine Learning: Transformer
2021-08-09
jenye_
Transformer: Sequence-to-sequence (Seq2seq)
Input a sequence, output a sequence.
![](https://img.haomeiwen.com/i9126620/023014ef44b10531.png)
![](https://img.haomeiwen.com/i9126620/9de6c989c771ccc2.png)
![](https://img.haomeiwen.com/i9126620/4260319585884089.png)
![](https://img.haomeiwen.com/i9126620/f128b9c034061703.png)
Seq2seq for Syntactic Parsing
![](https://img.haomeiwen.com/i9126620/e3279a7e0ae4f626.png)
See the paper *Grammar as a Foreign Language*, which writes a parse tree out as a sequence so that parsing becomes a translation problem.
Seq2seq for Multi-label Classification
Multi-class: pick exactly one class out of several.
Multi-label: a single object can belong to several classes at once (see the sketch after the figure below).
![](https://img.haomeiwen.com/i9126620/8e8708cf81be49dd.png)
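To make the distinction concrete, here is a tiny illustration (not from the lecture; the labels and the `<EOS>` token are hypothetical) of why a seq2seq model fits multi-label classification: the decoder itself decides how many class tokens to output.

```python
# Hypothetical illustration of multi-class vs. multi-label outputs.

# Multi-class: the model must pick exactly one class for the input.
multi_class_prediction = "sports"

# Multi-label as seq2seq: the decoder emits class tokens one by one and stops
# with an end-of-sequence token, so the model itself decides how many labels
# this input gets.
multi_label_prediction = ["sports", "finance", "<EOS>"]
```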
Seq2seq for Object Detection
![](https://img.haomeiwen.com/i9126620/01f086076ddaf532.png)
Seq2Seq
The earliest Seq2seq architecture:
![](https://img.haomeiwen.com/i9126620/a3aeb66d0512938e.png)
The Seq2seq architecture used today:
![](https://img.haomeiwen.com/i9126620/7178f00ac6dce9f0.png)
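As a rough sketch of this encoder-decoder structure, the snippet below uses PyTorch's built-in `nn.Transformer` purely as a stand-in for the architecture in the figure; the shapes and hyper-parameters are illustrative assumptions, not values from the lecture.

```python
import torch
import torch.nn as nn

# Encoder-decoder Transformer with default-style hyper-parameters.
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

src = torch.rand(10, 32, 512)   # (source length, batch, d_model): what the encoder reads
tgt = torch.rand(20, 32, 512)   # (target length, batch, d_model): what the decoder has so far

out = model(src, tgt)           # (target length, batch, d_model)
print(out.shape)                # torch.Size([20, 32, 512])
```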
Encoder
![](https://img.haomeiwen.com/i9126620/f0cbddb919508a52.png)
Broadly speaking, the Transformer encoder is built on self-attention:
![](https://img.haomeiwen.com/i9126620/f82c12e4f4392523.png)
The actual process is a bit more involved:
![](https://img.haomeiwen.com/i9126620/181b5c636ad2604e.png)
![](https://img.haomeiwen.com/i9126620/f6c32a4d6024c093.png)
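The figure above amounts to the block sketched below: self-attention, a residual connection, layer normalization, a position-wise feed-forward network, then another residual connection and layer norm. This is a minimal PyTorch sketch with assumed dimensions, not the lecture's exact code.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff),
                                 nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                    # x: (batch, seq_len, d_model)
        attn_out, _ = self.attn(x, x, x)     # self-attention over the whole sequence
        x = self.norm1(x + attn_out)         # residual connection, then layer norm
        ffn_out = self.ffn(x)                # position-wise feed-forward network
        x = self.norm2(x + ffn_out)          # second residual + layer norm
        return x

block = EncoderBlock()
print(block(torch.rand(2, 5, 512)).shape)    # torch.Size([2, 5, 512])
```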
In fact, this encoder design is not necessarily the best one:
![](https://img.haomeiwen.com/i9126620/ab41362daa2612ad.png)
Decoder
![](https://img.haomeiwen.com/i9126620/f15c960be2b8d0c9.png)
The decoder is actually not that different from the encoder (if you ignore the grayed-out block in the middle).
![](https://img.haomeiwen.com/i9126620/c43c9b2e6457f865.png)
Masked Self-attention?
![](https://img.haomeiwen.com/i9126620/9cb700f9bba46d9d.png)
![](https://img.haomeiwen.com/i9126620/d286e953430d8262.png)
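A minimal sketch (assuming PyTorch) of what the masking does: position i may only attend to positions up to i, and future positions are hidden by setting their attention scores to -inf before the softmax.

```python
import torch

seq_len = 4
scores = torch.rand(seq_len, seq_len)                        # raw attention scores (q·k)
mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
scores = scores.masked_fill(mask, float('-inf'))             # block future positions
attn = scores.softmax(dim=-1)                                # rows sum to 1,
print(attn)                                                  # upper triangle is all 0
```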
Why masked?
When the decoder runs, its outputs are produced one token at a time, so a given position has no way to take later (future) tokens into account.
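Because of this one-token-at-a-time behaviour, decoding is usually written as a loop like the hypothetical sketch below; `model.decode`, `begin_id`, and `end_id` are stand-in names, not from the lecture.

```python
# Hypothetical greedy decoding loop showing the autoregressive behaviour.
def greedy_decode(model, encoder_output, begin_id, end_id, max_len=50):
    generated = [begin_id]                     # start from the BEGIN token
    for _ in range(max_len):
        # the decoder only ever sees what has been produced so far
        logits = model.decode(encoder_output, generated)
        next_id = int(logits[-1].argmax())     # pick the most probable next token
        generated.append(next_id)
        if next_id == end_id:                  # stop once END is produced
            break
    return generated
```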