Transformer -> BERT

2021-08-03 poteman

Mind map

Component 1:

attention layer

Component 2:

self-attention
Building the Transformer model:

From single-head to multi-head
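The step from a single head to multiple heads can be sketched in a few lines. This is only an illustrative sketch in pure Python, not the lecture's code. The names `single_head`, `multi_head`, `w_q`, `w_k`, `w_v` are assumptions made here. The division by the square root of the key dimension follows the "Attention Is All You Need" paper, and the final output projection used in that paper is left out for brevity.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def single_head(x, w_q, w_k, w_v):
    """One self-attention head: queries, keys and values all come from x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    # weights[i][j]: how strongly position i attends to position j
    weights = [softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
               for qi in q]
    return matmul(weights, v)

def multi_head(x, heads):
    """Run every head on the same input and concatenate the outputs per position."""
    outputs = [single_head(x, *params) for params in heads]
    return [sum((out[i] for out in outputs), []) for i in range(len(x))]

# Toy input: 3 positions, each a 2-dimensional vector (made-up values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
# Two heads with different (hypothetical) parameter matrices.
heads = [(identity, identity, identity), (swap, swap, identity)]
out = multi_head(x, heads)
print(len(out), len(out[0]))  # 3 4: two heads of width 2, concatenated
```

Each head has its own parameter matrices, so the heads can attend to different positions. Concatenation makes the output width grow with the number of heads.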
The encoder network is built mainly from self-attention layers:

Encoder network

Decoder network:

Decoder network

Transformer model
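BERT (Bidirectional Encoder Representations from Transformers):
BERT pre-trains the encoder part of the Transformer network.
Task 1: predict the masked word. Words in a sentence are randomly masked, and the network is trained to predict the masked words.
Task 2: predict whether two sentences are adjacent in the original text.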

BERT task 2: predicting whether two sentences are adjacent
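Both tasks produce their labels directly from raw text, which a small sketch makes concrete. The helper names `mask_tokens` and `make_pair` are hypothetical, as are the toy sentences. The 15% masking rate and the 50/50 split between true and random next sentences follow the BERT paper. The paper's extra rule of sometimes keeping or randomly replacing a selected word is omitted here.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, rng=random):
    """Task 1: hide random tokens; the hidden originals become the labels."""
    masked, labels = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = MASK
            labels[i] = tok  # position -> word the network should predict
    return masked, labels

def make_pair(sentences, i, rng=random):
    """Task 2: pair sentence i with its true successor or a random sentence.

    Assumes i is not the last index and that there are at least 3 sentences.
    """
    if rng.random() < 0.5:
        return sentences[i], sentences[i + 1], 1  # adjacent in the text
    # Negative sample: any sentence other than i and its true successor.
    candidates = [j for j in range(len(sentences)) if j not in (i, i + 1)]
    return sentences[i], sentences[rng.choice(candidates)], 0

# Made-up toy corpus of tokenized sentences.
sentences = [["the", "cat", "sat"], ["it", "was", "tired"], ["stocks", "fell", "today"]]
rng = random.Random(0)
masked, labels = mask_tokens(sentences[0], mask_prob=0.5, rng=rng)
first, second, is_next = make_pair(sentences, 0, rng=rng)
print(masked, labels)
print(first, second, is_next)
```

No human annotation is involved: the masked word and the adjacency label are both read off the original text.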
Combining the two tasks:

Data sample format

Training process
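One way to picture how the two tasks are trained together is as a single loss per sample: a cross-entropy term for every masked word plus a binary term for the adjacency prediction. The sketch below assumes the terms are simply summed. The function and argument names are hypothetical, and the probabilities are made-up stand-ins for what the network would output.

```python
import math

def cross_entropy(probs, target):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target])

def pretraining_loss(word_probs, word_targets, next_prob, is_next):
    """Sum the masked-word losses and the adjacency loss for one sample.

    word_probs:   one dict (word -> probability) per masked position
    word_targets: the original word at each masked position
    next_prob:    predicted probability that the two sentences are adjacent
    is_next:      1 if they really are adjacent, else 0
    """
    masked_loss = sum(cross_entropy(p, t) for p, t in zip(word_probs, word_targets))
    pair_loss = -math.log(next_prob if is_next else 1.0 - next_prob)
    return masked_loss + pair_loss

# One masked position with a made-up 2-word vocabulary, and an undecided pair prediction.
loss = pretraining_loss([{"cat": 0.5, "dog": 0.5}], ["cat"], 0.5, 1)
print(loss)
```

In real training the probabilities come from the encoder's outputs, and the parameters are updated by gradient descent on this combined loss.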
Advantage of BERT: the training data needs no manual labeling, so massive amounts of raw text can be used.

[References]:

https://www.youtube.com/watch?v=aButdUV0dxI&list=PLvOO0btloRntpSWSxFbwPIjIum3Ub4GSC
Slides: https://github.com/wangshusen/DeepLearning
Paper: Transformer (Attention Is All You Need)
