Note 5: BERT

2020-07-12  qin7zhen

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018)

Fig. 1 (Devlin et al., 2018)

1. BERT (Bidirectional Encoder Representations from Transformers) is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers (a minimal usage sketch follows this outline).

2. Two-step Framework

3. Input/Output Representations

4. Pre-training

5. Fine-tuning BERT

6. BERT vs. GPT vs. ELMo
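
As a concrete illustration of item 1, here is a minimal sketch of obtaining bidirectional contextual representations from a pretrained BERT encoder. It assumes the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint, neither of which is mentioned in the note; the point is only that each token's output vector is conditioned on both its left and right context.

```python
# Minimal sketch (assumes the Hugging Face `transformers` library and the
# `bert-base-uncased` checkpoint; these are not part of the original note).
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# WordPiece-tokenize a sentence; [CLS] and [SEP] are added automatically.
inputs = tokenizer("The bank raised the interest rate.", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state holds one contextual vector per token, shaped
# (batch_size, sequence_length, hidden_size).
# Because BERT's self-attention is unmasked (bidirectional), the vector for
# "bank" depends on tokens to its right ("interest rate") as well as to its
# left, unlike a left-to-right language model such as GPT.
token_vectors = outputs.last_hidden_state
print(token_vectors.shape)
```
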


Reference

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training.
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
