Transformer internals
2023-02-26
Cipolee
- Why do the K and V in the transformer decoder's cross-attention come from the encoder output?
In "encoder-decoder attention" layers, the queries come from the previous decoder layer, and the memory keys and values come from the output of the encoder. This allows every position in the decoder to attend over all positions in the input sequence. This mimics the typical encoder-decoder attention mechanisms in sequence-to-sequence models such as [38].
Author: Mr.g
Link: https://www.zhihu.com/question/458687952/answer/1878623992
Source: Zhihu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.
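The quoted passage can be sketched as single-head scaled dot-product cross-attention: queries are projected from the decoder states, while keys and values are projected from the encoder output. This is a minimal NumPy sketch; the shapes, random inputs, and projection matrices (`Wq`, `Wk`, `Wv`) are illustrative assumptions, not taken from the source.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_outputs, Wq, Wk, Wv):
    # Queries come from the decoder; keys and values from the encoder output
    Q = decoder_states @ Wq        # (tgt_len, d_k)
    K = encoder_outputs @ Wk       # (src_len, d_k)
    V = encoder_outputs @ Wv       # (src_len, d_k)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (tgt_len, src_len)
    weights = softmax(scores, axis=-1)   # each decoder position attends over ALL source positions
    return weights @ V                   # (tgt_len, d_k)

rng = np.random.default_rng(0)
d_model, d_k = 8, 4
enc = rng.standard_normal((5, d_model))  # 5 encoder (source) positions
dec = rng.standard_normal((3, d_model))  # 3 decoder (target) positions
Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))
out = cross_attention(dec, enc, Wq, Wk, Wv)
print(out.shape)  # one d_k-dimensional vector per decoder position
```

Note that, unlike decoder self-attention, no causal mask is applied here: every target position may look at every source position.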