Mathematical Derivation of word2vec
2018-06-13
LittleSasuke
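word2vec comes in two architectures, CBOW (Continuous Bag-of-Words Model) and Skip-gram (Continuous Skip-gram Model), shown in the figure below. Both are trained on a word-prediction task. CBOW predicts P(w|context(w)), the current word given its surrounding context. Skip-gram predicts P(context(w)|w), the surrounding context given the current word. The word vectors are parameters of these predictors. Once the prediction objective, taken over every word in the vocabulary, has been optimized, the learned vectors are the result we want.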
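To make the two prediction tasks concrete, here is a minimal Python sketch of how training examples could be read off a tokenized sentence. It assumes a symmetric window of fixed size, and the function names are hypothetical. The original word2vec C code also shrinks the window randomly at each position, which this sketch omits.

```python
from typing import List, Tuple


def cbow_samples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: pair each word w with context(w), the words within `window` positions."""
    samples = []
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:  # skip positions with no context (e.g. a one-word sentence)
            samples.append((context, w))
    return samples


def skipgram_samples(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: pair the centre word w with each word u in context(w)."""
    samples = []
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        samples.extend((w, tokens[j]) for j in range(lo, hi) if j != i)
    return samples


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    # (context, target) pairs -> [(['cat'], 'the'), (['the', 'sat'], 'cat')]
    print(cbow_samples(sentence, window=1)[:2])
    # (target, context word) pairs -> [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
    print(skipgram_samples(sentence, window=1)[:3])
```

The same window produces one CBOW example per position but one Skip-gram example per (word, context word) pair. This is why Skip-gram performs more updates per sentence.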
data:image/s3,"s3://crabby-images/57b8a/57b8ae1d4299d820b124ce597036f46519e5ae16" alt=""
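word2vec offers two techniques designed specifically to speed up training: Hierarchical Softmax and Negative Sampling. Both avoid normalizing a softmax over the entire vocabulary for every prediction.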
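Hierarchical Softmax replaces the flat softmax with a binary Huffman tree built from word frequencies. Predicting a word then takes one binary decision per inner node on its path. The following is a minimal sketch of building such codes. `huffman_codes` is a hypothetical helper, and the choice of which child receives bit 0 is a convention made here.

```python
import heapq
import itertools
from typing import Dict


def huffman_codes(freq: Dict[str, int]) -> Dict[str, str]:
    """Return a binary Huffman code for every word, built from its count.

    Each bit is one binary decision at an inner node, so len(code[w]) is the
    number of sigmoid evaluations Hierarchical Softmax needs to score w.
    """
    if not freq:
        return {}
    if len(freq) == 1:  # degenerate vocabulary: give the lone word a 1-bit code
        return {word: "0" for word in freq}
    tie = itertools.count()  # tie-breaker so heapq never has to compare dicts
    heap = [(count, next(tie), {word: ""}) for word, count in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, low = heapq.heappop(heap)   # subtree with the smallest count
        c2, _, high = heapq.heappop(heap)  # subtree with the second smallest count
        # Merging two subtrees prepends one more bit to every word inside them;
        # giving bit 0 to the smaller subtree is a convention chosen here.
        merged = {w: "0" + code for w, code in low.items()}
        merged.update({w: "1" + code for w, code in high.items()})
        heapq.heappush(heap, (c1 + c2, next(tie), merged))
    return heap[0][2]


if __name__ == "__main__":
    print(huffman_codes({"a": 5, "b": 2, "c": 1, "d": 1}))
    # {'b': '00', 'c': '010', 'd': '011', 'a': '1'}
```

Frequent words end up near the root, so their codes are short, and so are the numbers of sigmoid evaluations they require. A vocabulary of V words needs V - 1 inner nodes, and each inner node carries one parameter vector θ.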
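This article only writes out the mathematical derivations of the models. For the remaining details, see peghoty's "word2vec 中的数学" (The Mathematics in word2vec), the article I learned this material from.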
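Each of the four derivations that follow reduces to the gradient of a single binary log-likelihood term, (1-d)·log σ(xᵀθ) + d·log(1-σ(xᵀθ)). Using σ'(z) = σ(z)(1-σ(z)), its derivative with respect to θ is [1 - d - σ(xᵀθ)]·x. Setting the label L = 1 - d gives [L - σ(xᵀθ)]·x, which is the same expression in the form used by Negative Sampling. A finite-difference check is a cheap way to confirm a result like this. The sketch below compares the closed form against central differences on random vectors. All helper names are hypothetical.

```python
import math
import random
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def node_loglik(x: List[float], theta: List[float], d: int) -> float:
    """One binary term: (1-d)*log σ(xᵀθ) + d*log(1 - σ(xᵀθ))."""
    s = sigmoid(dot(x, theta))
    return (1 - d) * math.log(s) + d * math.log(1 - s)


def analytic_grad(x: List[float], theta: List[float], d: int) -> List[float]:
    """Closed form of the derivative w.r.t. θ: [1 - d - σ(xᵀθ)] * x."""
    g = 1 - d - sigmoid(dot(x, theta))
    return [g * xi for xi in x]


def numeric_grad(x: List[float], theta: List[float], d: int, eps: float = 1e-6) -> List[float]:
    """Central finite differences, perturbing one coordinate of θ at a time."""
    grad = []
    for k in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[k] += eps
        minus[k] -= eps
        grad.append((node_loglik(x, plus, d) - node_loglik(x, minus, d)) / (2 * eps))
    return grad


if __name__ == "__main__":
    random.seed(0)
    x = [random.uniform(-1, 1) for _ in range(5)]
    theta = [random.uniform(-1, 1) for _ in range(5)]
    for d in (0, 1):
        err = max(abs(a - n) for a, n in zip(analytic_grad(x, theta, d), numeric_grad(x, theta, d)))
        print(f"d={d}: max abs difference {err:.2e}")
```

The term is symmetric in x and θ, so the derivative with respect to x is the same expression with θ in place of x.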
Hierarchical Softmax with Continuous Bag-of-Words Model
data:image/s3,"s3://crabby-images/d97a5/d97a5e251a73166bb97716cafa8decd449728d31" alt=""
Hierarchical Softmax with Continuous Skip-gram Model
data:image/s3,"s3://crabby-images/34ead/34ead62cb3b1ee3d3032642b46b67ef73847d5e3" alt=""
Negative Sampling with Continuous Bag-of-Words Model
data:image/s3,"s3://crabby-images/376ca/376cadec3b72b9c7b5dfacf3727b2286ca25b179" alt=""
Negative Sampling with Continuous Skip-gram Model
data:image/s3,"s3://crabby-images/5860b/5860b9996c4e041eac610d3339a2641fca0171ac" alt=""
References
Tomas Mikolov et al., Efficient Estimation of Word Representations in Vector Space
peghoty, word2vec 中的数学 (The Mathematics in word2vec)