
Natural Language Processing 5.2: Language Models (Parameter Estimation)

2018-10-03  SpareNoEfforts

Two important concepts: maximum likelihood estimation and data smoothing.

Maximum Likelihood Estimation

For an n-gram model, the parameter p(\omega_i | \omega_{i-n+1}^{i-1}) can be obtained by maximum likelihood estimation:

p(\omega_i | \omega_{i-n+1}^{i-1}) = f(\omega_i | \omega_{i-n+1}^{i-1}) = \frac{c(\omega_{i-n+1}^{i})}{\sum_{\omega_i} c(\omega_{i-n+1}^{i})}

where \sum_{\omega_i} c(\omega_{i-n+1}^{i}) is the number of times the history \omega_{i-n+1}^{i-1} occurs in the given corpus, i.e. c(\omega_{i-n+1}^{i-1}), regardless of what \omega_i is.

f(\omega_i | \omega_{i-n+1}^{i-1}) is the relative frequency of \omega_i given \omega_{i-n+1}^{i-1}; the numerator c(\omega_{i-n+1}^{i}) is the number of times \omega_{i-n+1}^{i-1} and \omega_i occur together.
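To make the estimator concrete, here is a minimal Python sketch for the bigram case (n = 2). It assumes a tokenized corpus and the <BOS>/<EOS> sentence markers used in the example below; the function names are illustrative, not from any particular library.

```python
from collections import Counter

def bigram_counts(corpus):
    """Count bigrams c(w_{i-1} w_i) and histories c(w_{i-1}) in a tokenized corpus."""
    bigrams, histories = Counter(), Counter()
    for sentence in corpus:                       # corpus: list of token lists
        tokens = ["<BOS>"] + sentence + ["<EOS>"]
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            histories[prev] += 1                  # equals the sum over w of c(prev w)
    return bigrams, histories

def mle_prob(prev, cur, bigrams, histories):
    """Relative frequency f(cur | prev) = c(prev cur) / c(prev); 0.0 if unseen."""
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, cur)] / histories[prev]
```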

Example

For example, given the training corpus:
“John read Moby Dick”,
“Mary read a different book”,
“She read a book by Cher”
what is the probability of a sentence under a bigram (2-gram) model?
Solution:
p(John | <BOS>) = \frac{c(<BOS>\ John)}{\sum_\omega c(<BOS>\ \omega)} = \frac{1}{3}

p(read | John) = \frac{c(John\ read)}{\sum_\omega c(John\ \omega)} = \frac{1}{1}

p(a | read) = \frac{c(read\ a)}{\sum_\omega c(read\ \omega)} = \frac{2}{3}

p(book | a) = \frac{c(a\ book)}{\sum_\omega c(a\ \omega)} = \frac{1}{2}

p(<EOS> | book) = \frac{c(book\ <EOS>)}{\sum_\omega c(book\ \omega)} = \frac{1}{2}

p(John\ read\ a\ book) = \frac{1}{3} \times 1 \times \frac{2}{3} \times \frac{1}{2} \times \frac{1}{2} \approx 0.06
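Using the bigram_counts and mle_prob helpers from the sketch above, we can check the hand computation:

```python
corpus = [s.split() for s in [
    "John read Moby Dick",
    "Mary read a different book",
    "She read a book by Cher",
]]
bigrams, histories = bigram_counts(corpus)

def sentence_prob(sentence):
    """Product of bigram probabilities over <BOS> w_1 ... w_m <EOS>."""
    tokens = ["<BOS>"] + sentence.split() + ["<EOS>"]
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        p *= mle_prob(prev, cur, bigrams, histories)
    return p

print(sentence_prob("John read a book"))   # 0.0555..., i.e. approximately 0.06
```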

A Problem

p(Cher\ read\ a\ book) = ?

Expanding with bigrams as before:
p(Cher\ read\ a\ book) = p(Cher | <BOS>) \times p(read | Cher) \times p(a | read) \times p(book | a) \times p(<EOS> | book)

Since p(Cher | <BOS>) = \frac{c(<BOS>\ Cher)}{\sum_\omega c(<BOS>\ \omega)} = \frac{0}{3} = 0,

it follows that p(Cher\ read\ a\ book) = 0.
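The same sketch reproduces the zero:

```python
print(sentence_prob("Cher read a book"))   # 0.0, because c(<BOS> Cher) = 0
```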


Sparse data causes this zero-probability problem. How can it be solved?

Data smoothing.
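As a preview of that answer, here is a minimal add-one (Laplace) smoothing sketch, one of the simplest smoothing methods. It reuses the bigrams and histories counts from the example above; V is the vocabulary size, here the 11 word types in the training corpus plus <EOS>.

```python
V = 12  # vocabulary size: 11 word types in the training corpus plus <EOS>

def laplace_prob(prev, cur):
    """Add-one smoothed estimate: (c(prev cur) + 1) / (c(prev) + V), never zero."""
    return (bigrams[(prev, cur)] + 1) / (histories[prev] + V)

print(laplace_prob("<BOS>", "Cher"))   # (0 + 1) / (3 + 12) = 0.0666..., no longer zero
```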
