2. NLP with Probabilistic Models

2020-12-03  Kevin不会创作

Autocorrect

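vocab = {"deer", "dear", "near"}  # hypothetical vocabulary, for illustration only
word = "deah"
misspelled = word not in vocab  # True: "deah" is not in the vocabulary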

Minimum edit distance

Part of Speech Tagging and Hidden Markov Models

Markov Chains

A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.

You can use a table called a transition matrix to store the states and transition probabilities. Each row in the matrix represents the transition probabilities from one state to all other states. Notice that all the outgoing transition probabilities from a given state should always sum to one. To assign a part-of-speech tag to the first word in the sentence, you can introduce what is known as an initial state and include its transition probabilities in the table.
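As a rough illustration of the table described above, the sketch below stores a transition matrix as nested dictionaries, with an extra row for the initial state. The tag names (`NN`, `VB`, `O`), the start marker `<s>`, the helper names, and all of the probabilities are assumptions made up for this example; they are not taken from a real corpus.

```python
# Hypothetical transition probabilities, chosen only for illustration.
# The "<s>" row is the initial state: the probability of each tag starting a sentence.
TRANSITIONS = {
    "<s>": {"NN": 0.4, "VB": 0.1, "O": 0.5},
    "NN":  {"NN": 0.2, "VB": 0.2, "O": 0.6},
    "VB":  {"NN": 0.4, "VB": 0.3, "O": 0.3},
    "O":   {"NN": 0.2, "VB": 0.3, "O": 0.5},
}


def rows_sum_to_one(transitions, tol=1e-9):
    """Return True if the outgoing probabilities of every state sum to one."""
    return all(abs(sum(row.values()) - 1.0) <= tol for row in transitions.values())


def sequence_probability(tags, transitions, start="<s>"):
    """Probability of a tag sequence: the product of its successive transitions."""
    prob = 1.0
    prev = start
    for tag in tags:
        prob *= transitions[prev][tag]
        prev = tag
    return prob


print(rows_sum_to_one(TRANSITIONS))
print(sequence_probability(["NN", "VB"], TRANSITIONS))  # P(NN | <s>) * P(VB | NN)
```

Because each event depends only on the previous state, the probability of a whole tag sequence is just the product of one table lookup per step.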

Transition Matrix

Hidden Markov Model

The name hidden Markov model implies that states are hidden or not directly observable. Going back to the Markov model that has the states for the parts of speech, such as noun, verb, or other, you can now think of these as hidden states because these are not directly observable from the text data.

The hidden Markov model also has additional probabilities known as emission probabilities. These describe the transition from the hidden states of your hidden Markov model (the parts of speech, such as noun, verb, and other) to the observables, that is, the words of your corpus. Like the transition matrix, each row of the emission matrix sums to one.
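A minimal sketch of an emission table, assuming a tiny four-word vocabulary and made-up probabilities, might look like the following. The tag names, the helper `emission_probability`, and the choice to return zero for unseen words are assumptions for illustration; a real tagger would normally smooth the counts instead.

```python
# Hypothetical emission probabilities P(word | tag); the values are illustrative only.
EMISSIONS = {
    "NN": {"why": 0.1, "not": 0.1, "learn": 0.2, "something": 0.6},
    "VB": {"why": 0.1, "not": 0.1, "learn": 0.7, "something": 0.1},
    "O":  {"why": 0.4, "not": 0.4, "learn": 0.1, "something": 0.1},
}


def emission_probability(tag, word, emissions):
    """Probability that the hidden state `tag` emits the observed `word`.

    Words missing from the table get probability 0.0 in this sketch.
    """
    return emissions[tag].get(word, 0.0)


# Each row is a distribution over words, so it should sum to one.
for tag, row in EMISSIONS.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9, tag

print(emission_probability("VB", "learn", EMISSIONS))
```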

Emission Matrix

Calculation

Viterbi Algorithm

If you're given a sentence like "Why not learn something?", what is the most likely sequence of part-of-speech tags for that sentence under your model? The sequence can be computed using the Viterbi algorithm.

The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states—called the Viterbi path—that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models (HMM).
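The dynamic programming idea can be sketched in a few lines of plain Python. The tags, the start marker `<s>`, and every probability below are hypothetical values invented for this example, and the function assumes a non-empty sentence. It multiplies raw probabilities for readability; practical implementations usually sum log probabilities instead to avoid numerical underflow on long sentences.

```python
# Hypothetical model parameters, for illustration only.
STATES = ["NN", "VB", "O"]
TRANSITIONS = {
    "<s>": {"NN": 0.4, "VB": 0.1, "O": 0.5},  # initial state row
    "NN":  {"NN": 0.2, "VB": 0.2, "O": 0.6},
    "VB":  {"NN": 0.4, "VB": 0.3, "O": 0.3},
    "O":   {"NN": 0.2, "VB": 0.3, "O": 0.5},
}
EMISSIONS = {
    "NN": {"why": 0.1, "not": 0.1, "learn": 0.2, "something": 0.6},
    "VB": {"why": 0.1, "not": 0.1, "learn": 0.7, "something": 0.1},
    "O":  {"why": 0.4, "not": 0.4, "learn": 0.1, "something": 0.1},
}


def viterbi(words, states, transitions, emissions, start="<s>"):
    """Return (most likely tag sequence, its joint probability) for `words`."""
    # Initialization: leave the start state and emit the first word.
    best = [{s: transitions[start][s] * emissions[s].get(words[0], 0.0)
             for s in states}]
    back = [{}]  # back[i][s]: best previous state when word i is tagged s

    # Forward pass: extend the best path ending in each state by one word.
    for word in words[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] * transitions[p][s])
            scores[s] = (best[-1][prev] * transitions[prev][s]
                         * emissions[s].get(word, 0.0))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Backward pass: follow the stored pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


words = "why not learn something".split()
path, prob = viterbi(words, STATES, TRANSITIONS, EMISSIONS)
print(list(zip(words, path)), prob)
```

With these made-up numbers the Viterbi path is `O, O, VB, NN`. The forward pass keeps only the best score per state at each position, so the cost grows linearly with sentence length rather than exponentially with the number of possible tag sequences.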

Viterbi Algorithm

Autocomplete and Language Models

N-grams Language Model
