基础知识生物信息学技能

HMM 隐马尔可夫模型初学(一)

2020-11-09  本文已影响0人  小贝学生信

笔记首先总结概括HMM相关理论知识,再通过生信序列的R语言实操对HMM深入了解

1、MM,Markov model 马尔科夫模型

(1)天气举例

(2)Markov chain马尔可夫链
(3)Transition Matrix状态转移概率矩阵!!(划重点)
(4)initial probabilities of the states

2、R代码实操:马尔科夫模型预测DNA序列

(1)已知条件

A:特殊的序列

For some DNA sequences, the probability of finding a particular nucleotide at a particular position in the sequence does depend on what nucleotides are found at adjacent positions in the sequence.
即符合马尔可夫链的假设

B:转移矩阵
TM
C:初始概率

可任意指定,例如T 0.3 C 0.3 G 0.2 A 0.2

(2)R实操

目的:主要根据转移矩阵预测可能的这样一段特殊序列

nucleotides         <- c("A", "C", "G", "T")
afterAprobs <- c(0.2, 0.3, 0.3, 0.2)         # Set the values of the probabilities, where the previous nucleotide was "A"
afterCprobs <- c(0.1, 0.41, 0.39, 0.1)       # Set the values of the probabilities, where the previous nucleotide was "C"
afterGprobs <- c(0.25, 0.25, 0.25, 0.25)     # Set the values of the probabilities, where the previous nucleotide was "G"
afterTprobs <- c(0.5, 0.17, 0.17, 0.17)      # Set the values of the probabilities, where the previous nucleotide was "T"
mytransitionmatrix <- matrix(c(afterAprobs, afterCprobs, afterGprobs, afterTprobs), 4, 4, byrow = TRUE) # Create a 4 x 4 matrix
rownames(mytransitionmatrix) <- nucleotides
colnames(mytransitionmatrix) <- nucleotides
mytransitionmatrix 
TM
generatemarkovseq <- function(transitionmatrix, initialprobs, seqlength)
{
  nucleotides     <- c("A", "C", "G", "T") # Define the alphabet of nucleotides
  mysequence      <- character()           # Create a vector for storing the new sequence
  # Choose the nucleotide for the first position in the sequence:
  firstnucleotide <- sample(nucleotides, 1, rep=TRUE, prob=initialprobs)
  mysequence[1]   <- firstnucleotide       # Store the nucleotide for the first position of the sequence
  for (i in 2:seqlength)
  {
    prevnucleotide <- mysequence[i-1]     # Get the previous nucleotide in the new sequence
    # Get the probabilities of the current nucleotide, given previous nucleotide "prevnucleotide":
    probabilities  <- transitionmatrix[prevnucleotide,]
    # Choose the nucleotide at the current position of the sequence:
    nucleotide     <- sample(nucleotides, 1, rep=TRUE, prob=probabilities)
    mysequence[i]  <- nucleotide          # Store the nucleotide for the current position of the sequence
  }
  return(mysequence)
}
myinitialprobs <- c(0.3, 0.3, 0.2, 0.2) #对应函数体设置的"A", "C", "G", "T"顺序
generatemarkovseq(mytransitionmatrix, myinitialprobs, 30)
table(generatemarkovseq(mytransitionmatrix, myinitialprobs, 30))

result

参考文章
1、Hidden Markov Models — Bioinformatics 0.1 documentation
2、01 隐马尔可夫模型 - 马尔可夫链、HMM参数和性质 - 简书
3、Multilevel HMM tutorial
4、马尔可夫链_百度百科

上一篇下一篇

猜你喜欢

热点阅读