Motif

PFM-PWM-Seqlogo

2019-08-28  本文已影响0人  caokai001

参考1:https://sites.google.com/site/zjuwhwsblog/blog/chip-seqshujudemotiffenxizhengli
参考2:https://davetang.org/muse/2013/10/01/position-weight-matrix/

1.首先考虑如何由PFM转成PWM

image.png

采用伪计数方法【+sqrt(total)】,避免count=0,无法计算问题,输出的8个数字,表示总数为8,出现0次权重-1.936752,出现1次权重为-0.6651985,依次...

pwm <- function(freq, total, bg=0.25){
  #using the formulae above
  p <- (freq + (sqrt(total) * 1/4)) / (total + (4 * (sqrt(total) * 1/4)))
  log2(p/bg)
}
#work out all possible PWM values
for (i in 0:8){
  print(pwm(i,8))
}

[1] -1.936752
[1] -0.6651985
[1] 0
[1] 0.4535419
[1] 0.7980888
[1] 1.076008
[1] 1.308939
[1] 1.509438
[1] 1.685442
a <- c(0, 4, 4, 0, 3, 7, 4, 3, 5, 4, 2, 0, 0, 4)
c <- c(3, 0, 4, 8, 0, 0, 0, 3, 0, 0, 0, 0, 2, 4)
g <- c(2, 3, 0, 0, 0, 0, 0, 0, 1, 0, 6, 8, 5, 0)
t <- c(3, 1, 0, 0, 5, 1, 4, 2, 2, 4, 0, 0, 1, 0)
m <- matrix(data=c(a,c,g,t),nrow=4,byrow=T,dimnames=list(c('a','c','g','t')))
m

mm <- pwm(m,8)
mm
        [,1]       [,2]       [,3]      [,4]       [,5]       [,6]       [,7]       [,8]       [,9]
a -1.9367518  0.7980888  0.7980888 -1.936752  0.4535419  1.5094376  0.7980888  0.4535419  1.0760078
c  0.4535419 -1.9367518  0.7980888  1.685442 -1.9367518 -1.9367518 -1.9367518  0.4535419 -1.9367518
g  0.0000000  0.4535419 -1.9367518 -1.936752 -1.9367518 -1.9367518 -1.9367518 -1.9367518 -0.6651985
t  0.4535419 -0.6651985 -1.9367518 -1.936752  1.0760078 -0.6651985  0.7980888  0.0000000  0.0000000
       [,10]     [,11]     [,12]      [,13]      [,14]
a  0.7980888  0.000000 -1.936752 -1.9367518  0.7980888
c -1.9367518 -1.936752 -1.936752  0.0000000  0.7980888
g -1.9367518  1.308939  1.685442  1.0760078 -1.9367518
t  0.7980888 -1.936752 -1.936752 -0.6651985 -1.9367518
seq <- 'ttacataagtagtc'
x <- strsplit(x=seq,split='')


#initialise vector
seq_score <- vector()
#get the corresponding values
for (i in 1:14){
  seq_score[i] <- mm[x[[1]][i],i]
}

seq_score
[1]  0.4535419 -0.6651985  0.7980888  1.6854416  0.4535419 -0.6651985  0.7980888  0.4535419 -0.6651985  0.7980888
[11]  0.0000000  1.6854416 -0.6651985  0.7980888

sum(seq_score)
[1] 5.26307

sum(apply(mm,2,max))
[1] 14.31481

2.如何从PFM矩阵画seqlogo

library(seqLogo)
a <- c(0, 4, 4, 0, 3, 7, 4, 3, 5, 4, 2, 0, 0, 4)
c <- c(3, 0, 4, 8, 0, 0, 0, 3, 0, 0, 0, 0, 2, 4)
g <- c(2, 3, 0, 0, 0, 0, 0, 0, 1, 0, 6, 8, 5, 0)
t <- c(3, 1, 0, 0, 5, 1, 4, 2, 2, 4, 0, 0, 1, 0)
df <- data.frame(a,c,g,t)

#define function that divides the frequency by the row sum i.e. proportions
proportion <- function(x){
    rs <- sum(x);
    return(x / rs);
}


#create position weight matrix
mef2 <- apply(df, 1, proportion)
mef2 <- makePWM(mef2)
seqLogo(mef2)

image.png
上一篇下一篇

猜你喜欢

热点阅读