计算基因序列中ATCG的含量以及碱基motif的绘制
2019-03-06 本文已影响1人
热衷组培的二货潜
1、ATCG含量计算
- 来源我在学习Modern Statistics for Modern Biology 此文中的《Modern Statistics for Modern Biology》Chapter 二: 统计建模(2.4-2.5)
读取fasta序列文件
setwd("e://compute language/NGS/Modern Statistics for Modern Biology/data/")
> library(Biostrings)
> staph = readDNAStringSet("staphsequence.ffn.txt", "fasta") ## 读取fasta文件
fastat文件格式如下
>a
ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGAAATTGCTCAAGAAAAATTATCAGCTGTAAGTTACT
CAACTTTCCTAAAAGATACTGAGCTTTACACGATCAAAGATGGTGAAGCTATCGTATTATCGAGTATTCC
TTTTAATGCAAATTGGTTAAATCAACAATATGCTGAAATTATCCAAGCAATCTTATTTGATGTTGTAGGC
TATGAAGTAAAACCTCACTTTATTACTACTGAAGAATTAGCAAATTATAGTAATAATGAAACTGCTACTC
CAAAAGAAGCAACAAAACCTTCTACTGAAACAACTGAGGATAATCATGTGCTTGGTAGAGAGCAATTCAA
单个基因计算
> staph[1]
A DNAStringSet instance of length 1
width seq names
[1] 1362 ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGAA...TAGAGAATCTTGAAAAAGAAATAAGAAATGTATAA lcl|NC_002952.2_c...
> letterFrequency(staph[[1]], letters = "ACTG", OR = 0)
A C T G
522 219 392 229
多个基因计算
- 这里活用了
apply
函数,更多参考搜索或者参考这个 掌握R语言中的apply函数族
> letterFrq = vapply(staph, letterFrequency, FUN.VALUE = numeric(4), letters = "ACGT", OR = 0)
> colnames(letterFrq) = paste0("gene", seq(along = staph))
> computerProportions = function(x){x/sum(x)}
> prop10 = apply(tab10, 2, computerProportions) ## 2表示列
> round(prop10, digits = 2)
gene1 gene2 gene3 gene4 gene5 gene6 gene7 gene8 gene9 gene10
A 0.38 0.36 0.35 0.37 0.35 0.33 0.33 0.34 0.38 0.27
C 0.16 0.16 0.13 0.15 0.15 0.15 0.16 0.16 0.14 0.16
G 0.17 0.17 0.23 0.19 0.22 0.22 0.20 0.21 0.20 0.20
T 0.29 0.31 0.30 0.29 0.27 0.30 0.30 0.29 0.28 0.36
> p0 = rowMeans(prop10)
> p0
A C G T
0.3470531 0.1518313 0.2011442 0.2999714
2、motif绘制
- 来源我在学习Modern Statistics for Modern Biology 此文中的《Modern Statistics for Modern Biology》Chapter 二 统计建模(2.6 - 2.8)
> setwd("e://compute language/NGS/Modern Statistics for Modern Biology/data/")
> library("seqLogo")
载入需要的程辑包:grid
Warning message:
程辑包‘seqLogo’是用R版本3.5.1 来建造的
> load("kozak.RData")
> kozak
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
A 0.33 0.25 0.4 0.15 0.20 1 0 0 0.05
C 0.12 0.25 0.1 0.40 0.40 0 0 0 0.05
G 0.33 0.25 0.4 0.20 0.25 0 0 1 0.90
T 0.22 0.25 0.1 0.25 0.15 0 1 0 0.00
> pwm = makePWM(kozak)
> seqLogo(pwm, ic.scale = FALSE)