小 RNA 长度分布统计
统计 长度 分布,代码,参考别人针对自己的数据做了修改。
命令行,输入:python py.jiaoben *.fa
import sys
miRNA_len={}
for i in range(0,52):
miRNA_len[i] = 0
for i in open(sys.argv[1]):
if i.startswith('>') :
continue
length = len(i) - 1
miRNA_len[length] += 1
for i in miRNA_len:
print str(i)+"\t"+str(miRNA_len[i])
数据出来之后,作图:
代码如下:参考别人,做了更改。
setwd("G:/smallRNA/length.txt")
S1 = read.table("AACC1.txt")
S2 = read.table("AACC2.txt")$V2
S3 = read.table("AACC3.txt")$V2
S4 = read.table("AACC4.txt")$V2
S5 = read.table("AACC5.txt")$V2
S6 = read.table("AACC6.txt")$V2
S7 = read.table("AACC7.txt")$V2
S8 = read.table("AACC8.txt")$V2
S9 = read.table("P1.txt")$V2
S10 = read.table("P2.txt")$V2
data = cbind(S1,S2,S3,S4,S5,S6,S7,S8,S9,S10)
colnames(data)=c("Length","AACC1","AACC2","AACC3","AACC4","AACC5","AACC6","AACC7","AACC8","P1","P2")
data$AACC1=100 * data$AACC1/sum(data$AACC1)
data$AACC2=100 * data$AACC2/sum(data$AACC2)
data$AACC3=100 * data$AACC3/sum(data$AACC3)
data$AACC4=100 * data$AACC4/sum(data$AACC4)
data$AACC5=100 * data$AACC5/sum(data$AACC5)
data$AACC6=100 * data$AACC6/sum(data$AACC6)
data$AACC7=100 * data$AACC7/sum(data$AACC7)
data$AACC8=100 * data$AACC8/sum(data$AACC8)
data$P1=100 * data$P1/sum(data$P1)
data$P2=100 * data$P2/sum(data$P2)
library(reshape2)
data.melt=melt(data, id="Length")
library(ggplot2)
p<-ggplot(data.melt, aes(x=Length, y=value, col=variable))
p+geom_line() +
theme( text =element_text(size=18),
panel.background=element_blank(),
axis.line = element_line(size = 1,colour="black"),
axis.text =element_text(colour="black")) +
labs(title="All reads length distribution",x="Read Length", y="Fraction (%)")
下一步 : 比对: