DU7: How to compute the length distribution of small RNA sequencing data

2019-04-24  纳灰灰

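Typical length distribution of small RNA sequencing data:
1. Abundance by length: 24 nt > 21 nt > 22 nt > 23 nt
2. More than 60% of the reads are between 20 and 24 nt long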
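
Both rules can be checked directly once the per-length counts are known. The following is a minimal sketch, not part of the original workflow: the function name `summarize_lengths` and the counts in `example_counts` are made up for illustration and do not come from real data.

```python
def summarize_lengths(counts, lo=20, hi=24):
    """Return (fraction of reads with lo <= length <= hi, lengths ranked by abundance)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads counted")
    in_range = sum(n for length, n in counts.items() if lo <= length <= hi)
    # Most abundant length first
    ranked = sorted(counts, key=counts.get, reverse=True)
    return in_range / total, ranked


# Made-up counts, for illustration only (not real data)
example_counts = {19: 50, 20: 80, 21: 250, 22: 150, 23: 100, 24: 300, 25: 70}
fraction, ranked = summarize_lengths(example_counts)
print(f"{fraction:.1%} of reads are 20-24 nt")
print("Most abundant lengths:", ranked[:4])
```

With these invented numbers the 20-24 nt fraction is 88% and the four most abundant lengths are 24, 21, 22 and 23 nt, which matches the pattern described above.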

I. The data are in FASTA format. First, write a script that counts how many sRNAs there are of each length

#!/usr/bin/env python3
# Count how many sequences of each length occur in a FASTA file.
# Assumes each sequence sits on a single line (no wrapped records).
import sys
import collections

lencount = collections.Counter()
with open(sys.argv[1], 'r') as inFile:
    for line in inFile:
        line = line.strip()
        # Skip blank lines and FASTA header lines
        if not line or line.startswith(">"):
            continue
        lencount[len(line)] += 1

# Write one "length<TAB>count" row per length, shortest first
with open('sRNA_count.csv', 'w') as outFile:
    for length in sorted(lencount):
        outFile.write(str(length) + "\t" + str(lencount[length]) + "\n")

# Run the script
python3 sRNA_count.py sample.sRNA.data.fa
[Figure: sRNA_count.csv]
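
The script above treats every non-header line as one complete sequence. If a FASTA file wraps sequences over several lines, each fragment would be counted separately. A possible variant that joins wrapped lines is sketched below; the function name `count_lengths` and the `demo` records are hypothetical and only serve the illustration.

```python
import collections


def count_lengths(lines):
    """Count sequence lengths from FASTA lines, joining wrapped sequence lines."""
    counts = collections.Counter()
    current = 0        # length of the record currently being read
    in_record = False  # becomes True once a header line has been seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record
            if in_record:
                counts[current] += 1
            current, in_record = 0, True
        elif line and in_record:
            current += len(line)
    # Close the last record in the file
    if in_record:
        counts[current] += 1
    return counts


# Made-up records: r2 is wrapped over two lines
demo = [
    ">r1", "ACGTACGTACGTACGTACGTA",
    ">r2", "ACGTACGTACGT", "ACGTACGTACGT",
    ">r3", "ACGTACGTACGTACGTACGTA",
]
print(sorted(count_lengths(demo).items()))
```

Here r1 and r3 are 21 nt and the wrapped record r2 is counted once as 24 nt. A header without any sequence would be counted as length 0 in this sketch.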

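II. Manually split sRNA_count.csv into columns and add a header row, so that the file is comma-separated with the column names sRNA and NUM that the R code below expects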

[Figure: sRNA_count.csv]
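
The manual step could also be scripted. The sketch below is one way to do it, assuming the tab-separated layout written by the counting script; the function name `add_header` and the output file name used in the usage note are hypothetical, while the header names `sRNA` and `NUM` are taken from the R code in the next step.

```python
import csv


def add_header(tsv_path, csv_path, header=("sRNA", "NUM")):
    """Rewrite a tab-separated length/count table as comma-separated with a header row."""
    with open(tsv_path, newline="") as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        for row in csv.reader(src, delimiter="\t"):
            if row:  # skip empty lines
                writer.writerow(row)
```

For example, `add_header("sRNA_count.csv", "sRNA_count_header.csv")` would write a new file next to the original. The input and output paths must differ, and the path passed to `read.csv` in R would then need to point at the new file.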

III. Plot the length distribution as a bar chart with ggplot2 in R

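# gcookbook only provides example datasets and is not needed for this plot
install.packages('gcookbook')
library(ggplot2)
library(gcookbook)
# Read the table prepared in the previous step (columns: sRNA = length, NUM = count)
sRNACount <- read.csv("J:/myProject/sRNA_count/sRNA_count.csv", header = TRUE)
sRNACount
# Version 1: log-scaled counts, x axis limited to 15-30 nt
ggplot(sRNACount, aes(x=sRNA, y=NUM)) + scale_y_log10() + xlim(15,30) + geom_bar(stat="identity", fill="lightblue", colour="black")
# Version 2: widen the x limits to 14-31 so the bars at 15 and 30 nt fit inside the axis, with a tick at every integer length
ggplot(sRNACount, aes(x=sRNA, y=NUM)) + scale_y_log10() + scale_x_continuous(limits=c(14,31),breaks=seq(15,30,1)) + geom_bar(stat="identity", fill="lightblue", colour="black")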
[Figure: sRNA length distribution]