Summarizing qualimap results into a table
2021-02-13
wangsb_2020
qualimap is used here to run QC on RNA-seq alignment data. The command is:
ls *bam | xargs -n 1 -P 2 -I{} qualimap rnaseq -bam {} -gtf At.gtf -oc count.matrix -outdir rnaseq_result_{} -pe -outformat PDF:HTML
This produces one result directory per BAM file.
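Next, extract the statistics from these results and save them to an Excel sheet. First collect the paths of the report files and store them in file_name.txt: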
find ./ -name rnaseq_qc_results.txt > file_name.txt
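The same path list could also be built in Python instead of with find. The following is a minimal sketch, not part of the original workflow: the helper names find_reports and write_path_list are hypothetical, and it assumes every report is named rnaseq_qc_results.txt somewhere below the current directory.

```python
from pathlib import Path


def find_reports(root='.', report_name='rnaseq_qc_results.txt'):
    """Return the sorted paths of all files called report_name under root."""
    # Path.rglob searches recursively, much like `find root -name report_name`
    return sorted(str(p) for p in Path(root).rglob(report_name))


def write_path_list(paths, out_file='file_name.txt'):
    """Write one path per line, the layout the summary script reads."""
    with open(out_file, 'w') as fh:
        for p in paths:
            fh.write(p + '\n')


# Example call (hypothetical usage):
# write_path_list(find_reports('.'))
```

Sorting the paths keeps the row order of the final table stable between runs, which find does not guarantee.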
Then summarize the results with a Python script:
import pandas as pd


def get_info(path):
    """Extract the summary statistics from one rnaseq_qc_results.txt report."""
    with open(path, 'r') as f:
        info = f.read().split('\n')

    def last_field(idx):
        # Last space-separated token of the line, thousands separators removed
        return info[idx].split(' ')[-1].replace(',', '')

    def count_and_percent(idx):
        # Last two tokens of the line: the read count and its percentage
        return ' '.join(info[idx].split(' ')[-2:]).replace(',', '')

    # The line indices below assume the fixed layout of the qualimap report
    return pd.DataFrame({
        'Name': [info[5].split(' ')[-1].replace('.bam', '')],
        'TotalAlignments': [last_field(15)],
        'ReadPairsAligned': [last_field(14)],
        'SecondaryAlignments': [last_field(16)],
        'AlignedToGenes': [last_field(18)],
        'AlignedToExonic': [count_and_percent(27)],
        'AlignedToIntronic': [count_and_percent(28)],
        'AlignedToIntergenic': [count_and_percent(29)],
    })


# Read the report paths collected by find, skipping blank lines
with open('file_name.txt', 'r') as files:
    paths = [line.strip() for line in files if line.strip()]

# DataFrame.append was removed in pandas 2.0, so concatenate the per-file frames
data = pd.concat([get_info(p) for p in paths], ignore_index=True)
data.set_index('Name', inplace=True)
data.to_excel('Summary_of_mapping_reads_of_the_RNA-seq.xlsx')
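The script above reads each value by fixed line index, which only works while the report layout stays exactly the same. A possible alternative is to look values up by their label. The sketch below is an illustration rather than the author's method: parse_key_values is a hypothetical helper, it assumes each statistic sits on a "label = value" line, and the sample text and its numbers are made up for demonstration.

```python
def parse_key_values(text):
    """Map each 'label = value' line to its raw value string."""
    fields = {}
    for line in text.splitlines():
        if '=' not in line:
            continue  # skip headings, separators and blank lines
        key, _, value = line.partition('=')
        fields[key.strip()] = value.strip()
    return fields


# Made-up excerpt in the assumed 'label = value' layout (not real output)
sample = """\
>>>>>>> Input

    bam file = sample1.bam

>>>>>>> Reads alignment

    read pairs aligned  = 900
    total alignments = 2,500

>>>>>>> Reads genomic origin

    exonic =  1,500 (80.0%)
"""

fields = parse_key_values(sample)
name = fields['bam file'].replace('.bam', '')
total_alignments = int(fields['total alignments'].replace(',', ''))
```

Looking fields up by label means an extra or missing line in the report raises a clear KeyError instead of silently shifting every column.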
The result is an Excel workbook with one row per sample and one column per statistic.