
Summarizing qualimap results into a table

2021-02-13  wangsb_2020

qualimap is used to run QC on RNA-seq alignment data. Run it on every BAM file with:

ls *bam | xargs -P 2 -I{} qualimap rnaseq -bam {} -gtf At.gtf -oc count.matrix -outdir rnaseq_result_{} -pe -outformat PDF:HTML

This produces a lot of output: one rnaseq_result_* directory per BAM file.

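Next, extract the statistics from the results and save them to an Excel sheet. First, collect the paths of the report files and save them in file_name.txt: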

find ./ -name rnaseq_qc_results.txt > file_name.txt
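
The same list of paths can also be built from Python when find is not available. This is a minimal sketch using only the standard library; the function name collect_reports is hypothetical and not part of the original workflow.

```python
from pathlib import Path


def collect_reports(root=".", out_file="file_name.txt"):
    """Find every rnaseq_qc_results.txt under root and write the paths to out_file."""
    # rglob searches recursively, much like `find ./ -name rnaseq_qc_results.txt`
    paths = sorted(str(p) for p in Path(root).rglob("rnaseq_qc_results.txt"))
    # One path per line, the same layout the find command produces
    Path(out_file).write_text("".join(p + "\n" for p in paths))
    return paths
```

Calling collect_reports() in the working directory writes file_name.txt and returns the paths as a list.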

Then use a Python script to summarize the results:

import pandas as pd


def get_info(file):
    # Read one rnaseq_qc_results.txt; the with-block closes the file afterwards
    with open(file, 'r') as f:
        info = f.read().split('\n')
    # Each field is picked by its fixed line index in the report,
    # so this depends on the exact layout of the report
    # Sample name: the BAM file name without the .bam extension
    Name = info[5].split(' ')[-1].replace('.bam', '')
    # Counts: take the last token and drop the thousands separators
    TotalAlignments = info[15].split(' ')[-1].replace(',','')
    ReadPairsAligned = info[14].split(' ')[-1].replace(',','')
    SecondaryAlignments = info[16].split(' ')[-1].replace(',','')
    AlignedToGenes = info[18].split(' ')[-1].replace(',','')
    # Genomic origin: keep the last two tokens (count and percentage)
    AlignedToExonic = ' '.join(info[27].split(' ')[-2:]).replace(',','')
    AlignedToIntronic = ' '.join(info[28].split(' ')[-2:]).replace(',','')
    AlignedToIntergenic = ' '.join(info[29].split(' ')[-2:]).replace(',','')
    df = pd.DataFrame({'Name': [Name], 'TotalAlignments': [TotalAlignments], 'ReadPairsAligned': [ReadPairsAligned],
                       'SecondaryAlignments': [SecondaryAlignments], 'AlignedToGenes': [AlignedToGenes],
                       'AlignedToExonic': [AlignedToExonic], 'AlignedToIntronic': [AlignedToIntronic],
                       'AlignedToIntergenic': [AlignedToIntergenic]})
    return df

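# Read the report paths collected by find, skipping blank lines
with open('file_name.txt', 'r') as files:
    paths = [line.strip() for line in files if line.strip()]

# DataFrame.append was removed in pandas 2.0, so build the table with pd.concat
data = pd.concat([get_info(path) for path in paths])

data.set_index(['Name'], inplace=True)
data.to_excel('Summary_of_mapping_reads_of_the_RNA-seq.xlsx')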
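
Picking fields by line index breaks as soon as the report layout shifts by a line. A more tolerant variant looks fields up by label instead. The sketch below assumes that every statistic sits on its own "label = value" line; the helper names parse_report and to_int, the labels, and the sample text are hypothetical and made up for illustration, so check them against a real rnaseq_qc_results.txt before relying on them.

```python
def parse_report(text):
    """Return a dict mapping each 'label = value' line to its value string."""
    fields = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip headings, separators and blank lines
        label, value = line.split("=", 1)
        fields[label.strip()] = value.strip()
    return fields


def to_int(value):
    """Convert a count such as '15,000' or '15,000 (90%)' to an int."""
    return int(value.split()[0].replace(",", ""))


# Hypothetical excerpt; the labels and numbers are invented for this example
sample = """\
    bam file = sample1.bam
    read pairs aligned  = 9,000
    total alignments = 25,000
    exonic =  15,000 (90%)
"""

fields = parse_report(sample)
print(fields["bam file"], to_int(fields["total alignments"]), to_int(fields["exonic"]))
```

Storing the counts as integers rather than strings also lets Excel treat the columns as numbers.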

The resulting Excel file has one row per sample, indexed by Name, with one column for each extracted statistic.