
Assessing genome assembly quality with QUAST

2021-05-18  胡童远

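QUAST is a widely used tool for assessing genome assembly quality. Without a reference it reports basic contig statistics such as N50; with a reference genome it aligns the contigs and reports genome fraction, duplication ratio, misassemblies, unaligned sequence, mismatches, and similar metrics (reference-based). MetaQUAST, released later, evaluates metagenome assemblies by comparing them against close reference genomes.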
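To make the N50 statistic concrete: it is the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The following is a minimal sketch of that definition, not QUAST's own code; the function name `n50` and the example lengths are made up for illustration. QUAST additionally drops contigs shorter than a minimum length (500 bp by default, `--min-contig`) before computing its statistics, which this sketch does not do.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half = sum(ordered) / 2                  # half of the total assembly length
    running = 0
    for length in ordered:
        running += length
        if running >= half:                  # first contig that crosses the halfway mark
            return length
    return 0


# Hypothetical contig lengths, for illustration only.
contig_lengths = [100, 80, 60, 40, 20]
print(n50(contig_lengths))
```

Sorting in descending order is what makes the statistic favor a few long contigs over many short ones.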

Papers:
Paper 1: QUAST: quality assessment tool for genome assemblies. Bioinformatics 2013
Citations: 3510
Paper 2: MetaQUAST: evaluation of metagenome assemblies. Bioinformatics 2016
Citations: 233

Resources:
Home page: http://bioinf.spbau.ru/quast
GitHub: https://github.com/ablab/quast
SourceForge: http://quast.sourceforge.net/
SourceForge (QUAST): http://quast.sourceforge.net/quast
QUAST 5 updates: QUAST v.5.1.0 release notes (public version)
QUAST 5 GitHub download: quast_5.1.0rc1
QUAST 5 manual: http://quast.sourceforge.net/docs/manual.html
MetaQUAST home page: http://bioinf.spbau.ru/metaquast
MetaQUAST SourceForge: http://quast.sourceforge.net/metaquast

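Download and install:

The release tarball contains ready-to-run scripts, so there is no installation step. Very convenient.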

wget -c https://github.com/ablab/quast/releases/download/quast_5.1.0rc1/quast-5.1.0rc1.tar.gz
tar -zxvf quast-5.1.0rc1.tar.gz
cd quast-5.1.0rc1
python quast.py --help
python quast.py --version
# QUAST v5.1.0rc1, 6260eff0

Running:

# Run on the bundled test data
python quast.py test_data/contigs_1.fasta \
           test_data/contigs_2.fasta \
        -r test_data/reference.fasta.gz \
        -g test_data/genes.txt \
        -1 test_data/reads1.fastq.gz -2 test_data/reads2.fastq.gz \
        -o quast_test_output

# Real data: reference-based QUAST run
quast_route="/software/quast-5.1.0rc1"
python $quast_route/quast.py AF04-12.fna \
-r ../Prokka/bgi/AF04-12/AF04-12.fna \
-g ../Prokka/bgi/AF04-12/AF04-12.gff \
--fragmented \
-t 4 -o ./AF04-12/

# Batch QUAST over all strains: illumina assemblies evaluated against the bgi assemblies as references
for i in `cat 76_strain_id.list`;
do
    python $quast_route/quast.py Prokka/illumina/$i/$i.fna \
    -r Prokka/bgi/$i/$i.fna \
    -g Prokka/bgi/$i/$i.gff \
    --fragmented --silent \
    -t 2 -o QUAST/illumina/$i/
    echo -e "\033[32m $i done...\033[0m"
done

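positional arguments: contig FASTA file(s) to evaluate
-r: reference genome FASTA
-g: genomic features (e.g. genes) of the reference, here a GFF file
--fragmented: the reference is fragmented (e.g. a draft assembly); QUAST tries to detect misassemblies caused by the fragmentation and marks them as fake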
-1/-2: forward/reverse reads
-o: output dir
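The options above can also be assembled programmatically, which makes the batch loop easier to inspect before anything is launched. The following is a minimal sketch, assuming the directory layout used in the batch loop of this post; `build_quast_cmd` and `batch_commands` are hypothetical helper names, not part of QUAST, and only flags already used in this post appear.

```python
import shlex


def build_quast_cmd(quast_py, contigs, reference, features, out_dir,
                    threads=2, fragmented=True, silent=True):
    """Build the argument list for one reference-based QUAST run."""
    cmd = ["python", quast_py, contigs, "-r", reference, "-g", features]
    if fragmented:
        cmd.append("--fragmented")
    if silent:
        cmd.append("--silent")
    cmd += ["-t", str(threads), "-o", out_dir]
    return cmd


def batch_commands(strain_ids, quast_py="/software/quast-5.1.0rc1/quast.py"):
    """Yield one command per strain, following the layout assumed in this post."""
    for strain in strain_ids:
        yield build_quast_cmd(
            quast_py,
            f"Prokka/illumina/{strain}/{strain}.fna",  # assembly to evaluate
            f"Prokka/bgi/{strain}/{strain}.fna",       # reference assembly
            f"Prokka/bgi/{strain}/{strain}.gff",       # reference annotation
            f"QUAST/illumina/{strain}/",               # output directory
        )


# Dry run: print the commands instead of executing them.
# To execute, pass each list to subprocess.run(cmd, check=True).
for cmd in batch_commands(["AF04-12"]):
    print(shlex.join(cmd))
```

Keeping each command as a list avoids shell quoting problems if a strain ID ever contains spaces or special characters.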

The Python interpreter used here does not have the matplotlib module installed, so no PDF is produced. That does not matter: report.txt is all we need.

Output files:

report.txt      summary table
report.tsv      tab-separated version, for parsing, or for spreadsheets (Google Docs, Excel, etc)
report.tex      LaTeX version
report.pdf      PDF version, includes all tables and plots for some statistics
report.html     everything in an interactive HTML file
icarus.html     Icarus main menu with links to interactive viewers
contigs_reports/        [only if a reference genome is provided]
  misassemblies_report  detailed report on misassemblies
  unaligned_report      detailed report on unaligned and partially unaligned contigs
k_mer_stats/            [only if --k-mer-stats is specified]
  kmers_report          detailed report on k-mer-based metrics
reads_stats/            [only if reads are provided]
  reads_report          detailed report on mapped reads statistics
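Since report.tsv is the version meant for parsing, a short sketch of reading it may help. It assumes the layout of a metrics-by-assembly table: the first row starts with `Assembly` followed by one column per assembly, and each later row holds a metric name followed by its values. The function name `parse_report_tsv` and the sample table are made up for illustration; the sample is not real QUAST output.

```python
import csv
import io


def parse_report_tsv(handle):
    """Map assembly name -> {metric name: value as string}."""
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    if not rows:
        return {}
    assemblies = rows[0][1:]                 # header row: "Assembly", then names
    result = {name: {} for name in assemblies}
    for row in rows[1:]:
        metric = row[0]
        for name, value in zip(assemblies, row[1:]):
            result[name][metric] = value     # values are kept as strings
    return result


# Made-up sample in the assumed layout, for illustration only.
sample = "Assembly\tcontigs_1\tcontigs_2\n# contigs\t3\t5\nN50\t1200\t800\n"
report = parse_report_tsv(io.StringIO(sample))
print(report["contigs_1"]["N50"])
```

For a real run, pass an open file instead of the in-memory sample, for example `open("quast_test_output/report.tsv", newline="")`.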

Output file: report.html

See the manual for more detail: http://quast.sourceforge.net/docs/manual.html

Batch processing: collecting the results

## Collect QUAST results into one table
task="bgi"
# Take the header line from one report; '>' overwrites the table so re-runs do not append duplicates
sed -n '1p' QUAST/${task}/AF04-12/transposed_report.tsv > QUAST/${task}_quast.txt

for i in `cat 76_strain_id.list`;
do
    cat QUAST/${task}/$i/transposed_report.tsv | sed -n '2p' >> QUAST/${task}_quast.txt
    echo -e "\033[32m $i done... \033[0m"
done
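The shell loop above takes line 2 of every transposed_report.tsv and trusts that all reports share the same columns in the same order. As a hedged alternative, the sketch below merges the reports by column name, so a report with extra or missing columns does not silently shift values. It assumes each transposed_report.tsv has one header row followed by one row per assembly; `merge_transposed_reports` is a hypothetical helper name, and the demo files are made up.

```python
import csv
import os
import tempfile


def merge_transposed_reports(paths, out_path):
    """Merge per-strain transposed reports by column name; return the row count."""
    columns, records = [], []
    for path in paths:
        with open(path, newline="") as fh:
            for record in csv.DictReader(fh, delimiter="\t"):
                records.append(record)
                for name in record:
                    # Keep first-seen column order; skip DictReader's overflow key.
                    if name is not None and name not in columns:
                        columns.append(name)
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns, delimiter="\t",
                                restval="", extrasaction="ignore",
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)  # missing columns are written as empty cells
    return len(records)


# Demo on two made-up reports with slightly different columns.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "s1.tsv")
    second = os.path.join(tmp, "s2.tsv")
    with open(first, "w") as fh:
        fh.write("Assembly\tN50\nS1\t100\n")
    with open(second, "w") as fh:
        fh.write("Assembly\tN50\tL50\nS2\t200\t3\n")
    merged = os.path.join(tmp, "merged.tsv")
    print(merge_transposed_reports([first, second], merged))
    with open(merged) as fh:
        print(fh.read(), end="")
```

For the layout used in this post, the input list would be built from the strain IDs, for example `[f"QUAST/bgi/{i}/transposed_report.tsv" for i in strain_ids]`, where `strain_ids` is read from 76_strain_id.list.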

More:
How to read QUAST results
