Genomics

wtdbg2

2019-07-03  tobebettergirl

Reference URL
https://github.com/ruanjue/wtdbg2

Wtdbg2 is a de novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies (ONT).

It is used to assemble long reads from third-generation sequencing data.

It assembles raw reads without error correction and then builds the consensus from intermediate assembly output.

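In other words, it skips the read error-correction step: the raw reads are assembled directly, and the consensus is built afterwards.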

Wtdbg2 is able to assemble the human and even the 32Gb Axolotl genome at a speed tens of times faster than CANU and FALCON while producing contigs of comparable base accuracy.

Compared with Canu and FALCON, it is much faster.

Installation commands
git clone https://github.com/ruanjue/wtdbg2
cd wtdbg2
make
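The commands later in these notes call two executables from the checkout, ./wtdbg2 and ./wtpoa-cns. As a minimal sketch (the helper name missing_binaries is made up for this note, not part of wtdbg2), the build can be checked like this:

```shell
# Hypothetical helper: print the names of expected executables
# that are missing (or not executable) in a directory.
missing_binaries() {
    dir=$1; shift
    for b in "$@"; do
        [ -x "$dir/$b" ] || echo "$b"
    done
}

# After `make`, run from inside the wtdbg2 checkout.
# Prints nothing when both binaries are present.
missing_binaries . wtdbg2 wtpoa-cns
```

An empty result suggests the build finished; any printed name points at a binary that still needs building.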
Downloading data
wget -t 200 http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz
Further study
git clone https://github.com/ruanjue/wtdbg2
cd wtdbg2 && make
#quick start with wtdbg2.pl
./wtdbg2.pl -t 16 -x rs -g 4.6m -o dbg reads.fa.gz
# Step by step commandlines
# assemble long reads
./wtdbg2 -x rs -g 4.6m -i reads.fa.gz -t 16 -fo dbg

# derive consensus
./wtpoa-cns -t 16 -i dbg.ctg.lay.gz -fo dbg.raw.fa

# polish consensus, not necessary if you want to polish the assemblies using other tools
minimap2 -t16 -ax map-pb -r2k dbg.raw.fa reads.fa.gz | samtools sort -@4 >dbg.bam
samtools view -F0x900 dbg.bam | ./wtpoa-cns -t 16 -d dbg.raw.fa -i - -fo dbg.cns.fa

# Additional polishing using short reads
bwa mem -t 16 dbg.cns.fa sr.1.fa sr.2.fa | samtools sort -O SAM | ./wtpoa-cns -t 16 -x sam-sr -d dbg.cns.fa -i - -fo dbg.srp.fa

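-g is the estimated genome size;
-x specifies the sequencing technology: per the wtdbg2 README, "rs" for PacBio RSII, "sq" for PacBio Sequel, "ccs" for PacBio CCS reads and "ont" for Oxford Nanopore.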
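To make the two options concrete, here is a small sketch that builds the assembly command line from a platform label and a genome size. The function names and the labels such as pacbio-rsii are invented for illustration; only the preset values rs, sq, ccs and ont come from the wtdbg2 README. The command is printed, not executed.

```shell
# Hypothetical helper: map a made-up platform label to a wtdbg2 -x preset.
# The presets on the right are the ones listed in the wtdbg2 README.
preset_for() {
    case "$1" in
        pacbio-rsii)   echo rs  ;;
        pacbio-sequel) echo sq  ;;
        pacbio-ccs)    echo ccs ;;
        nanopore)      echo ont ;;
        *) echo "unknown platform: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the assembly command for a platform, genome size and input file.
print_assemble_cmd() {
    preset=$(preset_for "$1") || return 1
    echo "./wtdbg2 -x $preset -g $2 -i $3 -t 16 -fo dbg"
}

print_assemble_cmd pacbio-rsii 4.6m reads.fa.gz
```

With these arguments the printed line matches the step-by-step assembly command used in these notes; changing the first argument to nanopore would switch the preset to ont.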

Reference: https://www.biorxiv.org/content/biorxiv/early/2019/01/26/530972.full.pdf
Limitations

For Nanopore data, wtdbg2 may produce an assembly smaller than the true genome.

That is, the assembled result can come out smaller than the actual genome.
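One simple way to notice this is to compare the total length of the assembled contigs with the expected genome size. The following is only a sketch: the helper names are made up, the tiny FASTA is fabricated test input standing in for an assembly such as dbg.cns.fa, and the expected size of 20 bases is arbitrary.

```shell
# Hypothetical helper: sum sequence lengths in a FASTA file,
# skipping header lines that start with '>'.
fasta_total_bases() {
    awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

# Hypothetical helper: assembled length as a percentage of the expected genome size.
percent_of_expected() {
    awk -v t="$1" -v g="$2" 'BEGIN { printf "%.1f\n", 100 * t / g }'
}

# Made-up example input: two contigs of 8 + 3 and 4 bases.
tmp=$(mktemp)
printf '>ctg1\nACGTACGT\nACG\n>ctg2\nGGCC\n' > "$tmp"
total=$(fasta_total_bases "$tmp")
echo "$total"                       # prints 15
percent_of_expected "$total" 20     # prints 75.0
rm -f "$tmp"
```

A percentage well below 100 would be a hint that the assembly is smaller than the genome, though it does not say why.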

When inputting multiple files in both fasta and fastq format, put the fastq files first, then the fasta files. Otherwise the program cannot find '>' in the fastq input and appends all fastq records into one read.

So pay attention to the format and order of the input files.
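The ordering rule can be applied mechanically before the file names are handed to the assembler. This sketch sorts names only (the files are never opened), and classifying by file extension is an assumption of the example; the function name fastq_first is made up. It does not handle file names containing spaces or glob characters.

```shell
# Hypothetical helper: list FASTQ file names first, then everything else (treated as FASTA).
# Classification is by extension only, which is an assumption of this sketch.
fastq_first() {
    fq=""; fa=""
    for f in "$@"; do
        case "$f" in
            *.fq|*.fastq|*.fq.gz|*.fastq.gz) fq="$fq $f" ;;
            *) fa="$fa $f" ;;
        esac
    done
    # Unquoted on purpose: word splitting collapses the lists into one space-separated line.
    echo $fq $fa
}

fastq_first a.fa b.fq.gz c.fasta d.fastq    # prints: b.fq.gz d.fastq a.fa c.fasta
```

The relative order within each group is preserved, so the output is the order in which the files would be given to wtdbg2.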
