Assembling a Bacterial Genome

2019-12-07  千万英里

1. Go to the Genome Announcements website (https://mra.asm.org/) and pick a bacterial genome article; find the SRA accession number recorded in the article.

2. Download the run from the SRA database with prefetch.
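
Step 2 gives no command. Here is a minimal sketch of what the download might look like, assuming the accession is SRR5513009 (the run named in the fastq-dump command in step 3) and that `prefetch` from sra-tools is on the PATH. The helper `is_run_accession` and the `DRY_RUN` switch are hypothetical names made up for this sketch; by default it only prints the command.

```shell
#!/bin/sh
# Sketch only: download one SRA run with prefetch (sra-tools).
ACC="SRR5513009"   # run accession taken from step 3 of this post

# Hypothetical helper: succeed if $1 looks like a run accession,
# i.e. SRR, ERR or DRR followed only by digits.
is_run_accession() {
    case "$1" in
        SRR*|ERR*|DRR*) rest="${1#?RR}" ;;
        *) return 1 ;;
    esac
    case "$rest" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical switch: print the command unless DRY_RUN=0 is set.
if is_run_accession "$ACC"; then
    if [ "${DRY_RUN:-1}" = "0" ]; then
        prefetch "$ACC"
    else
        echo "would run: prefetch $ACC"
    fi
else
    echo "not a run accession: $ACC" >&2
fi
```

Older sra-tools releases store the download under ~/ncbi/public/sra/, which is the path step 3 reads from. Newer releases may place it elsewhere, so it is worth checking where the .sra file actually landed.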

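3. Convert the .sra file to FASTQ with fastq-dump, writing gzip-compressed output to save disk space. This takes a while, so we run it in the background.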

fastq-dump --gzip --split-files ~/ncbi/public/sra/SRR5513009.sra &
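
Once the background job has finished, a quick sanity check is to confirm that the two paired-end files contain the same number of reads. The sketch below assumes the usual four-line FASTQ record layout and that both files sit in the current directory; `count_reads` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: check that the two paired-end FASTQ files hold the same number of reads.

# Hypothetical helper: count reads in a gzipped FASTQ file,
# assuming the usual four lines per record.
count_reads() {
    gzip -dc "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

R1="SRR5513009_1.fastq.gz"
R2="SRR5513009_2.fastq.gz"

# Only run the comparison if both files exist in the current directory.
if [ -f "$R1" ] && [ -f "$R2" ]; then
    n1=$(count_reads "$R1")
    n2=$(count_reads "$R2")
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "Mismatch: $R1 has $n1 reads, $R2 has $n2" >&2
    fi
fi
```

If fastq-dump was started with `&` in the same shell, the `wait` builtin blocks until it has finished, which avoids counting a half-written file.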

4. Quality control with FastQC

wwwww77@wwwww77-VirtualBox:~/assembly$ fastqc SRR5513009_1.fastq.gz
Started analysis of SRR5513009_1.fastq.gz
Approx 5% complete for SRR5513009_1.fastq.gz
Approx 10% complete for SRR5513009_1.fastq.gz
Approx 15% complete for SRR5513009_1.fastq.gz
Approx 20% complete for SRR5513009_1.fastq.gz
Approx 25% complete for SRR5513009_1.fastq.gz
Approx 30% complete for SRR5513009_1.fastq.gz
Approx 35% complete for SRR5513009_1.fastq.gz
Approx 40% complete for SRR5513009_1.fastq.gz
Approx 45% complete for SRR5513009_1.fastq.gz
Approx 50% complete for SRR5513009_1.fastq.gz
Approx 55% complete for SRR5513009_1.fastq.gz
Approx 60% complete for SRR5513009_1.fastq.gz
Approx 65% complete for SRR5513009_1.fastq.gz
Approx 70% complete for SRR5513009_1.fastq.gz
Approx 75% complete for SRR5513009_1.fastq.gz
Approx 80% complete for SRR5513009_1.fastq.gz
Approx 85% complete for SRR5513009_1.fastq.gz
Approx 90% complete for SRR5513009_1.fastq.gz
Approx 95% complete for SRR5513009_1.fastq.gz
Analysis complete for SRR5513009_1.fastq.gz
wwwww77@wwwww77-VirtualBox:~/assembly$ fastqc SRR5513009_2.fastq.gz
Started analysis of SRR5513009_2.fastq.gz
Approx 5% complete for SRR5513009_2.fastq.gz
Approx 10% complete for SRR5513009_2.fastq.gz
Approx 15% complete for SRR5513009_2.fastq.gz
Approx 20% complete for SRR5513009_2.fastq.gz
Approx 25% complete for SRR5513009_2.fastq.gz
Approx 30% complete for SRR5513009_2.fastq.gz
Approx 35% complete for SRR5513009_2.fastq.gz
Approx 40% complete for SRR5513009_2.fastq.gz
Approx 45% complete for SRR5513009_2.fastq.gz
Approx 50% complete for SRR5513009_2.fastq.gz
Approx 55% complete for SRR5513009_2.fastq.gz
Approx 60% complete for SRR5513009_2.fastq.gz
Approx 65% complete for SRR5513009_2.fastq.gz
Approx 70% complete for SRR5513009_2.fastq.gz
Approx 75% complete for SRR5513009_2.fastq.gz
Approx 80% complete for SRR5513009_2.fastq.gz
Approx 85% complete for SRR5513009_2.fastq.gz
Approx 90% complete for SRR5513009_2.fastq.gz
Approx 95% complete for SRR5513009_2.fastq.gz
Analysis complete for SRR5513009_2.fastq.gz

[Figures: fastqc.html reports for SRR5513009_1.fastq.gz and SRR5513009_2.fastq.gz]

5. Remove adapters with Trimmomatic:

mkdir trim_out
java -jar ~/Biosofts/Trimmomatic038/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 \
    SRR5513009_1.fastq.gz SRR5513009_2.fastq.gz \
    ./trim_out/output_forward_paired.fq.gz ./trim_out/output_forward_unpaired.fq.gz \
    ./trim_out/output_reverse_paired.fq.gz ./trim_out/output_reverse_unpaired.fq.gz \
    ILLUMINACLIP:/home/wwwww77/Biosofts/Trimmomatic038/Trimmomatic-0.38/adapters/TruSeq3-PE.fa:2:30:10 \
    SLIDINGWINDOW:4:15 LEADING:5 TRAILING:5 MINLEN:50
[Figures: Trimmomatic run; contents of trim_out]
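
The trimming steps in the Trimmomatic command above are packed into colon-separated arguments. As a reading aid, the sketch below spells out what each number means and then prints the same step string. The variable names and the `trim_steps` helper are hypothetical, chosen only for illustration; the values are the ones used in the command.

```shell
#!/bin/sh
# Sketch only: the trimming steps from the command above, one named setting each.

ADAPTERS="/home/wwwww77/Biosofts/Trimmomatic038/Trimmomatic-0.38/adapters/TruSeq3-PE.fa"
SEED_MISMATCHES=2     # ILLUMINACLIP: mismatches allowed in the initial seed match
PALINDROME_CLIP=30    # ILLUMINACLIP: score threshold for palindrome (read-pair) matches
SIMPLE_CLIP=10        # ILLUMINACLIP: score threshold for simple adapter matches
WINDOW_SIZE=4         # SLIDINGWINDOW: number of bases per window
WINDOW_QUALITY=15     # SLIDINGWINDOW: cut once the window's mean quality drops below this
LEADING_Q=5           # LEADING: remove leading bases below this quality
TRAILING_Q=5          # TRAILING: remove trailing bases below this quality
MIN_LENGTH=50         # MINLEN: drop reads shorter than this after trimming

# Hypothetical helper: print the step arguments in the order given on the command line.
trim_steps() {
    printf 'ILLUMINACLIP:%s:%s:%s:%s SLIDINGWINDOW:%s:%s LEADING:%s TRAILING:%s MINLEN:%s\n' \
        "$ADAPTERS" "$SEED_MISMATCHES" "$PALINDROME_CLIP" "$SIMPLE_CLIP" \
        "$WINDOW_SIZE" "$WINDOW_QUALITY" "$LEADING_Q" "$TRAILING_Q" "$MIN_LENGTH"
}

trim_steps
```

Trimmomatic applies the steps in the order they appear on the command line, so adapter clipping happens before quality trimming here, and MINLEN is checked last.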

6. Run FastQC again to assess the quality of the filtered data

wwwww77@wwwww77-VirtualBox:~/assembly$ fastqc trim_out/output_forward_paired.fq.gz
Started analysis of output_forward_paired.fq.gz
Approx 5% complete for output_forward_paired.fq.gz
Approx 10% complete for output_forward_paired.fq.gz
Approx 15% complete for output_forward_paired.fq.gz
Approx 20% complete for output_forward_paired.fq.gz
Approx 25% complete for output_forward_paired.fq.gz
Approx 30% complete for output_forward_paired.fq.gz
Approx 35% complete for output_forward_paired.fq.gz
Approx 40% complete for output_forward_paired.fq.gz
Approx 45% complete for output_forward_paired.fq.gz
Approx 50% complete for output_forward_paired.fq.gz
Approx 55% complete for output_forward_paired.fq.gz
Approx 60% complete for output_forward_paired.fq.gz
Approx 65% complete for output_forward_paired.fq.gz
Approx 70% complete for output_forward_paired.fq.gz
Approx 75% complete for output_forward_paired.fq.gz
Approx 80% complete for output_forward_paired.fq.gz
Approx 85% complete for output_forward_paired.fq.gz
Approx 90% complete for output_forward_paired.fq.gz
Approx 95% complete for output_forward_paired.fq.gz
Analysis complete for output_forward_paired.fq.gz
wwwww77@wwwww77-VirtualBox:~/assembly$ fastqc trim_out/output_reverse_paired.fq.gz
Started analysis of output_reverse_paired.fq.gz
Approx 5% complete for output_reverse_paired.fq.gz
Approx 10% complete for output_reverse_paired.fq.gz
Approx 15% complete for output_reverse_paired.fq.gz
Approx 20% complete for output_reverse_paired.fq.gz
Approx 25% complete for output_reverse_paired.fq.gz
Approx 30% complete for output_reverse_paired.fq.gz
Approx 35% complete for output_reverse_paired.fq.gz
Approx 40% complete for output_reverse_paired.fq.gz
Approx 45% complete for output_reverse_paired.fq.gz
Approx 50% complete for output_reverse_paired.fq.gz
Approx 55% complete for output_reverse_paired.fq.gz
Approx 60% complete for output_reverse_paired.fq.gz
Approx 65% complete for output_reverse_paired.fq.gz
Approx 70% complete for output_reverse_paired.fq.gz
Approx 75% complete for output_reverse_paired.fq.gz
Approx 80% complete for output_reverse_paired.fq.gz
Approx 85% complete for output_reverse_paired.fq.gz
Approx 90% complete for output_reverse_paired.fq.gz
Approx 95% complete for output_reverse_paired.fq.gz
Analysis complete for output_reverse_paired.fq.gz
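
Besides the HTML report, FastQC writes a zip archive containing a tab-separated summary.txt with a PASS, WARN or FAIL status per module. The sketch below lists the flagged modules, which makes it easier to compare the reads before and after trimming. The archive path is an assumption based on FastQC's default naming, and `flagged_modules` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: list the FastQC modules flagged WARN or FAIL in a summary.txt.

# Hypothetical helper: summary.txt is tab-separated (status, module, file name).
flagged_modules() {
    awk -F '\t' '$1 == "WARN" || $1 == "FAIL" { print $1 ": " $2 }' "$1"
}

# Assumed location of the FastQC archive for the trimmed forward reads.
ZIP="trim_out/output_forward_paired_fastqc.zip"
if [ -f "$ZIP" ]; then
    unzip -p "$ZIP" '*/summary.txt' > forward_summary.txt
    flagged_modules forward_summary.txt
fi
```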

7. Assemble a draft genome with SPAdes:

Genome assemblies were produced with SPAdes genome assembler version 3.10 (14), set in “paired-end assembly, careful mode,”

wwwww77@wwwww77-VirtualBox:~/assembly/trim_out$ spades.py --careful --pe1-1 output_forward_paired.fq.gz --pe1-2 output_reverse_paired.fq.gz -o ./SPAdes_out
[Figures: error message; memory adjustment]
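
The post does not record what the error was or exactly how the memory was adjusted (it may have been the VirtualBox VM's RAM allocation rather than a SPAdes option). If the problem is SPAdes's memory limit, one option is the `-m` flag, which sets the limit in GB. The sketch below, for Linux only, reads the machine's total RAM and prints a command using that value. `mem_total_gb` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: pick a SPAdes memory limit that fits the machine (Linux).

# Hypothetical helper: total RAM in whole GB, read from a meminfo-style file.
mem_total_gb() {
    awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    MEM_GB=$(mem_total_gb)
    # -m sets the SPAdes memory limit in GB; this only prints the command.
    echo "spades.py --careful -m $MEM_GB --pe1-1 output_forward_paired.fq.gz --pe1-2 output_reverse_paired.fq.gz -o ./SPAdes_out"
fi
```

On a VM with less than 1 GB of RAM the helper returns 0, which is not a usable limit; in that case giving the VM more memory is the more realistic fix.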

8. Evaluate the assembly with QUAST

wwwww77@wwwww77-VirtualBox:~/assembly/trim_out$ quast.py SPAdes_out/contigs.fasta --min-contig 200 -o SPAdes_out/quast_out
[Figures: QUAST run; QUAST results]
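
N50 is one of the statistics QUAST reports. To make clear what that number means, here is a rough sketch that computes it directly from a FASTA file: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. Contigs below a minimum length are ignored, mirroring `--min-contig 200` above. The helpers `contig_lengths` and `n50` are hypothetical, and the result is not guaranteed to match QUAST's output in every edge case.

```shell
#!/bin/sh
# Sketch only: compute N50 from a FASTA file, ignoring contigs below a minimum length.

# Hypothetical helper: print one sequence length per FASTA record.
contig_lengths() {
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1"
}

# Hypothetical helper: N50 of contigs with length >= $2 (default 0).
n50() {
    contig_lengths "$1" |
        awk -v min="${2:-0}" '$1 >= min' |
        sort -rn |
        awk '{ len[NR] = $1; total += $1 }
             END {
                 for (i = 1; i <= NR; i++) {
                     acc += len[i]
                     if (acc * 2 >= total) { print len[i]; exit }
                 }
             }'
}

if [ -f SPAdes_out/contigs.fasta ]; then
    n50 SPAdes_out/contigs.fasta 200
fi
```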