Nanopore sequence alignment analysis (using an E. coli sequence as an example)

2022-02-28  莫讠

Sample: E. coli BL21(DE3), a Gram-negative bacterium; the genome size is about 4.5 Mb

NCBI: https://www.ncbi.nlm.nih.gov/nuccore/CP001665.1

DNA extraction kit: Easy Pure Bacterial Genomic DNA kit, Code# EE161-01

Sequencing run time: 1.5 h

Experimental details are given in Lingfang’s report. When sequencing finished, we obtained the output data as fast5 files from the Nanopore Mk1C.

1、Basecalling

Because the raw data generated by the Nanopore sequencing device is stored as fast5, a format that records the electrical signal, we need to convert these files to fastq, a format commonly used to record base calls (A, T, C, G and so on).

Guppy software: version 5.0.16+b9fcd7b5b
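To make the fastq format concrete, here is a minimal Python sketch of how a fastq record can be read: four lines per read (header, bases, a `+` separator, and one quality character per base, encoded as Phred score + 33). The function name `parse_fastq` and the toy record are hypothetical and are not part of the Guppy output above.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, phred_scores) from an iterable of FASTQ lines.

    Sketch only: loads all non-empty lines into memory, which is fine for
    small files but not for a full sequencing run.
    """
    records = [line.rstrip("\n") for line in lines if line.strip()]
    if len(records) % 4:
        raise ValueError("FASTQ records must have four lines each")
    for i in range(0, len(records), 4):
        header, seq, plus, qual = records[i:i + 4]
        if (not header.startswith("@") or not plus.startswith("+")
                or len(seq) != len(qual)):
            raise ValueError(f"malformed FASTQ record at non-empty line {i + 1}")
        read_id = header[1:].split()[0]
        # Each quality character encodes a Phred score as ASCII code minus 33.
        yield read_id, seq, [ord(c) - 33 for c in qual]


# Toy record (hypothetical, not taken from the run described here).
toy_fastq = ["@toy_read_1 example\n", "ACGT\n", "+\n", "!+5I\n"]
for read_id, seq, scores in parse_fastq(toy_fastq):
    print(read_id, seq, scores)
```

For a gzipped file such as the one used in the alignment step, the same function could be fed from `gzip.open(path, "rt")`.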

guppy_basecaller -i /home/qianwj/project/ONT/lab_data -c /opt/ont/guppy/data/dna_r9.4.1_450bps_sup.cfg -s /home/qianwj/project/ONT/basecalling_gpu_sup/ -x "cuda:0" > guppy_4_gpu_sup.log

output:

image

2、Quality control

usage:

NanoPlot --fastq ~/project/ONT/basecalling_gpu_sup/pass/ -o ~/project/ONT/nanoplot/ -t8 --plots hex dot

output:

image

Quality control reports:

image

Mean read length: 6,541.5
Mean read quality: 14.5 (generally > 13)
Read length N50: 11,141.0 (generally > 10 kb)
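The N50 is larger than the mean because it is weighted by bases rather than by reads. As an illustration, the sketch below computes N50 under one common definition (the length at which reads of that length or longer hold at least half of all bases). NanoPlot's own implementation may differ in edge cases; the function name and the toy lengths are hypothetical.

```python
def read_length_n50(lengths):
    """Return the N50 of a collection of read lengths.

    Definition assumed here: sort reads from longest to shortest and return
    the length of the read at which the running total of bases first reaches
    half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy read lengths (hypothetical, not from this run).
toy_lengths = [2_000, 3_000, 4_000, 5_000, 6_000]
print("mean:", sum(toy_lengths) / len(toy_lengths))
print("N50: ", read_length_n50(toy_lengths))
```

In this toy set the N50 is higher than the mean length, which is the same pattern seen in the report above.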

According to the reports above, 100% of reads are above Q10, which means the base accuracy of every read is higher than 90%, and 86.2% of reads are above Q12, which means about 86.2% of reads have an accuracy higher than 93.69%.
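The accuracy figures come from the standard Phred relation, accuracy = 1 - 10^(-Q/10). A short Python sketch of that conversion (the function name is hypothetical) reproduces the thresholds quoted here:

```python
def phred_to_accuracy(q):
    """Convert a Phred quality score Q into accuracy: 1 - 10^(-Q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


# Q10 and Q12 are the thresholds discussed in the text.
for q in (10, 12):
    print(f"Q{q}: {phred_to_accuracy(q) * 100:.2f}% accuracy")
```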

3、Alignment

Download the reference genome from NCBI:
https://www.ncbi.nlm.nih.gov/nuccore/CP001665.1?report=genbank&to=4570938

image
minimap2 -d  ~/project/ONT/reference/BL21DE3_genome.mmi  ~/project/ONT/reference/BL21DE3_genome.fasta

minimap2 -ax map-ont ~/project/ONT/reference/BL21DE3_genome.mmi  ~/project/ONT/basecalling_gpu_sup/pass/BL21.fastq.gz > alignment.sam

conda activate base
samtools sort -@ 4 -O bam -o alignment.sorted.bam alignment.sam

samtools index alignment.sorted.bam

According to the result, our alignment rate is 88.79%. For DNA from a pure microbial culture, the alignment rate is generally expected to be higher than 90%, so the lower value may be due to slight contamination of the sample. Lingfang and I will check this later.
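The text does not say which tool reported the alignment rate. As a rough cross-check, the rate can be recomputed from the SAM file by reading the FLAG column (bit 0x4 marks an unmapped read). The Python sketch below assumes each read should be counted once, so it skips secondary (0x100) and supplementary (0x800) records; the function name and toy records are hypothetical, and other tools may count differently.

```python
def mapped_read_rate(sam_lines):
    """Fraction of reads whose primary SAM record is mapped."""
    UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read only once, via its primary record
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    return mapped / total if total else 0.0


# Toy SAM records (hypothetical, not from this run).
toy_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tref\t1\t60\t4M\t*\t0\t0\tACGT\t*",
    "read2\t16\tref\t50\t60\t4M\t*\t0\t0\tACGT\t*",
    "read2\t2064\tref\t900\t60\t4M\t*\t0\t0\tACGT\t*",  # supplementary record
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",           # unmapped read
    "read4\t0\tref\t200\t12\t4M\t*\t0\t0\tACGT\t*",
]
print(f"{mapped_read_rate(toy_sam):.2%}")
```

On real data the function can be given an open file handle, for example `with open("alignment.sam") as fh: print(mapped_read_rate(fh))`.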

Screenshot 2021-11-30 141221.png  Screenshot 2021-11-30 150230.png

mean coverage = 50.8624X
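Mean coverage is usually understood as the number of aligned bases divided by the reference length. The Python sketch below works under that assumption and takes the reference length from the `to=4570938` parameter in the NCBI URL above. The implied total of aligned bases is derived here for illustration and was not reported in the original analysis.

```python
def mean_coverage(aligned_bases, genome_length):
    """Mean depth of coverage: aligned bases divided by reference length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return aligned_bases / genome_length


GENOME_LENGTH = 4_570_938    # assumed from "to=4570938" in the NCBI URL
REPORTED_COVERAGE = 50.8624  # value reported in the text

# Back-calculate how many aligned bases this coverage implies.
implied_bases = REPORTED_COVERAGE * GENOME_LENGTH
print(f"implied aligned bases: {implied_bases / 1e6:.1f} Mb")
print(f"coverage check: {mean_coverage(implied_bases, GENOME_LENGTH):.4f}X")
```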

Screenshot 2021-11-30 153219.png  Screenshot 2021-11-30 155734.png

Mapping quality is the confidence that a read is correctly mapped to its genomic coordinates.

The mean mapping quality is 55.31 (values above 30 are generally acceptable).

Screenshot 2021-11-30 155959.png