Nanopore sequence alignment analysis (E. coli example)
Sample: E. coli BL21(DE3), a Gram-negative bacterium with a genome of about 4.5 Mb
NCBI: https://www.ncbi.nlm.nih.gov/nuccore/CP001665.1
DNA extraction kit: Easy Pure Bacterial Genomic DNA kit, Code#EE161-01
Sequencing run time: 1.5 h
Experimental details are given in Lingfang's report. When the sequencing step finished, we obtained the output data in fast5 format from the Nanopore Mk1C.
1、Basecalling
The raw data generated by the Nanopore sequencing device is stored as fast5, a format that records the electrical signal. It therefore has to be converted to fastq, the format commonly used to record base calls (A, T, C, G) together with their quality scores.
- Bioinformatics tool for basecalling
Guppy software: Version 5.0.16+b9fcd7b5b
- Usage:
guppy_basecaller -i /home/qianwj/project/ONT/lab_data -c /opt/ont/guppy/data/dna_r9.4.1_450bps_sup.cfg -s /home/qianwj/project/ONT/basecalling_gpu_sup/ -x "cuda:0" > guppy_4_gpu_sup.log
output: [image]
2、Quality control
usage:
NanoPlot --fastq ~/project/ONT/basecalling_gpu_sup/pass/ -o ~/project/ONT/nanoplot/ -t8 --plots hex dot
output: [image]
Quality control reports: [image]
Mean read length: 6,541.5
Mean read quality: 14.5 (generally > 13)
Read length N50: 11,141.0 (generally > 10k)
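For reference, the read length N50 is the length at which reads of that length or longer contain at least half of all sequenced bases. NanoPlot computes this itself; the following is only a minimal sketch of the calculation, assuming one read length per line on standard input. The function name `n50` is hypothetical.

```shell
#!/usr/bin/env bash
# n50: read one read length per line from stdin and print the N50.
# Hypothetical helper for illustration; it is not part of NanoPlot.
n50() {
  sort -rn | awk '
    { len[NR] = $1; total += $1 }   # lengths arrive longest first; sum them
    END {
      half = total / 2
      for (i = 1; i <= NR; i++) {
        running += len[i]           # cumulative bases, longest reads first
        if (running >= half) { print len[i]; exit }
      }
    }'
}

# Possible usage, assuming a standard 4-line-per-record FASTQ file:
#   gzip -dc BL21.fastq.gz | awk 'NR % 4 == 2 { print length($0) }' | n50
```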
- Distribution of read length and average read quality
[image: newplot _2_.png]
- Accuracy
According to the report above, >Q10 = 100% means that every read has a base accuracy above 90%, and >Q12 = 86.2% means that about 86.2% of reads have an accuracy above 93.69%.
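These percentages follow from the Phred definition Q = -10 * log10(P_error), so accuracy = 1 - 10^(-Q/10). A minimal sketch of the conversion, using awk for the arithmetic (the function name `phred_to_accuracy` is hypothetical):

```shell
#!/usr/bin/env bash
# phred_to_accuracy: convert a Phred quality score to percent accuracy.
# Hypothetical helper; accuracy = (1 - 10^(-Q/10)) * 100.
phred_to_accuracy() {
  awk -v q="$1" 'BEGIN { printf "%.2f\n", (1 - 10 ^ (-q / 10)) * 100 }'
}

phred_to_accuracy 10   # prints 90.00
phred_to_accuracy 12   # prints 93.69
```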
3、Alignment
Download the reference genome from NCBI:
https://www.ncbi.nlm.nih.gov/nuccore/CP001665.1?report=genbank&to=4570938
- Build the index with minimap2
minimap2 -d ~/project/ONT/reference/BL21DE3_genome.mmi ~/project/ONT/reference/BL21DE3_genome.fasta
- Run the alignment
minimap2 -ax map-ont ~/project/ONT/reference/BL21DE3_genome.mmi ~/project/ONT/basecalling_gpu_sup/pass/BL21.fastq.gz > alignment.sam
- After the alignment finishes, use samtools to convert the sam file to a sorted bam file and build its index
conda activate base
samtools sort -@ 4 -O bam -o alignment.sorted.bam alignment.sam
samtools index alignment.sorted.bam
- Checking the alignment rate
According to the result, our alignment rate is 88.79%. For DNA from a pure microbial culture the alignment rate is generally expected to be above 90%, so the shortfall may be due to slight contamination of the sample. Lingfang and I will check this later.
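The report does not state which command produced this figure. One possible way to compute a per-read alignment rate with samtools is sketched below, under the assumption that secondary and supplementary alignments (flags 0x100 and 0x800) should be excluded so that each read is counted once. The function names `percent` and `alignment_rate` are hypothetical, and the result may differ slightly from the number reported above.

```shell
#!/usr/bin/env bash
# percent: print numerator/denominator as a percentage with two decimals.
percent() {
  awk -v n="$1" -v d="$2" 'BEGIN {
    if (d == 0) { print "NA" } else { printf "%.2f\n", 100 * n / d }
  }'
}

# alignment_rate: share of reads in a BAM file that have a primary alignment.
# Assumption: -F 0x900 keeps one record per read (drops secondary and
# supplementary), and -F 0x904 additionally drops unmapped reads.
alignment_rate() {
  total=$(samtools view -c -F 0x900 "$1")
  mapped=$(samtools view -c -F 0x904 "$1")
  percent "$mapped" "$total"
}

# Possible usage:
#   alignment_rate alignment.sorted.bam
```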
[Screenshot: 屏幕截图 2021-11-30 141221.png]
- Visualization with IGV
- Coverage
Mean coverage = 50.8624X
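Mean coverage is the per-position read depth averaged over the whole reference. The report does not say which tool produced the value above; as a rough cross-check, the sketch below averages the third column of `samtools depth -a` output (reference name, position, depth). The function name `mean_depth` is hypothetical, and different tools can filter reads differently, so the numbers need not match exactly.

```shell
#!/usr/bin/env bash
# mean_depth: average the depth column (field 3) of samtools depth output
# read from stdin. Hypothetical helper for illustration.
mean_depth() {
  awk '
    { sum += $3 }
    END { if (NR > 0) { printf "%.4f\n", sum / NR } else { print "NA" } }'
}

# Possible usage (-a also reports positions with zero depth):
#   samtools depth -a alignment.sorted.bam | mean_depth
```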
[Screenshots: 屏幕截图 2021-11-30 153219.png, 屏幕截图 2021-11-30 155734.png]
- Mapping Quality Across Reference
Mapping quality is the confidence that a read is correctly mapped to its genomic coordinates.
Mean mapping quality is 55.31 (generally > 30 is acceptable).
[Screenshot: 屏幕截图 2021-11-30 155959.png]
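Mapping quality (MAPQ) is Phred-scaled and stored in the fifth column of each SAM record, with the FLAG in the second column. The sketch below averages MAPQ over mapped records, assuming SAM text on standard input. The function name `mean_mapq` is hypothetical, and because it counts every mapped record (including secondary and supplementary alignments), the result may differ from the value reported above.

```shell
#!/usr/bin/env bash
# mean_mapq: average MAPQ (field 5) over mapped SAM records read from stdin.
# Hypothetical helper for illustration.
mean_mapq() {
  awk '
    /^@/ { next }                    # skip header lines
    int($2 / 4) % 2 == 1 { next }    # skip unmapped reads (FLAG bit 0x4 set)
    { sum += $5; n++ }
    END { if (n > 0) { printf "%.2f\n", sum / n } else { print "NA" } }'
}

# Possible usage:
#   samtools view -h alignment.sorted.bam | mean_mapq
```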