
Bacterial genomes: identifying plasmid sequences with PlasmidFinder

2019-08-12  Author: 基因的生物信息学分析

Once a bacterial genome has been sequenced, how do you find out whether it contains any plasmids?

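Plasmids are widespread in the living world: they occur in bacteria, actinomycetes, filamentous fungi, macrofungi, yeasts and plants, and have reportedly even been found in the human body. In terms of molecular composition there are DNA plasmids and RNA plasmids; in terms of conformation there are linear plasmids and circular plasmids; and their phenotypes are equally diverse. Bacterial plasmids are the most commonly used vectors in genetic engineering.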

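A plasmid is a DNA molecule found outside the chromosome (or nucleoid) in bacteria, yeasts, actinomycetes and other organisms. It resides in the cytoplasm (yeast is an exception: its 2 μm plasmid is located in the nucleus), replicates autonomously so that a stable copy number is maintained in daughter cells, and expresses the genetic information it carries. Most plasmids are closed circular double-stranded DNA molecules. Plasmids are not essential for bacterial growth and reproduction; they can be lost spontaneously or eliminated by treatments such as high temperature or ultraviolet light. The genetic information they carry can confer particular biological traits on the host, helping it survive under specific environmental conditions.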

Like the bacterial genome, a plasmid is typically circular double-stranded DNA (covalently closed circular DNA, cccDNA).

Introduction to PlasmidFinder

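PlasmidFinder identifies plasmid sequences in bacterial genome sequencing data by searching them against a manually curated database of plasmid replicons.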

An online version is also available: https://cge.cbs.dtu.dk/services/PlasmidFinder/


The online version needs no installation: upload your sequences and the results come back quickly.

Installing PlasmidFinder

git clone https://bitbucket.org/genomicepidemiology/plasmidfinder.git
cd plasmidfinder

Downloading and installing the PlasmidFinder database

# Clone database from git repository (develop branch)
git clone https://bitbucket.org/genomicepidemiology/plasmidfinder_db.git
cd plasmidfinder_db
PLASMID_DB=$(pwd)
# Install PlasmidFinder database with executable kma_index program
python3 INSTALL.py kma_index

If kma_index is not installed, see the KMA repository (https://bitbucket.org/genomicepidemiology/kma) and build it from source:

git clone https://bitbucket.org/genomicepidemiology/kma.git
cd kma && make
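
Before installing the database or running a search, it can help to confirm that the external programs are reachable. The sketch below is only an illustration: `find_tools` is a hypothetical helper name, and the program names (`kma`, `kma_index`, `blastn`) are simply the ones mentioned in this post, not part of any PlasmidFinder API.

```python
import shutil


def find_tools(names=("kma", "kma_index", "blastn")):
    """Map each program name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A `None` entry only means the program is not on `PATH`. A binary built locally in the kma directory will show up here only after that directory has been added to `PATH`.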

Using PlasmidFinder:

View the help text

$ python3 plasmidfinder.py  -h   
usage: plasmidfinder.py [-h] [-i INFILE [INFILE ...]] [-o OUTDIR]
                        [-tmp TMP_DIR] [-mp METHOD_PATH] [-p DB_PATH]
                        [-d DATABASES] [-l MIN_COV] [-t THRESHOLD] [-x] [-q]

optional arguments:
  -h, --help            show this help message and exit
  -i INFILE [INFILE ...], --infile INFILE [INFILE ...]
                        FASTA or FASTQ input files.
  -o OUTDIR, --outputPath OUTDIR
                        Path to blast output
  -tmp TMP_DIR, --tmp_dir TMP_DIR
                        Temporary directory for storage of the results from
                        the external software.
  -mp METHOD_PATH, --methodPath METHOD_PATH
                        Path to method to use (kma or blastn)
  -p DB_PATH, --databasePath DB_PATH
                        Path to the databases
  -d DATABASES, --databases DATABASES
                        Databases chosen to search in - if non is specified
                        all is used
  -l MIN_COV, --mincov MIN_COV
                        Minimum coverage
  -t THRESHOLD, --threshold THRESHOLD
                        Minimum threshold for identity
  -x, --extented_output
                        Give extented output with allignment files, template
                        and query hits in fasta and a tab seperated file with
                        allele profile results
  -q, --quiet
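
The `-l` and `-t` options filter hits by coverage and identity. As a rough illustration of what those two numbers mean, the sketch below computes them for a gapped pairwise alignment. It is not PlasmidFinder's own code: the function name `identity_and_coverage` and the exact definitions used (identity as matching columns over all alignment columns, coverage as aligned template bases over the full template length) are assumptions made for this example.

```python
def identity_and_coverage(template_aln, query_aln, template_length):
    """Return (identity in %, coverage as a fraction) for two aligned strings.

    Gaps are written as '-'. Both definitions are simplifying assumptions.
    """
    if len(template_aln) != len(query_aln):
        raise ValueError("aligned sequences must have the same length")
    if not template_aln or template_length <= 0:
        raise ValueError("alignment and template length must be non-empty")

    pairs = list(zip(template_aln, query_aln))
    # Columns where both sequences carry the same base.
    matches = sum(1 for t, q in pairs if t == q and t != "-")
    # Template bases that are paired with a query base rather than a gap.
    covered = sum(1 for t, q in pairs if t != "-" and q != "-")

    identity = 100.0 * matches / len(pairs)
    coverage = covered / template_length
    return identity, coverage


if __name__ == "__main__":
    ident, cov = identity_and_coverage("ATTCCAGAAA", "ATTCCAGTAA", 20)
    print(f"identity={ident:.1f}% coverage={cov:.2f}")
```

The help text does not state the scale each option expects (fraction or percent), so check the tool's documentation before choosing values.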

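Run the tool on the bundled test file (from the top-level plasmidfinder directory, so that the relative paths resolve):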

$ python3 plasmidfinder.py -i test/test.fsa -o testout/ -p plasmidfinder_db -x
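
For batch runs, the same command can be driven from a script. The following is a minimal sketch under stated assumptions: `build_command` and `run_plasmidfinder` are hypothetical helper names, only flags listed in the help text (`-i`, `-o`, `-p`, `-x`) are used, and creating the output directory up front is a precaution rather than a documented requirement.

```python
import subprocess
from pathlib import Path


def build_command(infile, outdir, db_path, extended=False,
                  python="python3", script="plasmidfinder.py"):
    """Assemble the argument list using only flags shown in the help text."""
    cmd = [python, script, "-i", str(infile),
           "-o", str(outdir), "-p", str(db_path)]
    if extended:
        cmd.append("-x")  # request the extended output files
    return cmd


def run_plasmidfinder(infile, outdir, db_path, extended=False):
    """Create the output directory if needed, then run the command."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    cmd = build_command(infile, outdir, db_path, extended)
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(cmd, check=True)
```

Calling `run_plasmidfinder("test/test.fsa", "testout/", "plasmidfinder_db", extended=True)` from the plasmidfinder directory corresponds to the command above. Passing the arguments as a list avoids shell quoting problems with unusual file names.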

Inspect the output directory:

$ ls testout
data.json  Hit_in_genome_seq.fsa  Plasmid_seqs.fsa  results_tab.tsv  results.txt  tmp
$ more testout/results.txt
plasmidfinder Results

Organism(s): Enterobacteriaceae,Gram Positive

****************************************************************************************
Enterobacteriaceae
**********************************************************************************************************************************
Plasmid         Identity  Query / Template length    Contig                       Position in contig    Note    Accession number
**********************************************************************************************************************************
IncHI1B(R27)         100  540 / 540                  IncHI1B(R27)_1_R27_AF250878  1..540                R27     AF250878
==================================================================================================================================


****************************************************************************************
Gram Positive
****************************************************************************************************************
Plasmid    Identity    Query / Template length    Contig        Position in contig    Note    Accession number
****************************************************************************************************************
-          -           -                          No hit found  -                     -       -
================================================================================================================




Extended Output:

# IncHI1B(R27)_AF250878
template:   ATTCCAGAAAACCGATCTCTTTAAGCTGGCCCAGCGCCTTTTTAACCGTGGCATTCTGGT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query:      ATTCCAGAAAACCGATCTCTTTAAGCTGGCCCAGCGCCTTTTTAACCGTGGCATTCTGGT

template:   TACCGAGGTGTGATGACAGTTGGAGTCGTCCACGAAGCCGATCGAATCCGATGCGGTAAA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query:      TACCGAGGTGTGATGACAGTTGGAGTCGTCCACGAAGCCGATCGAATCCGATGCGGTAAA

template:   AGGTGCTCGGCAGCTCAGCCAGATACAGGTACAGGGCCTGTGCGGACTCCTTACGGGCCA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query:      AGGTGCTCGGCAGCTCAGCCAGATACAGGTACAGGGCCTGTGCGGACTCCTTACGGGCCA

template:   GTTTTTGCAATGTCTTCAGGTAGAGTCGGGTTTTACCGTCGACGCGATACAGCGTATTGA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query:      GTTTTTGCAATGTCTTCAGGTAGAGTCGGGTTTTACCGTCGACGCGATACAGCGTATTGA

template:   GCTTCGAATTTGGCTTGATGATGATTTTTCCCGTGGAACTGTCGTAATACGTCGATTCCA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query:      GCTTCGAATTTGGCTTGATGATGATTTTTCCCGTGGAACTGTCGTAATACGTCGATTCCA

template:   CCAGGTGCATGTTTATCGTTATCTGATCATCTGTACCGGGTATTTTCTTAATAAATGAAA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query:      CCAGGTGCATGTTTATCGTTATCTGATCATCTGTACCGGGTATTTTCTTAATAAATGAAA

template:   TGTTGGTCCGGGCTATACGCGTCAGCGAAGCATCAAAGCGCTCTTTCAGTTGTTTATCAA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query:      TGTTGGTCCGGGCTATACGCGTCAGCGAAGCATCAAAGCGCTCTTTCAGTTGTTTATCAA

template:   TGCGCTTGGTATCAAACCCACAAAATTTTGCAAACTCCGGAAAATTCAGCTCCAGCTGAC
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query:      TGCGCTTGGTATCAAACCCACAAAATTTTGCAAACTCCGGAAAATTCAGCTCCAGCTGAC

template:   CTTCTGAATCAAGCGGCCGGTTAGACAACGCATAAACGATCCCACACCATGATTTGAAAT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query:      CTTCTGAATCAAGCGGCCGGTTAGACAACGCATAAACGATCCCACACCATGATTTGAAAT
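
The hit tables in results.txt can be turned into records for downstream filtering. The parser below is a sketch based only on the report layout shown above: it assumes columns are separated by two or more spaces and that a row of `=` characters closes each table. `parse_results_table` is a hypothetical name, and rows with empty cells (for example a blank Note) would be skipped by this simple approach.

```python
import re


def parse_results_table(text):
    """Parse the hit tables of a results.txt-style report into dicts."""
    hits = []
    header = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if set(stripped) <= {"*", "="}:
            # A row of '=' closes the current table; '*' rows are decoration.
            if stripped.startswith("="):
                header = None
            continue
        fields = re.split(r"\s{2,}", stripped)
        if fields[0] == "Plasmid":
            header = fields  # column names of the table that follows
        elif header and len(fields) == len(header):
            if fields[0] != "-":  # skip the "No hit found" placeholder row
                hits.append(dict(zip(header, fields)))
    return hits
```

For the run above, `parse_results_table(open("testout/results.txt").read())` would be expected to yield one record, for the IncHI1B(R27) replicon. All values are kept as strings, so convert `Identity` to a number before comparing it against a threshold.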

Reference: PlasmidFinder and pMLST: in silico detection and typing of plasmids. Carattoli A, Zankari E, Garcia-Fernandez A, Voldby Larsen M, Lund O, Villa L, Aarestrup FM, Hasman H. Antimicrob. Agents Chemother. 2014. April 28th.