
Predicting genes with Prodigal

2021-10-11  胡童远

Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm) is a gene-finding algorithm for prokaryotes based on dynamic programming.

Reference

Paper: Prodigal: prokaryotic gene recognition and translation initiation site identification
Journal: BMC Bioinformatics
Year: 2010
Citations: 5325 (Google Scholar, 2021.11.8)

Github: https://github.com/hyattpd/prodigal/wiki

Installation

conda install -c bioconda prodigal
prodigal --help

Many tools depend on Prodigal, so activating the environment of any one of them is enough.
For example, prokka, DRAM, checkv, and BUSCO
all use version 2.6.3.
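
To confirm which Prodigal the active environment provides, a quick check along these lines may help. It is a minimal sketch: the function name `prodigal_version` is made up for illustration, and it relies only on Prodigal's `-v` flag (the streams are merged on the assumption that the banner may be written to stderr).

```shell
# Hypothetical helper: report the Prodigal version found on PATH, if any.
prodigal_version() {
    if command -v prodigal >/dev/null 2>&1; then
        # -v prints the version banner and exits; merge stderr into stdout
        # in case the banner is written there.
        prodigal -v 2>&1
    else
        echo "prodigal not found on PATH" >&2
        return 1
    fi
}

prodigal_version || echo "activate an environment that ships prodigal first"
```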

Gene prediction

mkdir -p ./prodigal  # the output directory must exist before Prodigal writes to it
prodigal \
-a ./prodigal/out.faa \
-d ./prodigal/out.fna \
-f gff \
-g 11 \
-o ./prodigal/out.gff \
-p single \
-s ./prodigal/out.stat \
-i ./checkv/output_sop/combined.fna
# Protein translations only (without -o, gene coordinates are printed to stdout)
prodigal \
-a ./prodigal/out.faa \
-p single \
-i ./checkv/output_sop/combined.fna

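Parameters:
Note: gzip-compressed input files are not supported
-a  write protein translations to this file
-d  write nucleotide sequences of the predicted genes to this file
-f  output format: gbk, gff, or sco
-g  translation table to use (default 11)
-o  output file
-p  procedure: single or meta
-s  write all potential genes (with scores) to this file
-i  input file (GenBank or FASTA)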
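
Because compressed input is not accepted, one workaround is to decompress to a temporary file before calling Prodigal. The following is a minimal sketch; `plain_fasta` is a hypothetical helper name, and the paths in the usage comment are placeholders.

```shell
# Hypothetical helper: print the path of an uncompressed copy of the input.
# Files that do not end in .gz are passed through unchanged.
plain_fasta() {
    src="$1"
    case "$src" in
        *.gz)
            dst="$(mktemp "${TMPDIR:-/tmp}/prodigal_in.XXXXXX")" || return 1
            gzip -dc "$src" > "$dst" || return 1
            printf '%s\n' "$dst"
            ;;
        *)
            printf '%s\n' "$src"
            ;;
    esac
}

# Usage (placeholder paths):
#   input="$(plain_fasta ./genome.fna.gz)"
#   prodigal -i "$input" -a ./prodigal/out.faa -p single
```

The temporary copy is not removed automatically; delete it once Prodigal has finished.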

Run log

-------------------------------------
PRODIGAL v2.6.3 [February, 2016]
Univ of Tenn / Oak Ridge National Lab
Doug Hyatt, Loren Hauser, et al.
-------------------------------------
Request:  Single Genome, Phase:  Training
Reading in the sequence(s) to train...

Warning:  ideally Prodigal should be given at least 100000 bases for training.
You may get better results with the -p meta option.

94857 bp seq created, 46.41 pct GC
Locating all potential starts and stops...4486 nodes
Looking for GC bias in different frames...frame bias scores: 1.73 0.36 0.91
Building initial set of genes to train from...done!
Creating coding model and scoring nodes...done!
Examining upstream regions and training starts...done!
-------------------------------------
Request:  Single Genome, Phase:  Gene Finding
Finding genes in sequence #1 (41946 bp)...done!
Finding genes in sequence #2 (13124 bp)...done!
Finding genes in sequence #3 (12850 bp)...done!
Finding genes in sequence #4 (7790 bp)...done!
Finding genes in sequence #5 (7426 bp)...done!
Finding genes in sequence #6 (5283 bp)...done!
Finding genes in sequence #7 (2539 bp)...done!
Finding genes in sequence #8 (2178 bp)...done!
Finding genes in sequence #9 (1613 bp)...done!
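
The warning in the log suggests `-p meta` when fewer than 100000 bases are available for training. A rough pre-check could look like the sketch below. The helper names `count_bases` and `choose_mode` are hypothetical, and the count is only approximate: in this log Prodigal reports 94857 bp for training, while the nine sequences add up to 94749 bp.

```shell
# Hypothetical helper: count sequence characters in a FASTA file,
# skipping header lines and ignoring whitespace.
count_bases() {
    awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$1"
}

# Hypothetical helper: suggest a -p mode using the 100000-base
# threshold quoted in Prodigal's warning.
choose_mode() {
    if [ "$(count_bases "$1")" -lt 100000 ]; then
        echo meta
    else
        echo single
    fi
}

# Usage (input path taken from the commands in this post):
#   mode="$(choose_mode ./checkv/output_sop/combined.fna)"
#   prodigal -p "$mode" -a ./prodigal/out.faa -i ./checkv/output_sop/combined.fna
```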

Results


Protein sequences: faa
Gene (nucleotide) sequences: fna
Gene coordinates: gff
Potential genes and scores: stat
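
For a quick look at the results, the protein FASTA headers can be turned into a small table. This is a minimal sketch that assumes the headers follow the pattern `>ID # start # end # strand # attributes`; the helper name `faa_to_tsv` is hypothetical.

```shell
# Hypothetical helper: convert Prodigal protein FASTA headers into a TSV
# with the columns gene_id, start, end, strand.
# Assumes headers of the form: >ID # start # end # strand # attributes
faa_to_tsv() {
    awk -F ' # ' '/^>/ { print substr($1, 2) "\t" $2 "\t" $3 "\t" $4 }' "$1"
}

# Usage:
#   grep -c '^>' ./prodigal/out.faa        # number of predicted proteins
#   faa_to_tsv ./prodigal/out.faa | head   # first few gene coordinates
```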

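More:
Bioinformatics software series (part 6): prodigal
On the amino acid codon table: key points the textbooks leave out
The Genetic Codes
Codon usage bias in viruses and their hosts: similar, but not too similar
Unconventional viral gene expression mechanisms as therapeutic targets. Nature review, 2021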
