Identifying Orthologous Genes with OrthoMCL

2021-04-28 DumplingLucky

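OrthoMCL is one of the most widely used tools for gene family analysis. It clusters proteins from several species into ortholog groups. From these groups you get both the orthologous genes shared among the species and the gene families that are specific to, or expanded in, individual species.
What are orthologous and paralogous genes? Orthologs are genes in different species that descend from a single ancestral gene through speciation. Paralogs are genes related by a duplication event within a genome.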

1. Download the software

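# download and extract
wget http://orthomcl.org/common/downloads/software/v2.0/orthomclSoftware-v2.0.9.tar.gz
tar zxvf orthomclSoftware-v2.0.9.tar.gz
# add the bin directory to PATH: append the export line to ~/.bash_profile, then reload it
vi ~/.bash_profile
export PATH=/root/software/orthomclSoftware-v2.0.9/bin:$PATH
source ~/.bash_profile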

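2. Set up the environment

OrthoMCL requires (1) a UNIX system; (2) BLAST; (3) a relational database, either Oracle or MySQL; (4) Perl with the DBI module and the driver for your database (DBD::mysql for MySQL); (5) MCL.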

Installing BLAST

conda install -c bioconda blast   # BLAST+ (provides makeblastdb and blastp)

Installing and configuring MySQL

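# install
yum install -y mysql-server mysql mysql-devel
service mysqld start   # start the service
mysqladmin -u root password '******'   # set the password of the MySQL root (admin) account
service mysqld restart
mysql -u root -p   # check that you can log in
# create the OrthoMCL database and the account OrthoMCL will connect with
mysql -u root -p   # log in as the database superuser
mysql> CREATE DATABASE orthomcl;   # create the database (data directory here: /var/lib/mysql)
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,CREATE VIEW,CREATE,INDEX,DROP ON orthomcl.* TO orthomcl@localhost;   # account that will run OrthoMCL
mysql> SET PASSWORD FOR orthomcl@localhost = PASSWORD('yourpassword');   # set its password
# MySQL 8.0+ no longer creates accounts through GRANT and has removed PASSWORD(); there, run this before the GRANT and skip SET PASSWORD:
# mysql> CREATE USER 'orthomcl'@'localhost' IDENTIFIED BY 'yourpassword';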

# start from the MySQL config template shipped with OrthoMCL
cd /home/scr/02_software/orthomclSoftware-v2.0.9/doc/OrthoMCLEngine/Main
cp mysql.cnf my.cnf
vi my.cnf

# keep only the following settings and comment out everything else
[client]
[mysqld]
myisam_sort_buffer_size=4G
myisam_max_sort_file_size=200G
read_buffer_size=2G

mysql --defaults-file=my.cnf -u orthomcl -p   # if this login succeeds, the configuration is complete

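Setting up Perl

# check for the DBI and DBD::mysql modules (no error message means the module is installed)
perl -MDBI -e 1
perl -MDBD::mysql -e 1
# if either module is missing, install it from the CPAN shell
perl -MCPAN -e shell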
cpan> o conf makepl_arg "mysql_config=/path_to_your_mysql_dir/bin/mysql_config"
cpan> install Data::Dumper
cpan> install DBI
cpan> force install DBD::mysql

Installing MCL

conda install -c bioconda mcl
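Before moving on, it helps to confirm that every program and Perl module used below can actually be found. The two functions here are a minimal sketch written for this walkthrough. The names `check_programs` and `check_perl_modules` are hypothetical and not part of OrthoMCL. Each function prints one line per missing item and returns the number of missing items.

```bash
#!/usr/bin/env bash
# Sketch: report missing command-line programs.
# Prints "missing program: NAME" for each one not on PATH; returns the count.
check_programs() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing program: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Sketch: report Perl modules that cannot be loaded (same idea as perl -MDBI -e 1).
# Prints "missing Perl module: NAME" for each failure; returns the count.
check_perl_modules() {
    local missing=0 mod
    for mod in "$@"; do
        if ! perl -M"$mod" -e 1 >/dev/null 2>&1; then
            echo "missing Perl module: $mod"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

For example, run `check_programs makeblastdb blastp mcl orthomclInstallSchema orthomclAdjustFasta orthomclFilterFasta orthomclBlastParser orthomclLoadBlast orthomclPairs orthomclDumpPairsFiles orthomclMclToGroups` followed by `check_perl_modules DBI DBD::mysql`. If both print nothing, all dependencies were found.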

3. Running OrthoMCL

The whole workflow consists of 13 steps, following the OrthoMCL user guide.

(1) Set up the database (see above)

(2) Install MCL (see above)

(3) Install OrthoMCL and configure it

mkdir orthomcl   # create a working directory
cd orthomcl
mkdir -p out_0{1..9} out_10   # output directories used in the steps below
cp ~/02_software/orthomclSoftware-v2.0.9/doc/OrthoMCLEngine/Main/orthomcl.config.template out_01/00.orthomcl.config
vi out_01/00.orthomcl.config

# this config assumes a mysql database named 'orthomcl'.  adjust according
# to your situation.
dbVendor=mysql
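# database name, hostname and port; MySQL listens on 3306 by default, so use the port your server actually runs on
# keep comments on their own lines: text after '=' on a setting line is read as part of the value
dbConnectString=dbi:mysql:orthomcl:localhost:3307
dbLogin=orthomcl
dbPassword=yourpassword
# the following five table/view names can be changed
similarSequencesTable=SimilarSequences_new
orthologTable=Ortholog_new
inParalogTable=InParalog_new
coOrthologTable=CoOrtholog_new
interTaxonMatchView=InterTaxonMatch_new
percentMatchCutoff=50
evalueExponentCutoff=-5
oracleIndexTblSpc=NONE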
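A trailing comment on a setting line may end up inside that setting's value. A quick check before step (4) can catch this, along with malformed lines and missing keys. The helper below (`check_config`) is hypothetical and written for this walkthrough. Its list of required keys is simply the set of keys in the template above.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check an OrthoMCL config file before running step (4).
# Reports (a) lines that are not "key=value", (b) values containing '#'
# (a trailing comment would be kept as part of the value), and (c) template
# keys that are missing. Returns the number of problems found.
check_config() {
    local cfg=$1 bad=0 line key
    local re='^[A-Za-z_]+=.+$'
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;   # skip blank lines and full-line comments
        esac
        if [[ ! $line =~ $re ]]; then
            echo "malformed line: $line"; bad=$((bad + 1))
        elif [[ $line == *'#'* ]]; then
            echo "inline comment in value: $line"; bad=$((bad + 1))
        fi
    done < "$cfg"
    # (c) every key from the template must be present
    for key in dbVendor dbConnectString dbLogin dbPassword \
               similarSequencesTable orthologTable inParalogTable \
               coOrthologTable interTaxonMatchView percentMatchCutoff \
               evalueExponentCutoff oracleIndexTblSpc; do
        if ! grep -q "^${key}=" "$cfg"; then
            echo "missing key: $key"; bad=$((bad + 1))
        fi
    done
    return "$bad"
}
```

Usage: `check_config out_01/00.orthomcl.config`. A password that really contains `#` is also reported. In that case, read the message as a warning.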

(4) orthomclInstallSchema

# create the OrthoMCL tables in the database configured in the previous step
orthomclInstallSchema out_01/00.orthomcl.config out_01/install_tables.log

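(5) orthomclAdjustFasta

# reformat each proteome into OrthoMCL-compliant FASTA; repeat for every species in the analysis
orthomclAdjustFasta pdel pdel.pep.fa 1
mv pdel.fasta out_02/
# arguments: taxon code (prefix of every new header and name of the output file), input FASTA file,
# and the field of the original header that holds the protein ID (fields split on spaces or '|', counting from 1)
# new headers look like >pdel|<protein ID>; pdel.fasta is written to the current directory
# out_02 must contain only the compliant files of all species, because steps (6) and (8) read the whole directory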
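With many species, step (5) is easier to run as a loop. This is a minimal sketch with a few assumptions. It expects a hypothetical tab-separated file (called `species.tsv` here) with one `taxon_code<TAB>raw_fasta_path` line per species. It assumes field 1 of each raw header holds the protein ID. It also assumes no raw file is itself named `<taxon_code>.fasta` in the working directory, because orthomclAdjustFasta writes that name. The function name `adjust_all` is made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: run orthomclAdjustFasta for every species listed in a
# tab-separated file "taxon_code<TAB>raw_fasta_path" and collect the
# compliant files in one output directory.
adjust_all() {
    local list=$1 outdir=$2 code fasta
    mkdir -p "$outdir"
    while IFS=$'\t' read -r code fasta; do
        [ -z "$code" ] && continue   # skip blank lines
        # writes <code>.fasta into the current directory, using field 1
        # of each original header as the protein ID (>code|ID)
        orthomclAdjustFasta "$code" "$fasta" 1 || return 1
        mv "$code.fasta" "$outdir/" || return 1
    done < "$list"
}
```

Usage: `adjust_all species.tsv out_02`.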

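(6) orthomclFilterFasta

# drop proteins shorter than 10 aa or with more than 20% stop codons (the values used in the OrthoMCL user guide)
orthomclFilterFasta out_02 10 20
# writes goodProteins.fasta and poorProteins.fasta to the current directory
mv goodProteins.fasta poorProteins.fasta out_03/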
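To confirm that every species survived filtering, count the sequences per taxon code in the filtered FASTA files. The function name below is hypothetical. It relies only on the `>taxon|protein_id` headers that orthomclAdjustFasta produces in step (5).

```bash
#!/usr/bin/env bash
# Sketch: count sequences per taxon code in a FASTA file whose headers
# look like >taxon|protein_id (e.g. goodProteins.fasta).
# Prints "taxon<TAB>count", sorted by taxon code.
count_per_taxon() {
    awk -F'|' '/^>/ { t = substr($1, 2); n[t]++ }
               END  { for (t in n) print t "\t" n[t] }' "$1" | sort
}
```

Usage: `count_per_taxon out_03/goodProteins.fasta`. Every taxon code from step (5) should appear. Run it on `out_03/poorProteins.fasta` as well to see which species lost the most sequences.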

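(7) All-v-all BLAST

# blastp is the most time-consuming step; for large data sets, split the query file and search the pieces in parallel
# -outfmt 6 (tabular) is required, because orthomclBlastParser reads tabular BLAST output
makeblastdb -in out_03/goodProteins.fasta -dbtype prot -out out_04/orthomcl
blastp -query out_03/goodProteins.fasta -out out_04/orthomcl_blastp.out -db out_04/orthomcl -evalue 1e-5 -outfmt 6 -num_threads 5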
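The split-and-parallelise approach for step (7) could look like the sketch below. The helper names are hypothetical, and chunks are assigned round-robin by sequence. This balances sequence counts but not sequence lengths. Each chunk is searched against the full database built from goodProteins.fasta, so the concatenated hits are the same as those of a single all-v-all run, up to line order. Keep the chunk count modest, because awk keeps every chunk file open.

```bash
#!/usr/bin/env bash
# Sketch: split a FASTA file into N chunks, round-robin by sequence.
# Chunks are written as <prefix>.1.fasta ... <prefix>.N.fasta.
split_fasta() {
    local infile=$1 n=$2 prefix=$3
    awk -v n="$n" -v p="$prefix" '
        /^>/ { i++; out = p "." ((i - 1) % n + 1) ".fasta" }
        { print > out }' "$infile"
}

# Sketch: run blastp on every chunk in the background against the full
# database, wait for all jobs, then concatenate the tabular results.
# Note: exit statuses of the individual jobs are not checked here.
blast_chunks() {
    local prefix=$1 db=$2 out=$3 threads=$4 chunk
    for chunk in "$prefix".*.fasta; do
        blastp -query "$chunk" -db "$db" -out "$chunk.blastp.out" \
               -evalue 1e-5 -outfmt 6 -num_threads "$threads" &
    done
    wait
    cat "$prefix".*.fasta.blastp.out > "$out"
}
```

For example, `split_fasta out_03/goodProteins.fasta 4 out_04/chunk` followed by `blast_chunks out_04/chunk out_04/orthomcl out_04/orthomcl_blastp.out 2` runs four jobs with two threads each.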

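(8) orthomclBlastParser

# convert the BLAST results for loading into the OrthoMCL database; out_02 is the directory of compliant FASTA files
orthomclBlastParser out_04/orthomcl_blastp.out out_02 > out_05/similarSequences.txt
# tune the buffer settings in my.cnf to the size of this output file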
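orthomclBlastParser expects BLAST tabular output, which has 12 tab-separated columns. This is the old `-m 8` format, or `-outfmt 6` in BLAST+. A pre-check like the hypothetical `check_tabular` below catches a BLAST run that used the default pairwise text format.

```bash
#!/usr/bin/env bash
# Sketch: verify that a BLAST result file has 12 tab-separated columns
# on every line. Prints the first offending line and returns 1, prints
# "empty file" and returns 1 for an empty file, otherwise returns 0.
check_tabular() {
    awk -F'\t' '
        NF != 12 { print "line " NR ": " NF " columns"; bad = 1; exit }
        END {
            if (NR == 0) { print "empty file"; bad = 1 }
            exit bad
        }' "$1"
}
```

Usage: `check_tabular out_04/orthomcl_blastp.out`.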

(9) orthomclLoadBlast

# load the parsed results into the OrthoMCL database
orthomclLoadBlast out_01/00.orthomcl.config out_05/similarSequences.txt

(10) orthomclPairs

# the main computation step: identifies ortholog, in-paralog and co-ortholog protein pairs
orthomclPairs out_01/00.orthomcl.config out_07/orthomcl_pairs.log cleanup=no

(11) orthomclDumpPairsFiles

# write the ortholog, co-ortholog and in-paralog pair files (pairs/ directory) and the mclInput file to the current directory
orthomclDumpPairsFiles out_01/00.orthomcl.config

(12) mcl

# MCL, the Markov Cluster algorithm; -I 1.5 is the inflation value used in the OrthoMCL user guide
mcl mclInput --abc -I 1.5 -o out_09/mclOutput

(13) orthomclMclToGroups

# convert the MCL clusters into named ortholog groups (all_1, all_2, ...)
orthomclMclToGroups all_ 1 < out_09/mclOutput > out_10/all.txt
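The groups file can be related back to the species-specific families mentioned at the start. A rough summary counts, for each taxon, how many groups contain it and how many contain only it. This is a minimal sketch, assuming each line of the groups file looks like `group_id: taxon|protein taxon|protein ...`. The function name is hypothetical. Proteins that never entered a cluster do not appear in the file, so they are not counted.

```bash
#!/usr/bin/env bash
# Sketch: summarise an orthomclMclToGroups output file.
# For each taxon prints: taxon<TAB>groups containing it<TAB>groups
# containing only that taxon (species-specific groups), sorted by taxon.
summarize_groups() {
    awk '{
        split("", seen); ntaxa = 0           # reset per-group taxon set
        for (i = 2; i <= NF; i++) {          # field 1 is "group_id:"
            split($i, parts, "|")
            if (!(parts[1] in seen)) { seen[parts[1]] = 1; ntaxa++ }
        }
        for (t in seen) {
            groups[t]++
            if (ntaxa == 1) specific[t]++
        }
    }
    END { for (t in groups) print t "\t" groups[t] "\t" (specific[t] + 0) }' "$1" | sort
}
```

Usage: `summarize_groups out_10/all.txt`.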

References:
https://www.jianshu.com/p/10600dfec426
https://orthomcl.org/common/downloads/software/v2.0/UserGuide.txt