工具 | 比较基因组 | WGDI
看我不如看【参考】
参考:
WGDI | https://github.com/SunPengChuan/wgdi
WGDI | https://wgdi.readthedocs.io/en/latest/Introduction.html
bilibili | WGDI的简单使用(一)
bilibili | WGDI的简单使用(二)
简书 | xuzhougeng | 如何用WGDI进行共线性分析(上)
简书 | xuzhougeng | 如何用WGDI进行共线性分析(中)
简书 | xuzhougeng | 如何用WGDI进行共线性分析(下)
简书 | xuzhougeng | WGDI之深入理解blockinfo输出结果
简书 | xuzhougeng | 使用jcvi基于wgdi的共线性结果(-icl)绘制点图
简书 | 生信石头 | 点阵图 | 比较基因组分析之基石 - 手把手入门教程
简书 | 生信石头 | WGDI | 比较基因组分析神器,要啥有啥!
公众号 | 生信媛 | 使用WGDI分析全基因组复制事件
1. 下载
$ conda create -c bioconda -c conda-forge -n wgdi wgdi
$ conda activate wgdi
2. 所需文件
- blast 结果
-outfmt 6
- gff
wgdi自己定义的gff,注意提取最长转录本 - lens
注意排序 - cds.fa
将所选物种序列cat
到一起 - pep.fa
3. 运行
- 生成配置文件
$ wgdi -d help >> total.conf
- 修改手动
total.conf
文件 - 结果运行
$ wgdi -d total.conf
每个参数都是【三步走】方式,运行简单
4. 逐个参数
原图地址:https://wgdi.readthedocs.io/en/latest/Introduction.html-
-d
|[dotplot]
依赖:blast 结果、gff、lens
功能:显示同源基因dotplot
-
-icl
|[collinearity]
依赖:blast 结果、gff、lens
功能:ColinearScan 的改进版,动态规划算法提取共线性
-
-ks
|[ks]
依赖:-icl
结果、pep.fa、cds.fa
功能:yn00 计算共线性块上同源基因间的ka、ks
-
-bi
|[blockinfo]
依赖:-icl
结果、ks
结果
功能:结果整合 -
-c
|[correspondence]
依赖:-bi
结果
功能:提取 -
-bk
|[blockks]
依赖:-bi
结果
功能:绘制dotplot 展示共线性块上的 Ks -
-kp
|[kspeaks]
依赖:-bi
结果
功能:ks分布图 -
-pf
|[peaksfit]
依赖:-bi
结果
功能:ks分布的高斯拟合 -
-kf
|[ksfigure]
依赖:-kp
结果
功能:ks分布图(正态)
后面古基因组和核型分析这块,我还不是很明白。
4. 直接生成全部配置文件
可以把关键信息传参导入,不用每次修改文件名了。
$ python3 createWGDIconf.py -h
usage: createWGDIconf.py [-h] [--prefix PREFIX] --blast BLAST --gff1 GFF1 [--gff2 [GFF2]] --len1 LEN1 [--len2 [LEN2]] [--genome_name1 GENOME_NAME1] [--genome_name2 GENOME_NAME2] [--cds CDS]
[--pep PEP]
OK
optional arguments:
-h, --help show this help message and exit
--prefix PREFIX
--blast BLAST
--gff1 GFF1
--gff2 [GFF2]
--len1 LEN1
--len2 [LEN2]
--genome_name1 GENOME_NAME1
--genome_name2 GENOME_NAME2
--cds CDS
--pep PEP
#!python3
import argparse
def create_parser():
parser = argparse.ArgumentParser()
parser.add_argument("--prefix", default="test")
parser.add_argument("--blast", required=True)
parser.add_argument("--gff1", required=True)
parser.add_argument("--gff2", nargs='?')
parser.add_argument("--len1", required=True)
parser.add_argument("--len2", nargs='?')
parser.add_argument("--genome_name1", default="genome1")
parser.add_argument("--genome_name2", default="genome2")
parser.add_argument("--cds", default="cds.fa")
parser.add_argument("--pep", default="pep.faa")
return parser
def create_conf():
d = """[dotplot]
blast = {blast}
gff1 = {gff1}
gff2 = {gff2}
lens1 = {len1}
lens2 = {len2}
genome1_name = {genome_name1}
genome2_name = {genome_name2}
multiple = 1
score = 100
evalue = 1e-5
repeat_number = 10
position = order
blast_reverse = false
ancestor_left = none
ancestor_top = none
markersize = 0.5
figsize = 10,10
savefig = {prefix}.pdf
[dotplot]
blast = {blast}
gff1 = {gff1}
gff2 = {gff2}
lens1 = {len1}
lens2 = {len2}
genome1_name = {genome_name1}
genome2_name = {genome_name2}
multiple = 1
score = 100
evalue = 1e-5
repeat_number = 10
position = order
blast_reverse = false
ancestor_left = none
ancestor_top = none
markersize = 0.5
figsize = 10,10
savefig = {prefix}.pdf
[collinearity]
gff1 = {gff1}
gff2 = {gff2}
lens1 = {len1}
lens2 = {len2}
blast = {blast}
blast_reverse = false
multiple = 1
process = 8
evalue = 1e-5
score = 100
grading = 50,25,10
mg = 25,25
pvalue = 1
repeat_number = 10
positon = order
savefile = {prefix}.icl.collinearity.txt
[ks]
cds_file = {cds}
pep_file = {pep}
align_software = muscle
pairs_file = {prefix}.icl.collinearity.txt
ks_file = {prefix}.ks
[blockinfo]
blast = {blast}
gff1 = {gff1}
gff2 = {gff2}
lens1 = {len1}
lens2 = {len2}
collinearity = {prefix}.icl.collinearity.txt
score = 100
evalue = 1e-5
repeat_number = 20
position = order
ks = {prefix}.ks
ks_col = ks_NG86
savefile = {prefix}.block.csv
[correspondence]
blockinfo = {prefix}.block.csv
lens1 = {len1}
lens2 = {len2}
tandem = false
tandem_length = 200
pvalue = 0.2
block_length = 5
multiple = 1
homo = -1,1
savefile = {prefix}.block.c.csv
[blockks]
lens1 = len1
lens2 = {len2}
genome1_name = {genome_name1}
genome2_name = {genome_name2}
blockinfo = {prefix}.block.c.csv
pvalue = 0.2
tandem = false
tandem_length = 200
markersize = 1
area = 0,2
block_length = 5
figsize = 8,8
savefig = {prefix}.ks.c.pdf
[kspeaks]
blockinfo = {prefix}.block.c.csv
pvalue = 0.2
tandem = false
block_length = 5
ks_area = 0,10
multiple = 1
homo = 0,1
fontsize = 9
area = 0,3
figsize = 10,6.18
savefig = {prefix}.kp.pdf
savefile = {prefix}.kp.medain.ks
[peaksfit]
blockinfo = {prefix}.block.c.csv
mode = median
bins_number = 200
ks_area = 0,10
fontsize = 9
area = 0,3
figsize = 10,6.18
shadow = true
savefig = {prefix}.pf.pdf
""".format(blast=args.blast,
gff1=args.gff1,
gff2=args.gff2,
len1=args.len1,
len2=args.len2,
genome_name1=args.genome_name1,
genome_name2=args.genome_name2,
prefix=args.prefix,
cds=args.cds,
pep=args.pep)
return(d)
if __name__ == "__main__":
parser = create_parser()
args = parser.parse_args()
if args.len2 is None:
args.len2=args.len1
if args.gff2 is None:
args.gff2=args.gff1
conf = create_conf()
print(conf)