Genome assembly

Notes on the gene family analysis pipeline

2021-09-08  SnorkelingFan凡潜
perl /02_Cluster_stat_v1.1/bin/step3_Cluster_stat_family.pl category.txt all.cds cluster.stat-info --cluster_file all_orthomcl.out --type orthomcl --step 134 -q x.q
perl /07_orthomcl_pipeline_v1.0/bin/obtain_4d_phase1.pl all.philip
## vi step3_Cluster_stat_family.pl

This script collects statistics from the results of orthomcl or treefam.

1.stat cluster information from the cluster file.
        File require :cluster_file  category.txt.new cluster_stat_out;
        Output : cluster_stat_out;

2.stat the cluster family from the cluster_stat_out, and draw veen_svg.
        File require : category.txt.new  all.cds cluster_stat_out;
        Output : 4spec_veen.input;

3.stat the genefamilies information, such as of_gene,unique_family,single_gene.
        File require : category.txt.new all.cds cluster_stat_out;
        Output : family.stat.table;

4.filter the single_copy families from the orthomcl.out, and put the corresponding cds together into the genefamily category.
then translate them to pep and run muscle,
and extract all.philip from the singlecopy genefamilies.
        File require : cluster_file all.cds category.txt.new;
        Output : ./singlecopy_genefamily/ ;
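
To make step 4 concrete, here is a minimal Python sketch of the single-copy filter. It is not the Perl script's actual code. It assumes the cluster file uses OrthoMCL v1.4-style lines (`ORTHOMCL12(3 genes,3 taxa):` followed by `gene(species)` entries), which these notes do not confirm, and the names `parse_cluster_line`, `single_copy_families` and the species labels are made up for illustration.

```python
import re

# Assumed line layout (not confirmed by the notes):
#   ORTHOMCL12(3 genes,3 taxa):	 geneA(sp1) geneB(sp2) geneC(sp3)
MEMBER = re.compile(r"(\S+)\(([^()\s]+)\)")

def parse_cluster_line(line):
    """Return (family_id, [(gene, species), ...]) for one cluster line."""
    header, _, members = line.partition(":")
    family_id = header.split("(", 1)[0].strip()
    return family_id, MEMBER.findall(members)

def single_copy_families(lines, species):
    """Keep families that have exactly one gene from every listed species."""
    wanted = set(species)
    kept = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        family_id, members = parse_cluster_line(line)
        seen = [sp for _, sp in members]
        # one gene per species: as many members as species, and no species missing
        if len(seen) == len(wanted) and set(seen) == wanted:
            kept[family_id] = {sp: gene for gene, sp in members}
    return kept

# Hypothetical toy input with three species
example = [
    "ORTHOMCL0(4 genes,3 taxa):\t a1(spA) a2(spA) b1(spB) c1(spC)",
    "ORTHOMCL1(3 genes,3 taxa):\t a3(spA) b2(spB) c2(spC)",
    "ORTHOMCL2(2 genes,2 taxa):\t a4(spA) b3(spB)",
]
single_copy = single_copy_families(example, ["spA", "spB", "spC"])
print(single_copy)
```

Only the second toy family passes the filter: the first has two copies in spA and the third lacks spC.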

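The extracted single-copy orthologous gene families are used only for building the tree. all.philip and every other philip file produced by the pipeline come from the single-copy orthologous gene families, and the later tree building is based on these files, so everything in them is single-copy.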
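
As a rough illustration of how per-family alignments could end up in one PHYLIP file, the sketch below joins the aligned sequences of each single-copy family into one row per species. That the pipeline builds all.philip by this kind of concatenation is my assumption, not something the script states. The function names, species labels and sequences are hypothetical, and the output uses the relaxed PHYLIP layout (name, space, sequence).

```python
def concatenate_alignments(family_alignments, species):
    """Join per-family aligned sequences into one supermatrix row per species."""
    parts = {sp: [] for sp in species}
    for alignment in family_alignments:
        # within one family, every species must have the same aligned length
        lengths = {len(alignment[sp]) for sp in species}
        if len(lengths) != 1:
            raise ValueError("sequences in one family must have equal aligned length")
        for sp in species:
            parts[sp].append(alignment[sp])
    return {sp: "".join(chunks) for sp, chunks in parts.items()}

def to_phylip(supermatrix):
    """Relaxed sequential PHYLIP: 'ntaxa nsites' header, then 'name sequence' lines."""
    n_sites = len(next(iter(supermatrix.values())))
    lines = [f"{len(supermatrix)} {n_sites}"]
    for name, seq in supermatrix.items():
        lines.append(f"{name} {seq}")
    return "\n".join(lines) + "\n"

# Hypothetical aligned CDS for two single-copy families
families = [
    {"spA": "ATG-CC", "spB": "ATGACC", "spC": "ATGTCC"},
    {"spA": "GGA", "spB": "GGC", "spC": "GGT"},
]
matrix = concatenate_alignments(families, ["spA", "spB", "spC"])
phylip_text = to_phylip(matrix)
print(phylip_text)
```

Because each family contributes exactly one sequence per species, every species gets a row of the same length, which is what makes single-copy families convenient for this step.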

The gene family clustering file cluster.stat-info contains the copy numbers of all families; each id is one gene family.

$ tail -n 1 cluster.stat-info
26006   2   0   2   0   0   0   0   0   0   0   0   0   0   0   1

The last line carries id 26006, so this run yielded 26006 gene families.
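
The same count can be cross-checked by counting rows instead of reading the last id, and the table also shows how many families are single-copy. The sketch below assumes cluster.stat-info is whitespace-separated with the family id in column 1 and one copy-number column per species after it. I have not verified that layout against the script, and the function names and toy rows are hypothetical.

```python
def read_copy_numbers(lines):
    """Map family id -> list of integer counts (assumed: one column per species)."""
    table = {}
    for line in lines:
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip blank lines and any non-numeric header row
        table[fields[0]] = [int(x) for x in fields[1:]]
    return table

def summarize(table):
    """Count all families and those with exactly one copy in every column."""
    n_single_copy = sum(
        1 for counts in table.values() if counts and all(c == 1 for c in counts)
    )
    return {"families": len(table), "single_copy": n_single_copy}

# Hypothetical toy table with three species columns
example = [
    "1   1   1   1",
    "2   2   0   1",
    "3   1   1   1",
    "4   0   0   3",
]
summary = summarize(read_copy_numbers(example))
print(summary)
```

For the real file, passing an open file handle works the same way, e.g. `summarize(read_copy_numbers(open("cluster.stat-info")))`.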
