Comparative Genomics: From Getting Started to Giving Up (4)

2021-06-17  Morriyaty

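Today I'll go over how to use PAML to find genes under positive selection.
Because Ka/Ks (dN/dS) is below 1 for most genes, we use a P value, from a likelihood ratio test comparing two codeml models, to decide whether a gene is under positive selection.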

The software is PAML, run from a Singularity container.

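# Prepare the tree files (first line: 9 species, 1 tree)
# tree (alternative model: the cahirinus branch is labelled #1 as the foreground)
9  1
(cattle,(horse,((gorilla,human),(rabbit,(hamster,(cahirinus #1,(mouse,rat)))))));
# notree (null model: same topology, no branch label)
9  1
(cattle,(horse,((gorilla,human),(rabbit,(hamster,(cahirinus,(mouse,rat)))))));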
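If you have many species or want to switch the foreground lineage, writing both tree files by hand gets error-prone. A minimal sketch, assuming a plain Newick topology without branch lengths; the helper names are hypothetical and not part of the post's scripts:

```python
import re

def leaf_names(newick: str) -> list:
    """Leaf labels of a simple Newick tree without branch lengths."""
    # Drop PAML branch labels such as " #1" so they are not counted as leaves.
    cleaned = re.sub(r"\s*#\d+", "", newick)
    return [tok for tok in re.split(r"[(),;\s]+", cleaned) if tok]

def mark_foreground(newick: str, species: str, label: str = "#1") -> str:
    """Attach a PAML branch label to the terminal branch of `species`."""
    pattern = rf"\b{re.escape(species)}\b"
    marked, n = re.subn(pattern, f"{species} {label}", newick)
    if n != 1:
        raise ValueError(f"expected exactly one leaf named {species!r}, found {n}")
    return marked

def paml_tree_file(newick: str) -> str:
    """Prefix the 'number of species  number of trees' header line codeml expects."""
    return f"{len(leaf_names(newick))}  1\n{newick}\n"

base = "(cattle,(horse,((gorilla,human),(rabbit,(hamster,(cahirinus,(mouse,rat)))))));"
print(paml_tree_file(mark_foreground(base, "cahirinus")), end="")  # contents of "tree"
print(paml_tree_file(base), end="")                                # contents of "notree"
```

The regex-based labelling only marks a terminal branch; labelling an internal (ancestral) branch needs a real tree parser.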

# Generate the command scripts in batch (run.sh for the alternative model, norun.sh for the null model)
python3 run.py og.list run.sh
python3 norun.py og.list norun.sh
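The post doesn't show run.py or norun.py. Below is a hypothetical sketch of what such a generator could look like, assuming og.list holds one orthogroup ID per line, each orthogroup has its own folder under align/ (the layout later used by `cat align/*/Pout`) with the control file inside it, and the PAML image is a file called paml.sif (an assumed name):

```python
import shlex
from pathlib import Path

# Hypothetical defaults: image name and align/<OG>/ layout are assumptions.
IMAGE = "paml.sif"
ALIGN_DIR = "align"

def build_commands(og_ids, ctl_name, image=IMAGE, align_dir=ALIGN_DIR):
    """One line per orthogroup: run codeml inside that orthogroup's folder."""
    # Absolute path so the image is still found after cd.
    image_path = shlex.quote(str(Path(image).resolve()))
    # The parentheses start a subshell, so each cd does not leak into the next line.
    return [
        f"(cd {align_dir}/{og} && singularity exec {image_path} codeml {ctl_name})"
        for og in og_ids
    ]

def write_script(og_list, ctl_name, out_sh, **kwargs):
    """Read one orthogroup ID per line from og_list and write a shell script."""
    text = Path(og_list).read_text()
    og_ids = [line.strip() for line in text.splitlines() if line.strip()]
    Path(out_sh).write_text("\n".join(build_commands(og_ids, ctl_name, **kwargs)) + "\n")

# Usage:
# write_script("og.list", "codeml.ctl", "run.sh")
# write_script("og.list", "nocodeml.ctl", "norun.sh")
```

Each line is independent, so the resulting script can also be split and run in parallel.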

codeml.ctl (alternative model: model = 2 with NSsites = 0, i.e. the two-ratio branch model):
      seqfile = test.phy * sequence data filename
     treefile = tree      * tree structure file name
      outfile = mlc           * main result file name

        noisy = 9  * 0,1,2,3,9: how much rubbish on the screen
      verbose = 1  * 0: concise; 1: detailed, 2: too much
      runmode = 0  * 0: user tree;  1: semi-automatic;  2: automatic
                   * 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise

      seqtype = 1  * 1:codons; 2:AAs; 3:codons-->AAs
    CodonFreq = 2  * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table

*        ndata = 10
        clock = 0  * 0:no clock, 1:clock; 2:local clock; 3:CombinedAnalysis
       aaDist = 0  * 0:equal, +:geometric; -:linear, 1-6:G1974,Miyata,c,p,v,a
   aaRatefile = dat/jones.dat  * only used for aa seqs with model=empirical(_F)
                   * dayhoff.dat, jones.dat, wag.dat, mtmam.dat, or your own

        model = 2
                   * models for codons:
                       * 0:one, 1:b, 2:2 or more dN/dS ratios for branches
                   * models for AAs or codon-translated AAs:
                       * 0:poisson, 1:proportional, 2:Empirical, 3:Empirical+F
                       * 6:FromCodon, 7:AAClasses, 8:REVaa_0, 9:REVaa(nr=189)

      NSsites = 0  * 0:one w;1:neutral;2:selection; 3:discrete;4:freqs;
                   * 5:gamma;6:2gamma;7:beta;8:beta&w;9:beta&gamma;
                   * 10:beta&gamma+1; 11:beta&normal>1; 12:0&2normal>1;
                   * 13:3normal>0
        icode = 0  * 0:universal code; 1:mammalian mt; 2-10:see below
        Mgene = 0
                   * codon: 0:rates, 1:separate; 2:diff pi, 3:diff kapa, 4:all diff
                   * AA: 0:rates, 1:separate

    fix_kappa = 0  * 1: kappa fixed, 0: kappa to be estimated
        kappa = 2  * initial or fixed kappa
    fix_omega = 0  * 1: omega or omega_1 fixed, 0: estimate 
        omega = 1  * initial or fixed omega, for codons or codon-based AAs

    fix_alpha = 1  * 0: estimate gamma shape parameter; 1: fix it at alpha
        alpha = 0. * initial or fixed alpha, 0:infinity (constant rate)
       Malpha = 0  * different alphas for genes
        ncatG = 8  * # of categories in dG of NSsites models

        getSE = 0  * 0: don't want them, 1: want S.E.s of estimates
 RateAncestor = 1  * (0,1,2): rates (alpha>0) or ancestral states (1 or 2)

   Small_Diff = .5e-6
    cleandata = 1  * remove sites with ambiguity data (1:yes, 0:no)?
*  fix_blength = 1  * 0: ignore, -1: random, 1: initial, 2: fixed, 3: proportional
       method = 0  * Optimization method 0: simultaneous; 1: one branch a time

* Genetic codes: 0:universal, 1:mammalian mt., 2:yeast mt., 3:mold mt.,
* 4: invertebrate mt., 5: ciliate nuclear, 6: echinoderm mt., 
* 7: euplotid mt., 8: alternative yeast nu. 9: ascidian mt., 
* 10: blepharisma nu.
* These codes correspond to transl_table 1 to 11 of GENEBANK.

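Setting model = 0 (one ω for all branches) gives nocodeml.ctl, the null model.
Note: remember to change the file names in the copy (treefile to notree, plus outfile), otherwise the results will be overwritten.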
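Editing the copy by hand is exactly where the overwrite mistake creeps in. A minimal sketch that derives the null control file from codeml.ctl; the output name nomlc is a hypothetical choice, and the regex assumes the `key = value` layout shown above:

```python
import re

def set_ctl_option(ctl_text, key, value):
    """Replace the value of an active (non-'*') option in a PAML .ctl file."""
    # Anchored at line start, so commented-out lines ("*  fix_blength") and
    # longer names ending in the key ("Malpha" vs "alpha") are skipped.
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*)(\S+)", re.MULTILINE)
    new_text, n = pattern.subn(lambda m: m.group(1) + value, ctl_text, count=1)
    if n == 0:
        raise KeyError(f"{key!r} not found in control file")
    return new_text

def make_null_ctl(ctl_text, null_tree="notree", null_out="nomlc"):
    """Null model: one omega for all branches, unlabelled tree, separate output file."""
    text = set_ctl_option(ctl_text, "model", "0")
    text = set_ctl_option(text, "treefile", null_tree)
    return set_ctl_option(text, "outfile", null_out)

# Usage (hypothetical paths):
# from pathlib import Path
# Path("nocodeml.ctl").write_text(make_null_ctl(Path("codeml.ctl").read_text()))
```

The trailing `*` comments are preserved because only the value token is replaced.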

# Process the results
python3 run1.py og.list run1.sh
sh run1.sh
python3 run2.py og.list run2.sh
sh run2.sh
cat align/*/Pout > allPout
python3 p.py allPout out.r
Rscript out.r > pvalue
paste allPout pvalue | awk '$5 < 0.01 {print $1, $5}' > p001.list
sort -g -k2,2 p001.list > sort.pvalue    # -g also handles values like 1e-05; smallest P first
# After correcting the P values, keep the gene families with p < 0.01
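The contents of run1.py, run2.py, p.py and out.r aren't shown, so here is a self-contained Python alternative for the same step, not the author's scripts. It assumes each align/<OG>/ folder holds the alternative result mlc and a null result named nomlc (hypothetical), reads codeml's `lnL(ntime: ..  np: ..)` line, runs the likelihood ratio test with df equal to the difference in parameter counts (1 for two-ratio vs one-ratio), and applies Benjamini-Hochberg correction. Note that this LRT asks whether the foreground ω differs from the background ω; check that the foreground ω estimate is actually above 1 before calling a gene positively selected.

```python
import math
import re
from pathlib import Path

# codeml prints a summary line such as
#   lnL(ntime: 15  np: 17):  -4567.123456      +0.000000
LNL_RE = re.compile(r"lnL\(ntime:\s*\d+\s+np:\s*(\d+)\):\s*(-?\d+(?:\.\d+)?)")

def parse_lnl(mlc_text):
    """Return (number of parameters, log-likelihood) from a codeml result file."""
    m = LNL_RE.search(mlc_text)
    if m is None:
        raise ValueError("no lnL line found")
    return int(m.group(1)), float(m.group(2))

def lrt_pvalue(lnl_alt, lnl_null, df=1):
    """Chi-square P value of 2*(lnL_alt - lnL_null); only df = 1 is handled here."""
    if df != 1:
        raise NotImplementedError("this sketch only covers df = 1")
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 1 df, P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(stat / 2.0))

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values, same rule as R's p.adjust(method = "BH")."""
    n = len(pvals)
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, keeping a running minimum of p * n / rank.
    for offset, i in enumerate(sorted(range(n), key=lambda k: pvals[k], reverse=True)):
        rank = n - offset
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def scan(align_dir="align", alt_name="mlc", null_name="nomlc"):
    """Collect (orthogroup, raw P, BH-adjusted P) for every align/<OG>/ folder."""
    rows = []
    for og_dir in sorted(Path(align_dir).iterdir()):
        alt, null = og_dir / alt_name, og_dir / null_name
        if not (alt.is_file() and null.is_file()):
            continue  # skip folders where one of the two runs is missing
        np_alt, lnl_alt = parse_lnl(alt.read_text())
        np_null, lnl_null = parse_lnl(null.read_text())
        rows.append((og_dir.name, lrt_pvalue(lnl_alt, lnl_null, np_alt - np_null)))
    adjusted = bh_adjust([p for _, p in rows])
    return [(og, p, q) for (og, p), q in zip(rows, adjusted)]

# Usage:
# for og, p, q in scan():
#     if q < 0.01:
#         print(og, p, q)
```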

After that, find the gene names that correspond to these gene families and look up their functions.
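How you map families back to genes depends on how the orthogroups were built, which this post doesn't say. A hypothetical sketch, assuming an OrthoFinder-style Orthogroups.tsv (tab-separated, an `Orthogroup` column, one column per species with comma-separated gene IDs) and that the first column of sort.pvalue is the orthogroup ID:

```python
import csv

def genes_for_ogs(lines, og_ids, species):
    """Map orthogroup IDs to one species' gene IDs from an Orthogroups.tsv-style table.

    `lines` is any iterable of text lines (e.g. an open file). Assumed layout: a header
    row 'Orthogroup<TAB>species...' and comma-separated gene IDs in each cell.
    """
    wanted = set(og_ids)
    result = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        og = row["Orthogroup"]
        if og in wanted:
            cell = row.get(species) or ""  # empty or missing column -> no genes
            result[og] = [g.strip() for g in cell.split(",") if g.strip()]
    return result

# Usage (hypothetical file and column names):
# with open("sort.pvalue") as fh:
#     sig = [line.split()[0] for line in fh if line.strip()]
# with open("Orthogroups.tsv", newline="") as fh:
#     print(genes_for_ogs(fh, sig, "human"))
```

Picking a well-annotated species column (human or mouse here) usually makes the functional lookup easier than using the foreground species' own IDs.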
After that, there's nothing more to fuss over.
