Installing and Using PartitionFinder2 on Linux (2021)
2021-07-23
土雕艺术家
PartitionFinder official website:
http://www.robertlanfear.com/partitionfinder/
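Click DOWNLOAD on that page to go to the GitHub repository (PartitionFinder).
1. Download
1) Download the archive locally, then upload it to the server.
2) Download directly onto the server.
Either git clone or wget can be used: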
git clone https://github.com/brettc/partitionfinder.git
wget -O partitionfinder-2.1.1.tar.gz https://codeload.github.com/brettc/partitionfinder/tar.gz/refs/tags/v2.1.1
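2. Preparing the environment
PartitionFinder needs a Python 2 environment and its dependency packages before it can be used. It does not run under Python 3.
Python 3 is the default on most systems now, so I create a separate environment just for PartitionFinder.
# Create a Python 2.7 environment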
conda create -n partitionfinder python=2.7
# Activate the environment (newer conda releases use: conda activate partitionfinder)
source activate partitionfinder
# Install the dependencies
conda install numpy pandas pyparsing scipy
pip install -U scikit-learn
pip install tables
# Reference: https://www.jianshu.com/p/855bda1fb2c3
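Before running PartitionFinder itself, it can help to confirm that the new environment really is Python 2 and that every dependency imports. The following is a minimal sketch, not part of PartitionFinder: the function names are made up for illustration, and the module list simply mirrors the packages installed above (scikit-learn is imported as sklearn, and tables is the import name of PyTables).

```python
from __future__ import print_function

import importlib
import sys

# Import names of the packages installed above (assumed list, taken from
# the conda/pip commands in this post).
REQUIRED = ["numpy", "pandas", "pyparsing", "scipy", "sklearn", "tables"]


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def is_python2():
    """True when the running interpreter is Python 2.x."""
    return sys.version_info[0] == 2


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    if not is_python2():
        print("Warning: this is not Python 2, so PartitionFinder will not run here.")
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages can be imported.")
```

Save it under any name (for example check_env.py, a hypothetical file name) and run it with python inside the activated environment.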
3. Testing the installation
After activating the environment, check that the help text prints.
source activate partitionfinder
python PartitionFinder.py -h
(partitionfinder) animal1@animalia:/apps/partitionfinder-2.1.1$ python PartitionFinder.py -h
INFO | 2021-07-23 16:45:25,998 | Note: NumExpr detected 48 cores but "NUMEXPR_MAX_THREADS" not s
INFO | 2021-07-23 16:45:25,998 | NumExpr defaulting to 8 threads.
Usage: python PartitionFinder.py [options] <foldername>
PartitionFinder and PartitionFinderProtein are designed to discover optimal
partitioning schemes for nucleotide and amino acid sequence alignments.
They are also useful for finding the best model of sequence evolution for datasets.
The Input: <foldername>: the full path to a folder containing:
- A configuration file (partition_finder.cfg)
- A nucleotide/aa alignment in Phylip format
Take a look at the included 'example' folder for more details.
The Output: A file in the same directory as the .cfg file, named
'analysis' This file contains information on the best
partitioning scheme, and the best model for each partiiton
Usage Examples:
>python PartitionFinder.py example
Analyse what is in the 'example' sub-folder in the current folder.
>python PartitionFinder.py -v example
Analyse what is in the 'example' sub-folder in the current folder, but
show all the debug output
>python PartitionFinder.py -c ~/data/frogs
Check the configuration files in the folder data/frogs in the current
user's home folder.
>python PartitionFinder.py --force-restart ~/data/frogs
Deletes any data produced by the previous runs (which is in
~/data/frogs/output) and starts afresh
Options:
-h, --help show this help message and exit
-v, --verbose show debug logging information (equivalent to --debug-
out=all)
-c, --check-only just check the configuration files, don't do any
processing
-f, --force-restart delete all previous output and start afresh (!)
-p N, --processes=N Number of concurrent processes to use. Use -1 to match
the number of cpus on the machine. The default is to
use -1.
--show-python-exceptions
If errors occur, print the python exceptions
--save-phylofiles save all of the phyml or raxml output. This can take a
lot of space(!)
--dump-results Dump all results to a binary file. This is only of use
for testing purposes.
--compare-results Compare the results to previously dumped binary
results. This is only of use for testing purposes.
-q, --quick Avoid anything slow (like writing schemes at each
step),useful for very large datasets.
-r, --raxml Use RAxML (rather than PhyML) to do the analysis. See
the manual
-n, --no-ml-tree Estimate a starting tree with NJ (PhyML) or MP (RaxML)
instead of the default which is to estimate a starting
tree with ML using in RAxML. Not recommended.
--cmdline-extras=N Add additional commands to the phyml or raxml
commandlines that PF uses.This can be useful e.g. if
you want to change the accuracy of lnL calculations
('-e' option in raxml), or use multi-threaded versions
of raxml that require you to specify the number of
threads you will let raxml use ('-T' option in raxml.
E.g. you might specify this: --cmndline_extras ' -e
2.0 -T 10 ' N.B. MAKE SURE YOU PUT YOUR EXTRAS IN
QUOTES, and only use this command if you really know
what you're doing and are very familiar with raxml and
PartitionFinder
--weights=N Mainly for algorithm development. Only use it if you
know what you're doing.A list of weights to use in the
clustering algorithms. This list allows you to assign
different weights to: the overall rate for a subset,
the base/amino acid frequencies, model parameters, and
alpha value. This will affect how subsets are
clustered together. For instance: --cluster_weights
'1, 2, 5, 1', would weight the base freqeuncies 2x
more than the overall rate, the model parameters 5x
more, and the alpha parameter the same as the model
rate
--kmeans=type This defines which sitewise values to use: entropy or
tiger --kmeans entropy: use entropies for sitewise
values --kmeans tiger: use TIGER rates for sitewise
values (only valid for Morphology)
--rcluster-percent=N This defines the proportion of possible schemes that
the relaxed clustering algorithm will consider before
it stops looking. The default is 10%. e.g. --rcluster-
percent 10.0
--rcluster-max=N This defines the number of possible schemes that the
relaxed clustering algorithm will consider before it
stops looking. The default is to look at the larger
value out of 1000, and 10 times the number of data
blocks you have. e.g. --rcluster-max 1000
--min-subset-size=N This defines the minimum subset size that the kmeans
and rcluster algorithm will accept. Subsets smaller
than this will be merged at with other subsets at the
end of the algorithm (for kmeans) or at the start of
the algorithm (for rcluster). See manual for details.
The default value for kmeans is 100. The default value
for rcluster is to ignore this option. e.g. --min-
subset-size 100
--debug-output=REGION,REGION,...
(advanced option) Provide a list of debug regions to
output extra information about what the program is
doing. Possible regions are 'all' or any of {subset,su
bset_ops,raxml,parser,model_util,results,entropy,numex
pr,alignment,concurrent.futures,threadpool,numexpr.uti
ls,progress,main,config,reporter,kmeans,util,concurren
t,morph_tige,analysis_m,neighbour,scheme,submodels,dat
abase,analysis,phyml,raxml_mode,model_load,phyml_mode,
sklearn}.
--all-states In the kmeans and rcluster algorithms, this stipulates
that PartitionFinder should not produce subsets that
do not have all possible states present. E.g. for DNA
sequence data, all subsets in the final scheme must
have A, C, T, and G nucleotides present. This can
occasionally be useful for downstream analyses,
particularly concerning amino acid datasets.
--profile Output profiling information after running (this will
slow everything down!)
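4. Usage
1) Prepare the alignment file and the configuration file
Create a folder that contains the .phy file and the .cfg file.
The .phy file holds the sequence alignment (Phylip format); the .cfg file is the configuration file.
Inside partition_finder.cfg, the parts that usually need changing are the alignment file name and the data block (partition) definitions. The other settings can be kept fixed once you have experimented and found values that suit your data.
Reference: https://bin-ye.com/post/2019/10/19/%E5%A5%BD%E5%A5%BD%E5%85%88%E7%94%9F-mrbayes-%E6%93%8D%E4%BD%9C%E8%AF%B4%E6%98%8E/
## ALIGNMENT FILE (the sequence alignment) ##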
alignment = Acan.phy;
## BRANCHLENGTHS: linked | unlinked (what I normally use) ##
branchlengths = unlinked;
## MODELS OF EVOLUTION: all | allx | mrbayes | beast | gamma | gammai | <list> ##
models = mrbayes;
# MODEL SELECTION: AIC | AICc | BIC #
model_selection = bic;
## DATA BLOCKS: see manual for how to define (the partitions) ##
[data_blocks]
atp6 = 1-107;
cox1 = 108-566;
cox2 = 567-731;
cox3 = 732-870;
cytb = 871-1182;
nad1 = 1183-1382;
nad2 = 1383-1523;
nad3 = 1524-1574;
nad4L = 1575-1599;
nad4 = 1600-1818;
nad5 = 1819-2055;
nad6 = 2056-2087;
## SCHEMES, search: all | user | greedy | rcluster | rclusterf | kmeans ##
[schemes]
search = greedy;
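The ranges in [data_blocks] above are consecutive, 1-based, inclusive column ranges, so they can be generated from the gene lengths instead of being typed by hand. The sketch below illustrates that arithmetic under one assumption: the genes are concatenated in exactly the listed order with no gaps. The function name is hypothetical, and the lengths are derived from the ranges in the listing above.

```python
from __future__ import print_function


def data_block_lines(genes):
    """Turn an ordered list of (name, length) pairs into cfg lines.

    Columns are 1-based and inclusive, so each block starts one column
    after the previous block ends.
    """
    lines = []
    start = 1
    for name, length in genes:
        if length <= 0:
            raise ValueError("length of %s must be positive" % name)
        end = start + length - 1
        lines.append("%s = %d-%d;" % (name, start, end))
        start = end + 1
    return lines


# Gene lengths derived from the ranges in the listing above
# (for example cox1 = 108-566 is 459 columns long).
GENES = [
    ("atp6", 107), ("cox1", 459), ("cox2", 165), ("cox3", 139),
    ("cytb", 312), ("nad1", 200), ("nad2", 141), ("nad3", 51),
    ("nad4L", 25), ("nad4", 219), ("nad5", 237), ("nad6", 32),
]

if __name__ == "__main__":
    print("[data_blocks]")
    for line in data_block_lines(GENES):
        print(line)
```

The printed lines can be pasted into the [data_blocks] section of partition_finder.cfg.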
2) Run
Put the alignment file and the configuration file together in one folder:
(partitionfinder) animal1@animalia:~/Documents/20210723_MB/PartitionFinder$ l
Acan.phy partition_finder.cfg
(partitionfinder) animal1@animalia:~/Documents/20210723_MB/PartitionFinder$ cd ../
(partitionfinder) animal1@animalia:~/Documents/20210723_MB$ l
PartitionFinder/
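With both files in place, a quick consistency check can catch hand-editing mistakes before a long run. The sketch below is an illustration only, not something PartitionFinder provides. It relies on the first line of a Phylip file giving the number of sequences followed by the number of sites, and it only understands simple "name = start-end;" lines; codon-position syntax and multi-range blocks are deliberately not handled. All function names are hypothetical.

```python
from __future__ import print_function

import os
import re

# Matches simple lines such as "cox1 = 108-566;" only.
BLOCK_RE = re.compile(r"^\s*(\w+)\s*=\s*(\d+)\s*-\s*(\d+)\s*;")


def phylip_dimensions(header_line):
    """Return (number of sequences, number of sites) from a Phylip header."""
    fields = header_line.split()
    return int(fields[0]), int(fields[1])


def parse_data_blocks(cfg_text):
    """Return [(name, start, end)] read from the [data_blocks] section."""
    blocks = []
    in_section = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            in_section = stripped.lower() == "[data_blocks]"
            continue
        match = BLOCK_RE.match(line) if in_section else None
        if match:
            blocks.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return blocks


def find_problems(blocks, n_sites):
    """List blocks that fall outside the alignment or overlap a neighbour."""
    problems = []
    previous_end = 0
    for name, start, end in sorted(blocks, key=lambda block: block[1]):
        if start < 1 or end > n_sites or start > end:
            problems.append("%s: %d-%d is outside 1-%d" % (name, start, end, n_sites))
        if start <= previous_end:
            problems.append("%s: overlaps the previous block" % name)
        previous_end = max(previous_end, end)
    return problems


def check_folder(folder, alignment_name):
    """Compare the cfg data blocks with the alignment length in `folder`."""
    with open(os.path.join(folder, alignment_name)) as handle:
        n_sites = phylip_dimensions(handle.readline())[1]
    with open(os.path.join(folder, "partition_finder.cfg")) as handle:
        blocks = parse_data_blocks(handle.read())
    return find_problems(blocks, n_sites)
```

For the layout shown above, check_folder("PartitionFinder", "Acan.phy") returns a list of problem descriptions, which is empty when nothing looks wrong.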
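How to run:
python <path to PartitionFinder>/PartitionFinder.py <folder containing the alignment and configuration files>
Note!
Use PartitionFinderProtein.py for amino acid sequences.
Use PartitionFinder.py for nucleotide sequences.
Here I analyse amino acid sequences.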
python /apps/partitionfinder-2.1.1/PartitionFinderProtein.py full_path/PartitionFinder
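If the same command is run repeatedly, it can be wrapped in a small launcher. This is a rough sketch under several assumptions: it is started from inside the activated partitionfinder environment (so that "python" is the Python 2.7 interpreter), PartitionFinder lives in /apps/partitionfinder-2.1.1 as in this post, and the function names are invented for illustration. The -p option comes from the help text shown earlier; NUMEXPR_MAX_THREADS is the variable named in the NumExpr message at the top of that output, and setting it is expected, though not verified here, to stop that message.

```python
from __future__ import print_function

import os
import subprocess

# Install location used in this post; change it to match your own system.
PF_DIR = "/apps/partitionfinder-2.1.1"


def build_command(folder, protein=False, processes=None):
    """Build the argument list for one PartitionFinder run."""
    script = "PartitionFinderProtein.py" if protein else "PartitionFinder.py"
    command = ["python", os.path.join(PF_DIR, script)]
    if processes is not None:
        # -p N: number of concurrent processes (see the help text above).
        command += ["-p", str(processes)]
    command.append(folder)
    return command


def run(folder, protein=False, processes=None, numexpr_threads=None):
    """Run PartitionFinder and return its exit code."""
    env = dict(os.environ)
    if numexpr_threads is not None:
        env["NUMEXPR_MAX_THREADS"] = str(numexpr_threads)
    return subprocess.call(build_command(folder, protein, processes), env=env)
```

For the amino acid analysis in this post, a call such as run("full_path/PartitionFinder", protein=True, processes=8, numexpr_threads=8) would start the same script as the command above, limited to 8 processes.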
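The result I mainly use is in analysis/schemes/start_scheme.txt, which lists the model suited to each partition for use in MrBayes. The best scheme found by the search is summarised in analysis/best_scheme.txt.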