Installing BUSCO with Conda
2022-01-09
ayunga
BUSCO
BUSCO - Benchmarking Universal Single-Copy Orthologs
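A tool written in Python for assessing the quality of genome and transcriptome assemblies.
Closely related species always share some conserved sequences. BUSCO compares the assembly against a set of these conserved genes (single-copy orthologs expected in the chosen lineage) and, for each one, reports whether the assembly contains it as a single complete copy, as multiple (duplicated) copies, only as a fragment, or not at all (missing).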
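BUSCO summarises these four outcomes in a one-line score in its short_summary output file: C (complete, split into S single-copy and D duplicated), F (fragmented), M (missing) and n (number of BUSCOs searched). The helper below is a hypothetical sketch, not part of BUSCO, that splits such a line into labelled fields. It assumes the BUSCO 4 line layout shown in the comment; the numbers are made up for illustration, and a line that does not match is passed through unchanged.

```shell
#!/bin/sh
# Hypothetical helper (not part of BUSCO): turn a BUSCO one-line score such as
#   C:98.0%[S:97.5%,D:0.5%],F:1.0%,M:1.0%,n:255
# into labelled fields. Assumes the BUSCO 4 layout; example numbers are invented.
parse_busco_summary() {
  printf '%s\n' "$1" | sed -E \
    's/^[[:space:]]*C:([0-9.]+)%\[S:([0-9.]+)%,D:([0-9.]+)%],F:([0-9.]+)%,M:([0-9.]+)%,n:([0-9]+).*$/complete=\1 single=\2 duplicated=\3 fragmented=\4 missing=\5 total=\6/'
}

parse_busco_summary 'C:98.0%[S:97.5%,D:0.5%],F:1.0%,M:1.0%,n:255'
# prints: complete=98.0 single=97.5 duplicated=0.5 fragmented=1.0 missing=1.0 total=255
```

On a real run, the line could be taken from inside the output folder with something like `parse_busco_summary "$(grep -hE 'C:[0-9.]+%\[S:' short_summary*.txt)"`; the exact summary file name differs between BUSCO versions.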
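Installation
BUSCO can be installed either with conda or manually. A manual installation requires installing Augustus, BLAST and HMMER separately and then configuring BUSCO to find them, which is fairly tedious.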
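Before choosing the manual route, it can help to see which of these programs are already on PATH. The sketch below is a hypothetical helper; the executable names it checks (augustus, tblastn and makeblastdb from BLAST+, hmmsearch from HMMER) are an assumption about what a BUSCO 4 genome-mode run calls, so adjust the list for your version and mode.

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed executable names for a manual BUSCO 4 install (genome mode); adjust as needed.
missing=$(missing_tools augustus tblastn makeblastdb hmmsearch)
if [ -n "$missing" ]; then
  echo "Still need to install:"
  echo "$missing"
else
  echo "All listed dependencies are on PATH."
fi
```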
Installing with conda
(1) Create a new conda environment containing Python 3.7
$ conda create -n busco-py3.7 python=3.7
$ conda activate busco-py3.7
(2) Install Augustus
$ conda install -c bioconda augustus
(3) Install HMMER
$ conda install -c bioconda hmmer
(4) Install BUSCO
$ conda install -c bioconda busco
(5) Install Biopython 1.77
$ conda install -c bioconda biopython=1.77
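Steps (1) to (5) above can also be collected into one script. This is a sketch, not the author's procedure: it uses `conda install -n` to target the environment by name, so `conda activate` is not needed inside a non-interactive script, and `-y` to skip the confirmation prompts. The `DRY_RUN` switch and the function names are hypothetical; by default the script only prints the commands.

```shell
#!/bin/sh
# Steps (1)-(5) as one script. DRY_RUN defaults to 1 (only print the commands);
# run with DRY_RUN=0 to actually create the environment.
set -eu

ENV_NAME=${ENV_NAME:-busco-py3.7}
DRY_RUN=${DRY_RUN:-1}

# Print a command, then execute it unless this is a dry run.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = 0 ]; then
    "$@"
  fi
}

install_busco_env() {
  # -n targets the environment by name, so no "conda activate" is needed here;
  # -y answers the confirmation prompts.
  run conda create -y -n "$ENV_NAME" python=3.7                   # (1)
  run conda install -y -n "$ENV_NAME" -c bioconda augustus        # (2)
  run conda install -y -n "$ENV_NAME" -c bioconda hmmer           # (3)
  run conda install -y -n "$ENV_NAME" -c bioconda busco           # (4)
  run conda install -y -n "$ENV_NAME" -c bioconda biopython=1.77  # (5)
}

install_busco_env
```

Keeping the five installs separate mirrors the steps above; merging them into a single `conda install` call would let the solver pick versions differently.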
Test
$ busco -h
usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]
Welcome to BUSCO 4.1.2: the Benchmarking Universal Single-Copy Ortholog assessment tool.
For more detailed usage information, please review the README file provided with this distribution and the BUSCO user guide.
optional arguments:
-i FASTA FILE, --in FASTA FILE
Input sequence file in FASTA format. Can be an assembled genome or transcriptome (DNA), or protein sequences from an annotated gene set.
-c N, --cpu N Specify the number (N=integer) of threads/cores to use.
-o OUTPUT, --out OUTPUT
Give your analysis run a recognisable short name. Output folders and files will be labelled with this name. WARNING: do not provide a path
--out_path OUTPUT_PATH
Optional location for results folder, excluding results folder name. Default is current working directory.
-e N, --evalue N E-value cutoff for BLAST searches. Allowed formats, 0.001 or 1e-03 (Default: 1e-03)
-m MODE, --mode MODE Specify which BUSCO analysis mode to run.
There are three valid modes:
- geno or genome, for genome assemblies (DNA)
- tran or transcriptome, for transcriptome assemblies (DNA)
- prot or proteins, for annotated gene sets (protein)
-l LINEAGE, --lineage_dataset LINEAGE
Specify the name of the BUSCO lineage to be used.
-f, --force Force rewriting of existing files. Must be used when output files with the provided name already exist.
-r, --restart Continue a run that had already partially completed.
--limit REGION_LIMIT How many candidate regions (contig or transcript) to consider per BUSCO (default: 3)
--long Optimization mode Augustus self-training (Default: Off) adds considerably to the run time, but can improve results for some non-model organisms
-q, --quiet Disable the info logs, displays only errors
--augustus_parameters AUGUSTUS_PARAMETERS
Pass additional arguments to Augustus. All arguments should be contained within a single pair of quotation marks, separated by commas. E.g. '--param1=1,--param2=2'
--augustus_species AUGUSTUS_SPECIES
Specify a species for Augustus training.
--auto-lineage Run auto-lineage to find optimum lineage path
--auto-lineage-prok Run auto-lineage just on non-eukaryote trees to find optimum lineage path
--auto-lineage-euk Run auto-placement just on eukaryote tree to find optimum lineage path
--update-data Download and replace with last versions all lineages datasets and files necessary to their automated selection
--offline To indicate that BUSCO cannot attempt to download files
--config CONFIG_FILE Provide a config file
-v, --version Show this version and exit
-h, --help Show this help message and exit
--list-datasets Print the list of available BUSCO datasets
(busco-py3.7)
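Putting the main options from the help text together, a run needs at least -i, -l, -o and -m. Because -o must be a plain name rather than a path (see the WARNING in the help text), the hypothetical wrapper below splits a desired output path into -o and --out_path. The input file name, the lineage name eukaryota_odb10 and the thread count are placeholders; `busco --list-datasets` shows the lineages available to your installation. As in a dry run, it only prints the command unless DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical wrapper around the busco CLI; DRY_RUN=1 (default here) only prints the command.
DRY_RUN=${DRY_RUN:-1}

run_busco() {
  # $1 input FASTA, $2 lineage dataset, $3 mode (genome|transcriptome|proteins),
  # $4 output path, $5 number of threads (default 4)
  out_name=$(basename "$4")   # -o must be a bare name, not a path
  out_dir=$(dirname "$4")     # the parent directory goes to --out_path instead
  set -- busco -i "$1" -l "$2" -m "$3" -o "$out_name" --out_path "$out_dir" -c "${5:-4}"
  echo "+ $*"
  if [ "$DRY_RUN" = 0 ]; then
    "$@"
  fi
}

# Placeholder input, lineage and output path; replace with your own.
run_busco assembly.fasta eukaryota_odb10 genome results/my_assembly 8
# prints: + busco -i assembly.fasta -l eukaryota_odb10 -m genome -o my_assembly --out_path results -c 8
```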
Installing the latest version (5.2.2 at the time of writing) into its own environment
$ conda create -n busco5.2.2 -c conda-forge -c bioconda busco=5.2.2
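With two environments holding different BUSCO releases (4.1.2 and 5.2.2 here), it can be useful to check which version a script is about to use. The function below is a hypothetical sketch that compares two dotted version strings with `sort -V` (GNU coreutils; recent BSD sort also supports it).

```shell
#!/bin/sh
# Hypothetical helper: succeed if dotted version $1 is greater than or equal to $2.
version_ge() {
  # sort -V orders version strings field by field numerically;
  # if $2 sorts first (or the two are equal), then $1 >= $2.
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if version_ge 5.2.2 4.1.2; then
  echo "5.2.2 is at least 4.1.2"
fi
```

Assuming `busco --version` prints a line such as `BUSCO 5.2.2` and your conda provides `conda run`, something like `version_ge "$(conda run -n busco5.2.2 busco --version | awk '{print $2}')" 5.0.0` would check the release installed in the new environment.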