软件安装生信入门

Conda 安装 Busco

2022-01-09  本文已影响0人  ayunga
Busco

BUSCO - Benchmarking Universal Single-Copy Orthologs
一款使用python语言编写的对转录组和基因组组装质量进行评估的软件。
在相近的物种之间总有一些保守的序列,而BUSCO就是使用这些保守序列与组装的结果进行比对,鉴定组装的结果是否包含这些序列,包含单条、多条还是部分或者不包含等等情况来给出结果。

安装方法

可以用conda安装,也可以手动安装。手动安装需要额外安装augustus,blast,hmmer这些软件,然后再配置,比较繁琐。

使用conda安装

(1) 建一个新的conda环境,环境里装一个python3.7

$ conda create -n busco-py3.7 python=3.7
$ conda activate busco-py3.7

(2) 安装augustus

$ conda install -c bioconda augustus

(3) 安装hmmer

$ conda install -c bioconda hmmer

(4) 安装busco

$ conda install -c bioconda busco

(5) 安装biopython1.77

$ conda install -c bioconda biopython=1.77
测试
$ busco -h
usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]

Welcome to BUSCO 4.1.2: the Benchmarking Universal Single-Copy Ortholog assessment tool.
For more detailed usage information, please review the README file provided with this distribution and the BUSCO user guide.

optional arguments:
  -i FASTA FILE, --in FASTA FILE
                        Input sequence file in FASTA format. Can be an assembled genome or transcriptome (DNA), or protein sequences from an annotated gene set.
  -c N, --cpu N         Specify the number (N=integer) of threads/cores to use.
  -o OUTPUT, --out OUTPUT
                        Give your analysis run a recognisable short name. Output folders and files will be labelled with this name. WARNING: do not provide a path
  --out_path OUTPUT_PATH
                        Optional location for results folder, excluding results folder name. Default is current working directory.
  -e N, --evalue N      E-value cutoff for BLAST searches. Allowed formats, 0.001 or 1e-03 (Default: 1e-03)
  -m MODE, --mode MODE  Specify which BUSCO analysis mode to run.
                        There are three valid modes:
                        - geno or genome, for genome assemblies (DNA)
                        - tran or transcriptome, for transcriptome assemblies (DNA)
                        - prot or proteins, for annotated gene sets (protein)
  -l LINEAGE, --lineage_dataset LINEAGE
                        Specify the name of the BUSCO lineage to be used.
  -f, --force           Force rewriting of existing files. Must be used when output files with the provided name already exist.
  -r, --restart         Continue a run that had already partially completed.
  --limit REGION_LIMIT  How many candidate regions (contig or transcript) to consider per BUSCO (default: 3)
  --long                Optimization mode Augustus self-training (Default: Off) adds considerably to the run time, but can improve results for some non-model organisms
  -q, --quiet           Disable the info logs, displays only errors
  --augustus_parameters AUGUSTUS_PARAMETERS
                        Pass additional arguments to Augustus. All arguments should be contained within a single pair of quotation marks, separated by commas. E.g. '--param1=1,--param2=2'
  --augustus_species AUGUSTUS_SPECIES
                        Specify a species for Augustus training.
  --auto-lineage        Run auto-lineage to find optimum lineage path
  --auto-lineage-prok   Run auto-lineage just on non-eukaryote trees to find optimum lineage path
  --auto-lineage-euk    Run auto-placement just on eukaryote tree to find optimum lineage path
  --update-data         Download and replace with last versions all lineages datasets and files necessary to their automated selection
  --offline             To indicate that BUSCO cannot attempt to download files
  --config CONFIG_FILE  Provide a config file
  -v, --version         Show this version and exit
  -h, --help            Show this help message and exit
  --list-datasets       Print the list of available BUSCO datasets
(busco-py3.7)
安装最新的
conda create -n busco5.2.2 -c conda-forge -c bioconda busco=5.2.2
用户手册

https://busco.ezlab.org/busco_userguide.html

上一篇 下一篇

猜你喜欢

热点阅读