Using BUSCO v4 with Docker

2020-09-14 土雕艺术家

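Installing BUSCO with Docker

The following command lists the Docker images already available locally (images, not containers).
sudo docker images
The BUSCO repository on Docker Hub carries a tag for each released BUSCO version:
https://hub.docker.com/r/ezlabgva/busco/tags
Pulling a tag downloads and installs that image. Note that the command below pulls v4.1.0_cv2, while the run commands in the rest of this post use v4.1.0_cv1; docker run pulls a missing tag automatically, so both work, but it is simpler to use one tag throughout.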

docker pull ezlabgva/busco:v4.1.0_cv2

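Running commands

1. Help documentation

I first tried starting the container interactively with the command below.
Once inside, I typed busco and found that interactive use does not seem to work (I have not shown that step here).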

@animalia:~$ sudo docker run -it ezlabgva/busco:v4.1.0_cv1
******
The BUSCO Docker container is based on the biocontainer image (https://biocontainers.pro/). Here are a few tips.
Do not use the root account. The default is biodocker (uid=1000), but you can specify your user id.
You need to use mounts (-v) to exchange files between the host filesystem, on which your user can write, and the container filesystem.
`/busco_wd` is the default location in the container where inputs, outputs, and downloaded datasets are read and written. It is the default working directory when running the container.

Run BUSCO as follows:
`docker run -u $(id -u) -v $(pwd):/busco_wd ezlabgva/busco:v4.1.0_cv1 busco -h`

ERROR User with uid '1000', cannot write the working directory. Please be sure you mounted a volume and fix the permissions or provide an alternative user using 'docker run -u uid'. See the documentation.
No permission to write in the current directory.

The output does, however, contain this line:
Run BUSCO as follows:
Running BUSCO the way it suggests prints the help text.

@animalia:~$ sudo docker run -u $(id -u) -v $(pwd):/busco_wd ezlabgva/busco:v4.1.0_cv1 busco -h
usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]

Welcome to BUSCO 4.1.0: the Benchmarking Universal Single-Copy Ortholog assessment tool.
For more detailed usage information, please review the README file provided with this distribution and the BUSCO user guide.

optional arguments:
  -i FASTA FILE, --in FASTA FILE
                        Input sequence file in FASTA format. Can be an assembled genome or transcriptome (DNA), or protein sequences from an annotated gene set.
  -c N, --cpu N         Specify the number (N=integer) of threads/cores to use.
  -o OUTPUT, --out OUTPUT
                        Give your analysis run a recognisable short name. Output folders and files will be labelled with this name. WARNING: do not provide a path
  --out_path OUTPUT_PATH
                        Optional location for results folder, excluding results folder name. Default is current working directory.
  -e N, --evalue N      E-value cutoff for BLAST searches. Allowed formats, 0.001 or 1e-03 (Default: 1e-03)
  -m MODE, --mode MODE  Specify which BUSCO analysis mode to run.
                        There are three valid modes:
                        - geno or genome, for genome assemblies (DNA)
                        - tran or transcriptome, for transcriptome assemblies (DNA)
                        - prot or proteins, for annotated gene sets (protein)
  -l LINEAGE, --lineage_dataset LINEAGE
                        Specify the name of the BUSCO lineage to be used.
  -f, --force           Force rewriting of existing files. Must be used when output files with the provided name already exist.
  -r, --restart         Continue a run that had already partially completed.
  --limit REGION_LIMIT  How many candidate regions (contig or transcript) to consider per BUSCO (default: 3)
  --long                Optimization mode Augustus self-training (Default: Off) adds considerably to the run time, but can improve results for some non-model organisms
  -q, --quiet           Disable the info logs, displays only errors
  --augustus_parameters AUGUSTUS_PARAMETERS
                        Pass additional arguments to Augustus. All arguments should be contained within a single pair of quotation marks, separated by commas. E.g. '--param1=1,--param2=2'
  --augustus_species AUGUSTUS_SPECIES
                        Specify a species for Augustus training.
  --auto-lineage        Run auto-lineage to find optimum lineage path
  --auto-lineage-prok   Run auto-lineage just on non-eukaryote trees to find optimum lineage path
  --auto-lineage-euk    Run auto-placement just on eukaryote tree to find optimum lineage path
  --update-data         Download and replace with last versions all lineages datasets and files necessary to their automated selection
  --offline             To indicate that BUSCO cannot attempt to download files
  --config CONFIG_FILE  Provide a config file
  -v, --version         Show this version and exit
  -h, --help            Show this help message and exit
  --list-datasets       Print the list of available BUSCO datasets

With that invocation the help text is printed, as shown above.
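
The permission error from the first attempt suggests checking, before calling docker run, that the directory to be mounted is writable by the user whose uid is passed with -u. The following is only a minimal sketch: can_mount_workdir is a hypothetical helper name, not part of BUSCO or Docker, and it only tests the host side.

```shell
# Hypothetical helper (not part of BUSCO or Docker): succeed only if the
# directory that will be mounted at /busco_wd exists and is writable by
# the current user.
can_mount_workdir() {
    dir=${1:-$PWD}
    [ -d "$dir" ] && [ -w "$dir" ]
}

# Usage: check the current directory before running the container.
if can_mount_workdir "$PWD"; then
    echo "OK: $PWD is writable by uid $(id -u)"
else
    echo "Not writable: $PWD (fix permissions or change directory)" >&2
fi
```

If the check fails, either fix the permissions on the directory or change to a directory you own before mounting it.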

2. Parameter settings

usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]

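As the usage line shows, -i, -l, -o and -m must all be supplied.
-i is the input, that is, the FASTA file we assembled.
-o is the name of the output folder. After trying it, I found that this folder does not need to be created in advance; BUSCO creates it itself, so we only need to supply a name.
Note that neither -i nor -o can be an absolute path on the host; give just a name. (The help text also warns that -o must not be a path.)
I think this is because Docker maps the directory in through -v, so absolute paths from the Linux host do not exist inside the container.
PS: a word on the option -v $(pwd):/busco_wd
As the BUSCO message said earlier, /busco_wd is BUSCO's working directory inside the container, and pwd is the current directory on the host. The option mounts the current directory at /busco_wd in the container. A host absolute path passed to BUSCO therefore fails, and the input and output files must all live under the current directory.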
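
The mapping described above can be sketched as a small shell function that shows where a host file appears inside the container. This is an illustration only: to_container_path is a hypothetical helper, and it assumes the mount root is the directory passed to -v (the current directory by default).

```shell
# Hypothetical helper: translate a host path into the path the same file
# has inside the container, given -v <mount_root>:/busco_wd.
to_container_path() {
    host_path=$1
    mount_root=${2:-$PWD}
    case "$host_path" in
        "$mount_root"/*)
            # Absolute host path under the mounted directory: keep the tail.
            printf '%s\n' "/busco_wd/${host_path#"$mount_root"/}"
            ;;
        /*)
            # Absolute host path outside the mount: invisible to the container.
            echo "error: $host_path is outside $mount_root" >&2
            return 1
            ;;
        *)
            # Relative path: resolved against /busco_wd inside the container.
            printf '%s\n' "/busco_wd/$host_path"
            ;;
    esac
}

# Usage:
to_container_path /home/me/run/SPHD-01.fasta /home/me/run
# prints /busco_wd/SPHD-01.fasta
```

A file outside the mounted directory has no counterpart in the container, which is why such paths fail.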

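-l is the lineage dataset. The command below lists the datasets BUSCO currently offers; choose the one that matches your taxonomic group and BUSCO downloads it automatically. I chose insecta_odb10.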

sudo docker run -u $(id -u) -v $(pwd):/busco_wd ezlabgva/busco:v4.1.0_cv1 busco --list-datasets
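
The dataset list is long, so it can help to save it to a file and search it by keyword. The sketch below rests on two assumptions: that the list is written to standard output, and that dataset names look like insecta_odb10 (lowercase name followed by _odb and a number). find_lineage is a hypothetical helper name.

```shell
# Hypothetical helper: read saved --list-datasets output on stdin and
# print the dataset names that contain the given keyword.
# Assumes names follow the pattern <name>_odb<number>, e.g. insecta_odb10.
find_lineage() {
    grep -o '[a-z0-9_]*_odb[0-9]*' | grep -i -- "$1" | sort -u
}

# Usage (the docker line is the command from the post, redirected to a file):
#   sudo docker run -u $(id -u) -v $(pwd):/busco_wd ezlabgva/busco:v4.1.0_cv1 busco --list-datasets > datasets.txt
#   find_lineage insect < datasets.txt
```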


For -m, simply refer to the three modes listed in the help text above.
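
The help text lists a short and a long spelling for each of the three modes. A script that takes the mode from a user could check it first, roughly like this; normalize_mode is a hypothetical helper, and the accepted values are copied from the help text.

```shell
# Hypothetical helper: map the mode spellings from the BUSCO help text to
# their long form, and reject anything else.
normalize_mode() {
    case "$1" in
        geno|genome)        echo genome ;;
        tran|transcriptome) echo transcriptome ;;
        prot|proteins)      echo proteins ;;
        *)
            echo "unknown mode: $1 (expected genome, transcriptome or proteins)" >&2
            return 1
            ;;
    esac
}

# Usage:
normalize_mode geno
# prints genome
```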

3. Running BUSCO

Reference commands

docker run -u $(id -u) -v $(pwd):/busco_wd ezlabgva/busco:v4.1.0_cv1 busco -i $fasta -l insecta_odb10 -o $outdir -m genome

docker run -u $(id -u) -v $(pwd):/busco_wd ezlabgva/busco:v4.1.0_cv1 busco -i $fasta -o $outdir -m genome --auto-lineage
# Some posts say --auto-lineage can pick the lineage dataset automatically, but I have not tried it.

My own run, for reference

fasta=SPHD-01.fasta
outdir=SPHD_busco

docker run -u $(id -u) -v $(pwd):/busco_wd ezlabgva/busco:v4.1.0_cv1 busco -i $fasta -l insecta_odb10 -o $outdir -m genome
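
To avoid retyping the long command, the pieces above can be assembled by a small function that prints the docker command instead of running it, so it can be reviewed first. This is a sketch under the post's own assumptions (image tag v4.1.0_cv1, current directory mounted at /busco_wd); busco_cmd and BUSCO_IMAGE are hypothetical names.

```shell
# Image tag copied from the commands in this post.
BUSCO_IMAGE=ezlabgva/busco:v4.1.0_cv1

# Hypothetical wrapper: print (not run) the docker command for one BUSCO run.
# Arguments: input FASTA, lineage dataset, output name, mode (default genome).
busco_cmd() {
    fasta=$1 lineage=$2 out=$3 mode=${4:-genome}
    case "$fasta" in
        /*) echo "error: -i must be relative to the mounted directory: $fasta" >&2
            return 1 ;;
    esac
    case "$out" in
        */*) echo "error: -o takes a name, not a path: $out" >&2
             return 1 ;;
    esac
    printf 'docker run -u %s -v %s:/busco_wd %s busco -i %s -l %s -o %s -m %s\n' \
        "$(id -u)" "$PWD" "$BUSCO_IMAGE" "$fasta" "$lineage" "$out" "$mode"
}

# Usage: print the command for the run shown above, then paste it to execute.
busco_cmd SPHD-01.fasta insecta_odb10 SPHD_busco
```

Printing rather than executing keeps the function safe to try anywhere; the two checks mirror the path restrictions discussed in section 2.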