busco

2022-05-18  就是大饼

BUSCO (Benchmarking Universal Single-Copy Orthologs) is a tool that assesses the completeness of genome assemblies and annotations by searching for universal single-copy orthologs.

Its workflow depends on the type of input:
Genome assembly | tBLASTn --> Augustus --> HMMER3
Transcriptome | Find ORF --> HMMER3
Gene set | HMMER3

Download and installation

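# Create a conda environment with Python 3
conda create --name busco-py3.7 python=3.7
# Activate the environment
conda activate busco-py3.7
# Install BUSCO (the package is distributed through the bioconda channel)
conda install -c conda-forge -c bioconda busco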

Usage

The help text is as follows:

usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]
-i FASTA FILE, --in FASTA FILE  # Input sequence file (FASTA format): an assembled genome, a transcriptome, or a protein set
-c N, --cpu N  # Number of threads
-o OUTPUT, --out OUTPUT  # Name of the output; do not include a path
--out_path OUTPUT_PATH  # Directory for the output (default: current directory)
-e N, --evalue N  # E-value cutoff for BLAST searches (format: 0.001 or 1e-03; default 1e-03)
-m MODE, --mode MODE  # geno/genome; tran/transcriptome; prot/proteins
-l LINEAGE, --lineage_dataset LINEAGE  # BUSCO lineage to use (the dataset folder)
-f, --force  # Force overwriting of existing files; use when the output name already exists
-r, --restart  # Resume a partially completed run
--limit REGION_LIMIT  # Number of candidate regions (contig or transcript) to consider per BUSCO (default 3)
--augustus_species AUGUSTUS_SPECIES  # Species to use for Augustus training
--auto-lineage  # Run auto-lineage to find a suitable lineage path
--offline  # Indicate that BUSCO cannot attempt to download files
--config CONFIG_FILE  # Provide a config file
-v, --version  # Show the version
-h, --help  # Show the help message
--list-datasets  # Print the available BUSCO datasets

Example command:

busco -i test.fa -c 8 -o test -m genome -l eudicots_odb10 > output.txt
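When many assemblies need to be checked, it can help to assemble the same command from a script. The following is a minimal sketch, not part of BUSCO itself: the helper name `build_busco_command` is hypothetical, and it only uses the flags listed in the help text above (`-i`, `-l`, `-o`, `-m`, `-c`, `-f`). It builds the argument list without running anything.

```python
import os
import shlex
from typing import List

# Mode names accepted by -m according to the help text above.
ALLOWED_MODES = {"geno", "genome", "tran", "transcriptome", "prot", "proteins"}


def build_busco_command(fasta: str, lineage: str, out_name: str,
                        mode: str = "genome", cpus: int = 1,
                        force: bool = False) -> List[str]:
    """Return the busco argument list (hypothetical helper, runs nothing)."""
    if mode not in ALLOWED_MODES:
        raise ValueError("unknown mode: %s" % mode)
    # The help text says -o takes a name only, not a path.
    if os.sep in out_name:
        raise ValueError("-o expects a name without a path; use --out_path")
    cmd = ["busco", "-i", fasta, "-l", lineage,
           "-o", out_name, "-m", mode, "-c", str(cpus)]
    if force:
        cmd.append("-f")  # overwrite an existing output of the same name
    return cmd


if __name__ == "__main__":
    cmd = build_busco_command("test.fa", "eudicots_odb10", "test", cpus=8)
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually launch the run, the list could be passed to `subprocess.run(cmd, check=True)`, assuming `busco` is on the PATH of the active conda environment.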

The result looks like this:
C:98.1%[S:95.1%,D:3.0%],F:0.6%,M:1.3%,n:2326
2280 Complete BUSCOs (C)
2211 Complete and single-copy BUSCOs (S)
69 Complete and duplicated BUSCOs (D)
14 Fragmented BUSCOs (F)
32 Missing BUSCOs (M)
2326 Total BUSCO groups searched
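To compare several runs it is often convenient to turn the one-line summary into numbers. The sketch below assumes the summary line has exactly the layout shown above (C, S, D, F, M percentages followed by n); the function name `parse_busco_summary` is hypothetical, and other BUSCO versions may print additional fields that this pattern does not cover.

```python
import re

# Pattern for a summary line laid out as in the example output above.
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(text):
    """Extract the percentages and the total BUSCO count from summary text."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no BUSCO summary line found")
    result = {key: float(match.group(key)) for key in "CSDFM"}
    result["n"] = int(match.group("n"))
    return result


if __name__ == "__main__":
    line = "C:98.1%[S:95.1%,D:3.0%],F:0.6%,M:1.3%,n:2326"
    print(parse_busco_summary(line))
```

Because the function uses `search`, it can be given the whole content of a summary file rather than just the single line.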

Plotting

The results can be plotted with generate_plot.py (most useful when comparing several species).
Help text:

usage: python3 generate_plot.py -wd [WORKING_DIRECTORY] [OTHER OPTIONS]

BUSCO plot generation tool.
Place all BUSCO short summary files (short_summary.[generic|specific].dataset.label.txt) in a single folder. It will be your working directory, in which the generated plot files will be written
See also the user guide for additional information

required arguments:
  -wd PATH, --working_directory PATH
                        Define the location of your working directory

optional arguments:
  -rt RUN_TYPE, --run_type RUN_TYPE
                        type of summary to use, `generic` or `specific`
  --no_r                To avoid to run R. It will just create the R script file in the working directory
  -q, --quiet           Disable the info logs, displays only errors
  -h, --help            Show this help message and exit

First gather the summary files from all BUSCO runs into a single folder:

mkdir my_summaries
cp run_SPEC1/short_summary_SPEC1.txt my_summaries/.
cp run_SPEC2/short_summary_SPEC2.txt my_summaries/.
cp run_SPEC3/short_summary_SPEC3.txt my_summaries/.
cp run_SPEC4/short_summary_SPEC4.txt my_summaries/.
cp run_SPEC5/short_summary_SPEC5.txt my_summaries/.
python scripts/generate_plot.py -wd my_summaries
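With many species, the copy steps above can be scripted instead of typed one by one. This is a minimal sketch under two assumptions: each run directory contains a file whose name starts with `short_summary` and ends in `.txt`, and the summary file names differ between runs. The function name `collect_summaries` is hypothetical.

```python
import shutil
from pathlib import Path


def collect_summaries(run_dirs, dest):
    """Copy every short_summary*.txt from the run directories into dest."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    copied = []
    for run_dir in run_dirs:
        for summary in sorted(Path(run_dir).glob("short_summary*.txt")):
            target = dest_path / summary.name
            # A file with the same name from an earlier run is overwritten.
            shutil.copy(str(summary), str(target))
            copied.append(target)
    return copied


if __name__ == "__main__":
    runs = ["run_SPEC1", "run_SPEC2", "run_SPEC3", "run_SPEC4", "run_SPEC5"]
    for path in collect_summaries(runs, "my_summaries"):
        print(path)
```

The resulting folder can then be passed to generate_plot.py through `-wd`, as in the command above.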