Pan-genome analysis with Roary
2021-10-07
胡童远
Paper: Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics, 2015
Citations: 1780 (at the time of writing)
GitHub: http://sanger-pathogens.github.io/Roary/
Tutorial: https://github.com/microgenomics/tutorials/blob/master/pangenome.md
Installation with conda
conda create -n pantools
conda activate pantools
conda config --add channels r
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install roary
roary -h
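Roary calls several external programs. The ones this post mentions are cd-hit, blastp, mcl, FastTree and mafft (see the run log further down) and PRANK (via -e). Before a long run, you can check that the conda install actually put them on PATH. Below is a small sketch of such a check. `check_tools` is a hypothetical helper written for this note, and the tool list covers only what this post mentions. It is not Roary's complete dependency list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every tool that is not found on PATH and
# return non-zero if at least one of them is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  # Success only when every requested tool was found
  [ "$missing" -eq 0 ]
}
```

For example, run `check_tools roary cd-hit blastp mcl mafft prank FastTree || echo "install the missing tools first"` inside the activated pantools environment.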
Running Roary
roary -e -n -v -i 80 -p 4 \
-f ./result_roary/ \
./prokka_gff/*.gff
Parameters (defaults in brackets):
-f: output directory
-e: create a multiFASTA alignment of core genes using PRANK
-n: fast core gene alignment with MAFFT, use with -e
-p: number of threads [1]
-v: verbose output to STDOUT
-r: create R plots, requires R and ggplot2 (not used in the command above)
-i: minimum percentage identity for blastp [95]
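The Roary command above can be wrapped so that the identity and thread settings become named arguments, and so that the command can be previewed before a long run. This is a minimal sketch only. The names `run_roary` and `DRY_RUN`, and the check for at least two GFF files, are my own assumptions and not Roary features. Only the flags come from the parameter list above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around: roary -e -n -v -i <identity> -p <threads> -f <out_dir> <gff_dir>/*.gff
# Set DRY_RUN=1 to print the command instead of running it.
run_roary() {
  local gff_dir=$1 out_dir=$2 identity=${3:-80} threads=${4:-4}
  local gffs=("$gff_dir"/*.gff)
  # An unmatched glob stays literal, so check that the first entry really exists.
  # The two-file minimum is an assumption of this sketch.
  if [ ! -e "${gffs[0]}" ] || [ "${#gffs[@]}" -lt 2 ]; then
    echo "need at least two .gff files in $gff_dir" >&2
    return 1
  fi
  local cmd=(roary -e -n -v -i "$identity" -p "$threads" -f "$out_dir" "${gffs[@]}")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

`DRY_RUN=1 run_roary ./prokka_gff ./result_roary/ 80 4` prints the same command as in the example above. Without `DRY_RUN`, the wrapper runs it.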
Run progress (printed with -v)
Fixing input GFF files
Extracting proteins from GFF files
Combine proteins into a single file
Iteratively run cd-hit
Parallel all against all blast
Cluster with MCL
Running command: pan_genome_post_analysis
Running command: FastTree -fastest -nt accessory_binary_genes.fa > accessory_binary_genes.fa.newick
Running command: protein_alignment_from_nucleotides -v --mafft pan_genome_sequences/lexA.fa
Running command: mafft --auto --quiet pan_genome_sequences/group_537.fa > pan_genome_sequences/group_537.fa.aln
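The first two steps, fixing the input GFF files and extracting proteins from them, depend on each GFF carrying its own genome sequence. As far as I know, Roary expects GFF3 files like Prokka's output, where the contig sequences are appended after a `##FASTA` line. GFF files exported from other tools often lack that section. The sketch below flags files that have no `##FASTA` section or no CDS features before Roary is started. `check_gff` and `check_gff_dir` are hypothetical helpers, and the two checks are my assumptions about what commonly breaks, not Roary's own validation.

```shell
#!/usr/bin/env bash
# Hypothetical check for one GFF3 file: prints the problems found and
# exits non-zero if the ##FASTA section or CDS features are missing.
check_gff() {
  awk -F'\t' '
    /^##FASTA/ { has_fasta = 1; exit }   # stop at the embedded sequence section
    !/^#/ && $3 == "CDS" { cds++ }       # column 3 holds the feature type
    END {
      if (!has_fasta) { print "no ##FASTA section"; bad = 1 }
      if (cds == 0)   { print "no CDS features";    bad = 1 }
      exit bad
    }' "$1"
}

# Hypothetical directory check: reports every failing file on stderr.
check_gff_dir() {
  local f msg status=0
  for f in "$1"/*.gff; do
    [ -e "$f" ] || continue
    if ! msg=$(check_gff "$f"); then
      echo "$f: $msg" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_gff_dir ./prokka_gff` prints nothing and returns 0 when every file looks usable.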
Roary results
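gene_presence_absence.Rtab: the gene presence/absence (PAV) matrix, with one row per gene cluster and one column per genome (1 = present, 0 = absent)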
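summary_statistics.txt: numbers of core, soft-core, shell and cloud genes (core: 99% <= strains <= 100%; soft core: 95% <= strains < 99%; shell: 15% <= strains < 95%; cloud: 0% <= strains < 15%)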
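number_of_genes_in_pan_genome.Rtab: total pan-genome size as genomes are added one by one, for 10 random orderings of the input genomes
number_of_conserved_genes.Rtab: number of conserved (core) genes as genomes are added one by one, for 10 random orderings of the input genomes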
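The core/soft/shell/cloud counts can be recomputed directly from the PAV matrix, which is handy when looking at a subset of genomes. Below is a minimal awk sketch. It assumes gene_presence_absence.Rtab has a header row (`Gene` followed by one column per genome) and 0/1 values. It uses the default thresholds printed in summary_statistics.txt. If the core definition was changed with Roary's -cd option, the 99% cut-off here has to be changed as well. `classify_pav` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count core / soft-core / shell / cloud genes in a 0/1 PAV matrix.
# Integer comparisons (present*100 vs pct*n) avoid floating-point edge cases.
classify_pav() {
  awk -F'\t' '
    NR == 1 { n = NF - 1; next }        # header: Gene, then one column per genome
    {
      present = 0
      for (i = 2; i <= NF; i++) present += $i
      if      (present * 100 >= 99 * n) core++
      else if (present * 100 >= 95 * n) soft++
      else if (present * 100 >= 15 * n) shell++
      else                              cloud++
      total++
    }
    END {
      printf "core\t%d\nsoft_core\t%d\nshell\t%d\ncloud\t%d\ntotal\t%d\n",
             core, soft, shell, cloud, total
    }' "$1"
}
```

Run it as `classify_pav result_roary/gene_presence_absence.Rtab` and compare the counts with summary_statistics.txt.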
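The two curve files hold the numbers behind the pan-genome and core-genome curves that Roary plots in R when -r is given. The sketch below assumes each row is one of the 10 random orderings and each tab-separated column is the number of genomes added so far, with no header line. Under that assumption, the average curve is just the per-column mean. `pan_curve_means` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-column mean of a numeric tab-separated matrix,
# printed as "<genome count>\t<mean>" lines.
pan_curve_means() {
  awk -F'\t' '
    {
      for (i = 1; i <= NF; i++) sum[i] += $i
      if (NF > ncol) ncol = NF
      nrow++
    }
    END {
      for (i = 1; i <= ncol; i++) printf "%d\t%.1f\n", i, sum[i] / nrow
    }' "$1"
}
```

`pan_curve_means result_roary/number_of_genes_in_pan_genome.Rtab` gives the average pan-genome size per number of genomes. The same call on number_of_conserved_genes.Rtab gives the average core-genome curve.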