
Pan-genome analysis with Roary

2021-10-07, by 胡童远

Paper: Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics, 2015
Citations: 1780
GitHub: http://sanger-pathogens.github.io/Roary/
Tutorial: https://github.com/microgenomics/tutorials/blob/master/pangenome.md

Installing with conda

conda create -n pantools
conda activate pantools
conda config --add channels r
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install roary
roary -h
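
Roary calls several external programs during a run, so after installation it can help to confirm they are on the PATH. The sketch below is a hypothetical helper (missing_tools is not part of Roary or conda), and the executable names in the usage line are assumptions based on the tools named in the Roary log output:

```shell
# Print every requested executable that cannot be found on PATH.
# Returns 0 when all are present, 1 when at least one is missing.
missing_tools() {
    mt_status=0
    for mt_tool in "$@"; do
        if ! command -v "$mt_tool" >/dev/null 2>&1; then
            echo "$mt_tool"
            mt_status=1
        fi
    done
    return "$mt_status"
}

# Usage (executable names are assumptions; adjust to your installation):
# missing_tools roary cd-hit blastp mcl mafft FastTree
```

An empty output means every listed tool was found in the active conda environment.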

Running Roary

roary -e -n -v -i 80 -p 4 \
-f ./result_roary/ \
./prokka_gff/*.gff

Parameters:
-f: output directory
-e: create a multi-FASTA alignment of core genes using PRANK
-n: fast core gene alignment with MAFFT; use together with -e
-p: number of threads (default: 1)
-v: verbose output to STDOUT
-r: create R plots; requires R and ggplot2 (not used in the command above)
-i: minimum percentage identity for blastp (default: 95)
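
Roary expects GFF3 files that carry the nucleotide sequence in a ##FASTA section at the end, which is what Prokka writes. A quick pre-flight check can catch files that lack it before a long run starts. This is a minimal sketch; check_gff_dir is a hypothetical helper name, not a Roary command:

```shell
# Verify that every .gff file in a directory contains a ##FASTA section.
# Returns 0 if all files pass, 1 if any file lacks the section,
# and 2 if the directory holds no .gff files at all.
check_gff_dir() {
    gff_bad=0
    for gff in "$1"/*.gff; do
        # An unmatched glob stays literal, so guard against an empty directory.
        if [ ! -e "$gff" ]; then
            echo "no .gff files in $1" >&2
            return 2
        fi
        if ! grep -q '^##FASTA' "$gff"; then
            echo "missing ##FASTA section: $gff" >&2
            gff_bad=1
        fi
    done
    return "$gff_bad"
}

# Usage (commented out so the sketch runs without Roary installed):
# check_gff_dir ./prokka_gff && \
#     roary -e -n -v -i 80 -p 4 -f ./result_roary/ ./prokka_gff/*.gff
```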

Stages printed during a run

Fixing input GFF files
Extracting proteins from GFF files
Combine proteins into a single file
Iteratively run cd-hit
Parallel all against all blast
Cluster with MCL
Running command: pan_genome_post_analysis
Running command: FastTree -fastest -nt accessory_binary_genes.fa > accessory_binary_genes.fa.newick
Running command: protein_alignment_from_nucleotides  -v  --mafft pan_genome_sequences/lexA.fa
Running command: mafft --auto --quiet pan_genome_sequences/group_537.fa > pan_genome_sequences/group_537.fa.aln

Roary results

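The gene_presence_absence.Rtab file is the presence/absence variation (PAV) matrix: one row per gene, one column per genome, with 1 for present and 0 for absent.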

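Genes are grouped into core / soft-core / shell / cloud categories according to the share of genomes that carry them (reported in summary_statistics.txt).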
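
The same four categories can be recomputed directly from gene_presence_absence.Rtab. The sketch below assumes a tab-separated file with a header row (Gene, then one column per genome) and uses the cutoffs that summary_statistics.txt reports by default: core at 99% or more of genomes, soft core from 95% up to 99%, shell from 15% up to 95%, and cloud below 15%. classify_pangenome is a hypothetical helper name, and the cutoffs should be adjusted if Roary was run with a different core definition:

```shell
# Count genes per category from a 0/1 presence/absence table.
# Integer arithmetic (present * 100 vs. cutoff * n) avoids float rounding.
classify_pangenome() {
    awk -F'\t' '
        NR == 1 { n = NF - 1; next }      # header row: Gene + one column per genome
        {
            present = 0
            for (i = 2; i <= NF; i++) present += $i
            if      (present * 100 >= 99 * n) core++
            else if (present * 100 >= 95 * n) soft++
            else if (present * 100 >= 15 * n) shell++
            else                              cloud++
        }
        END {
            printf "core\t%d\nsoft_core\t%d\nshell\t%d\ncloud\t%d\n",
                   core, soft, shell, cloud
        }
    ' "$1"
}

# Usage:
# classify_pangenome ./result_roary/gene_presence_absence.Rtab
```

Note that the soft-core and cloud categories only become distinguishable with enough genomes: with fewer than seven genomes, for example, a gene found in a single genome is already above the 15% cutoff.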

number_of_genes_in_pan_genome.Rtab (pan-genome size as genomes are added, for 10 genome-order combinations)

number_of_conserved_genes.Rtab (core-genome size as genomes are added, for 10 genome-order combinations)
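
To turn these two files into accumulation curves, the 10 combinations can be averaged per genome count. This is a minimal sketch assuming the layout is tab-separated with no header, one row per combination and one column per number of genomes included; rtab_column_means is a hypothetical helper name:

```shell
# Print "<genome count>\t<mean gene count>" for each column of an Rtab file.
rtab_column_means() {
    awk -F'\t' '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "") continue        # tolerate a trailing tab
                sum[i] += $i
                if (i > cols) cols = i
            }
        }
        END {
            if (NR == 0) exit                 # empty file: nothing to report
            for (i = 1; i <= cols; i++) printf "%d\t%.1f\n", i, sum[i] / NR
        }
    ' "$1"
}

# Usage:
# rtab_column_means ./result_roary/number_of_genes_in_pan_genome.Rtab
# rtab_column_means ./result_roary/number_of_conserved_genes.Rtab
```

Plotting the two outputs together shows the pan-genome curve rising and the core-genome curve falling as more genomes are included.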
