
Strategies for Building Bacterial Phylogenetic Trees

2021-09-02  胡童远

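I previously wrote about building 16S rDNA phylogenetic trees. For the bacterial whole-genome level, this post records the three different pipelines that the paper below used to reconstruct phylogenetic structure.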

Title: Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle
Journal: Cell
Year: 2019

Strategy 1 (Figure 1)

Software: PhyloPhlAn
Phylogeny built from the 400 universal PhyloPhlAn markers
phylophlan parameters: --diversity high --accurate --min_num_markers 80

Internal steps:
diamond blastx --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0
diamond blastp --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0
mafft --anysymbol  (alignment)
trimal -gappyout  (trimming)
RAxML -m PROTCATLG -p 1989  (tree building)
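
As a rough illustration of how the parameters above fit into a single call, here is a minimal sketch that only assembles a phylophlan command line and prints it. The `--diversity`, `--accurate`/`--fast` and `--min_num_markers` values come from the notes above; the input folder, output folder, configuration file name and the `-i`, `-o`, `-d`, `-f`, `--nproc` options are my assumptions about a typical PhyloPhlAn 3 invocation and should be checked against `phylophlan --help`.

```python
import shlex


def phylophlan_command(input_dir, output_dir, config_file, mode="accurate", nproc=4):
    """Build (but do not run) a phylophlan command for the 400-marker phylogeny."""
    if mode not in ("accurate", "fast"):
        raise ValueError("mode must be 'accurate' or 'fast'")
    return [
        "phylophlan",
        "-i", input_dir,        # hypothetical folder of input genomes/proteomes
        "-o", output_dir,       # hypothetical output folder
        "-d", "phylophlan",     # assumed name of the 400 universal marker database
        "-f", config_file,      # hypothetical configuration file
        "--nproc", str(nproc),  # assumed CPU setting, not taken from the notes
        "--diversity", "high",
        f"--{mode}",
        "--min_num_markers", "80",
    ]


cmd = phylophlan_command("genomes/", "out_tree/", "phylophlan.cfg")
print(shlex.join(cmd))
```

Passing `mode="fast"` swaps `--accurate` for `--fast` while leaving the other parameters unchanged.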

Strategy 2 (Figure S3)

Software: PhyloPhlAn
Phylogeny reconstructed from the 400 PhyloPhlAn markers
phylophlan parameters: --diversity high --fast --min_num_markers 80

Internal steps:
diamond blastx --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0
diamond blastp --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0
mafft --anysymbol  (alignment)
trimal -gappyout  (trimming)
RAxML -m PROTCATLG -p 1989  (tree building)

IQ-TREE -nt AUTO -m LG  (tree building)

Strategy 3 (Figure 3)

Roary identifies the set of core genes at 95% identity

roary -e -n -v -p 4 -i 95 \
-f ./result_roary/ \
./out/*.gff

PhyloPhlAn --diversity low --fast
--min_num_markers <50% of the number of core genes identified>
--min_num_entries <90% of the number of input genomes>

--diversity {low,medium,high}
                        Specify the expected diversity of the phylogeny,
                        automatically adjust some parameters: "low": for
                        genus-/species-/strain-level phylogenies; "medium":
                        for class-/order-level phylogenies; "high": for
                        phylum-/tree-of-life size phylogenies (default: None)
--fast                Perform more a faster phylogeny reconstruction by
                        reducing the phylogenetic positions to use; affected
                        parameters depend on the "--diversity" level (default:
                        False)
--min_num_markers MIN_NUM_MARKERS
                        Input genomes or proteomes that map to less than the
                        specified number of markers will be discarded
                        (default: 1)
--min_num_entries MIN_NUM_ENTRIES
                        The minimum number of entries to be present for each
                        of the markers in the database (default: 4)
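
The two thresholds above depend on the dataset, so they have to be computed for each Roary run. A minimal sketch of that arithmetic follows, assuming the counts of core genes and input genomes are already known; the function name, the example counts and the choice to round down are my assumptions, not something stated in the paper.

```python
def strain_level_thresholds(n_core_genes, n_genomes):
    """Return (min_num_markers, min_num_entries) for the strain-level run.

    min_num_markers: 50% of the core genes identified by Roary.
    min_num_entries: 90% of the input genomes.
    Integer arithmetic rounds down; the rounding direction is an assumption.
    """
    min_num_markers = n_core_genes // 2
    min_num_entries = (9 * n_genomes) // 10
    return min_num_markers, min_num_entries


# Hypothetical dataset: 1200 core genes across 40 genomes.
markers, entries = strain_level_thresholds(1200, 40)
flags = [
    "--diversity", "low",
    "--fast",
    "--min_num_markers", str(markers),
    "--min_num_entries", str(entries),
]
print(" ".join(flags))
```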

blastn -outfmt 6 -max_target_seqs 1000000
mafft --anysymbol --auto  (alignment)
trimal -gappyout  (trimming)
FastTree -mlacc 2 -slownni -spr 4 -fastest -mlnni 4 -no2nd -gtr -nt  (tree building)
RAxML -p 1989 -m GTRCAT
-t <phylogenetic tree computed by FastTree>
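
The last two steps form a two-stage tree search: FastTree produces a first tree, and RAxML then starts from it through `-t`. The sketch below only assembles the two command lines with the flags listed above. The `raxmlHPC` binary name, the RAxML `-s` and `-n` arguments and all file names are my assumptions, and `run_two_stage` is defined for illustration but never called here.

```python
import subprocess

# Flags copied from the notes above.
FASTTREE_FLAGS = ["-mlacc", "2", "-slownni", "-spr", "4", "-fastest",
                  "-mlnni", "4", "-no2nd", "-gtr", "-nt"]


def fasttree_command(alignment):
    """FastTree reads the alignment file and writes the tree to stdout."""
    return ["FastTree", *FASTTREE_FLAGS, alignment]


def raxml_refine_command(alignment, start_tree, run_name):
    """RAxML search that starts from the FastTree tree given with -t."""
    return ["raxmlHPC", "-p", "1989", "-m", "GTRCAT",
            "-t", start_tree,
            "-s", alignment,   # assumed: alignment input
            "-n", run_name]    # assumed: run name for the output files


def run_two_stage(alignment, start_tree, run_name):
    """Run both stages; requires FastTree and RAxML to be installed."""
    with open(start_tree, "w") as handle:
        subprocess.run(fasttree_command(alignment), stdout=handle, check=True)
    subprocess.run(raxml_refine_command(alignment, start_tree, run_name), check=True)


# Hypothetical file names, printed rather than executed.
ft = fasttree_command("concatenated.aln")
rx = raxml_refine_command("concatenated.aln", "fasttree.nwk", "refined")
print(" ".join(ft))
print(" ".join(rx))
```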

NMDS based on Roary genetic distances

The non-metric multidimensional scaling plots were computed on pairwise genetic distances between core gene alignments produced by Roary using the nmds function in the ecodist R package
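
To make the input of that NMDS step more concrete, here is a minimal sketch of computing a pairwise distance matrix from an alignment. The paper does not say which distance measure was used, so the p-distance here (the fraction of differing sites, skipping gaps and N) is my assumption, and the toy sequences are invented. The ordination itself is not shown; in the paper it was done in R with ecodist.

```python
from itertools import combinations

SKIPPED = {"-", "N"}  # alignment columns ignored in a pairwise comparison


def p_distance(seq_a, seq_b):
    """Fraction of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in SKIPPED or b in SKIPPED:
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else float("nan")


def distance_matrix(alignment):
    """Symmetric matrix as nested dicts, keyed by genome name."""
    names = sorted(alignment)
    dist = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        dist[a][b] = d
        dist[b][a] = d
    return names, dist


# Invented toy alignment with three genomes.
toy = {"g1": "ACGTACGT", "g2": "ACGTACGA", "g3": "AC-TTCGA"}
names, dist = distance_matrix(toy)
for name in names:
    print(name, [round(dist[name][other], 3) for other in names])
```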

Visualization

The phylogenetic trees were generated using GraPhlAn and the phylogenies were generated using FigTree

More methods

Paper: Insights on the Evolutionary Genomics of the Blautia Genus: Potential New Species and Genetic Content Among Lineages
Journal: Frontiers in Microbiology
Year: 2021

Strategy 4

OrthoFinder to obtain conserved gene families (orthogroups)
Perl script to retrieve the protein sequences
MAFFT (L-INS-i mode) to align the orthogroups
ModelTest-NG:
model selection by the Akaike information criterion (AIC)
IQ-TREE 2:
1000 replicates of ultrafast bootstrap
UFBoot trees optimized by NNI (--bnni)
SH-like approximate likelihood ratio test (--alrt)
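
The IQ-TREE 2 settings listed above can be combined into one call. The sketch below only builds and prints that command line. The 1000 ultrafast bootstrap replicates, `--bnni` and `--alrt` come from the notes; the `iqtree2` binary name, the alignment file name, the model string (standing in for whatever ModelTest-NG selected by AIC) and the number of SH-aLRT replicates are placeholders I chose for illustration.

```python
def iqtree2_command(alignment, model, alrt_replicates, ufboot_replicates=1000):
    """Build (but do not run) an IQ-TREE 2 command with UFBoot and SH-aLRT."""
    return [
        "iqtree2",
        "-s", alignment,                  # hypothetical alignment file
        "-m", model,                      # model chosen beforehand, e.g. by ModelTest-NG
        "-B", str(ufboot_replicates),     # ultrafast bootstrap replicates
        "--bnni",                         # optimize UFBoot trees by NNI
        "--alrt", str(alrt_replicates),   # SH-like approximate likelihood ratio test
    ]


# Placeholder model and SH-aLRT replicate count; neither is given in the notes.
cmd = iqtree2_command("orthogroups_concat.faa", "LG+G4", alrt_replicates=1000)
print(" ".join(cmd))
```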

Strategy 5

panX core genome used to construct a single nucleotide polymorphism (SNP)-based tree
cophylo to compare the phylogenomic amino acid tree with the SNP-based tree

TypeMat from the Microbial Genomes Atlas (MiGA) for bacterial taxonomic classification, to weed out anomalous classifications
