16s rRNA微生物

4. 优势物种展示

2021-02-15  本文已影响0人  吴十三和小可爱的札记

简介

优势物种展示。

数据样式:常见丰度表,列为样本,行为物种,交叉区域为各物种在各样本中的丰度。

一般堆叠柱状图

一般选取top10—top15丰度的细菌门类群用于展示其丰度组成,其他的类群合并为“Others”展示;横坐标代表不同测序样本(或分组),纵坐标代表了微生物类群的相对丰度,不同的微生物类群用不同颜色展示。

library(tidyverse)
# input OTUs 
otu_table  <-  read.delim("16S-amplicon-analysis/otutab.txt", header=T, sep="\t", row.names=1, stringsAsFactors = FALSE)

metadata <- read.delim("16S-amplicon-analysis/metadata.txt", header=T, sep="\t", stringsAsFactors = FALSE)

taxonomy <- read.delim("16S-amplicon-analysis/clean_taxonomy.txt", header=T, sep="\t", stringsAsFactors = FALSE)


otu_table$OTUs <- rownames(otu_table)
genus <- subset(taxonomy, select = c(OTUs, genus))
  
temp <- merge(otu_table, genus, by = "OTUs", )
temp <- subset(temp, select = -OTUs)

temp %>% 
  group_by(genus) %>% 
  summarise_all(sum) -> genus_table

genus_table <- as.data.frame(genus_table)
genus_table <- na.omit(genus_table)
rownames(genus_table) <- genus_table$genus
genus_table <- subset(genus_table, 
                      select = -c(genus))
# rowsums and sorted
genus_table$rowsum <- rowSums(genus_table)
genus_table <- genus_table[order(genus_table$rowsum, 
                                 decreasing = TRUE), ]
# top 10
genus_top10 <- genus_table[1:10, ]
genus_top10['Others', ] <- colSums(genus_table) - colSums(genus_top10)


# # 若Others已经在top10内 
# genus_top10["rest", ] <- colSums(genus_table) - colSums(genus_top10)
# genus_top10["The_others", ] <- genus_top10["Others", ] + genus_top10["rest", ]

# check 
check_top <- function(top_table, otu_table){
  judge <- colSums(genus_top10) == colSums(otu_table)
  if (sum(judge) == length(otu_table)){
    print("Everything looks good")}
  else{print("something wrong.")}
}

check_top(genus_top10, genus_table)

## [1] "Everything looks good"

# make factor levels 
genus_top10$Taxo <- fct_inorder(rownames(genus_top10))


temp <- pivot_longer(data = genus_top10, 
                     cols = -c(Taxo, rowsum),
                     names_to = "Sample", 
                     values_to = "value")

plot_data <- merge(temp, metadata, by = "Sample")

ggplot(data = plot_data , aes(x = Sample, y = value, 
                              fill = Taxo)) + 
  geom_col(position = 'fill', width = 0.6) +
  facet_wrap(~ Group, scales = 'free_x', ncol = 3) +
  scale_fill_brewer(palette = "Set3") +
  labs(x = '', y = 'Relative Abundance(%)') +
  theme(panel.grid.minor.y = element_line(colour = "black"),
        panel.background = element_rect(color = 'black', 
                                        fill = 'transparent'))
plot.png

堆叠图连线

  1. 根据各分组数据获得起点终点坐标,利用geom_segment()做连线。

  2. ggalluvial

img

物种度聚类热图

根据所有样本在属水平的物种注释及丰度信息,选取丰度排名前35的属,从物种和样本两个层面进行聚类并绘制成热图,便于发现哪些物种在哪些样本中聚集较多或含量较低。

library(pheatmap)
# top 30
genus_top30 <- genus_table[1:30, ]
# genus_top30['Others', ] <- colSums(genus_table) - colSums(genus_top30)

plot_mat <- subset(genus_top30, select = -rowsum)
annot_col <- data.frame(row.names = colnames(plot_mat), 
                         group = metadata$Group)

pheatmap(mat = plot_mat, annotation_col = annot_col)
image.png
上一篇 下一篇

猜你喜欢

热点阅读