非度量多维尺度分析

2020-05-06  本文已影响0人  吴十三和小可爱的札记

非度量多维尺度法(NMDS)是一种将多维空间的研究对象(样本或变量)简化到低维空间进行定位、分析和归类,同时又保留对象间原始关系的数据分析方法。

适用于无法获得研究对象间精确的相似性或相异性数据,仅能得到他们之间等级关系数据的情形。换句话说,当资料不适合直接进行变量型多维尺度分析时,对其进行变量变换,再采用变量型多维尺度分析。其特点是根据样品中包含的物种信息,以点的形式反映在多维空间上,而对不同样品间的差异程度,则是通过点与点间的距离体现的,最终获得样品的空间定位点图。

利用OTU表进行NMDS:利用OTU表做NMDS时,可以获得(样本+物种)两种得分信息,能够找到更多的潜在信息。


library(vegan)
library(ggplot2)

# data --------------------------------------------------------------------

set.seed(13)
otu <- matrix(sample(c(0:200), 1200, replace = TRUE),
              ncol = 12, nrow = 100,
              dimnames = list(
                row_names = paste0("OTU", seq(1:100)),
                col_names = paste0("sample", seq(1:12))
              ))

groups <- data.frame(
  sample = paste0("sample", seq(1:12)),
  sites = rep(c("root", "soil", "rhizosphere"), 4)
)
# hellinger-transform: square root of method = "total"
otu <- t(otu) 
hell_otu <- decostand(otu, "hell") 

# The number of points n should be n > 2*k + 1
# default k = 2
NMDS <- metaMDS(hell_otu, k = 4, distance = "bray")

# print NMDS
# stress 应该越小越好,通常阈值为0.2
NMDS

# Get Species or Site Scores from an Ordination
score_species <- scores(NMDS,  display = "species")
score_sites <- scores(NMDS,  display = "sites")

# get the stress value
stress <- round(NMDS$stress, 4)

# adds group date
# 有时groups中的sample 和score 的结果顺序不一样
# 推荐使用merge 或者 match 函数合并数据

plot_data <- cbind(as.data.frame(score_sites), groups)


# set data for spider plot ------------------------------------------------

# calculate the center of NMDS1, NMDS2, according to groups
cent <- aggregate(cbind(NMDS1, NMDS2) ~ sites, 
                  data = plot_data, FUN = mean)

# summarise_if 有利于自动化 
plot_data %>% 
  group_by(sites) %>% 
  summarise_if(is.numeric, mean)

cent <- setNames(cent, c("sites", "oNMDS1", "oNMDS2"))
spider_data <- merge(plot_data, cent, by = "sites", sort = FALSE)

# 设置坐标轴刻度label
theme <- function(){
  list(scale_x_continuous(breaks = seq(from = -0.1, 
                                       to = 0.1, 
                                       by = 0.05), 
                          labels = seq(from = -0.1, 
                                       to = 0.1, 
                                       by = 0.05)),
       scale_y_continuous(breaks = seq(from = -0.1, 
                                       to = 0.1, 
                                       by = 0.05), 
                          labels = seq(from = -0.1, 
                                       to = 0.1, 
                                       by = 0.05)))
}

# 可视化
ggplot(plot_data, aes(x = NMDS1, y = NMDS2, 
                      color = sites)) + 
  geom_point() + 
  coord_fixed() + 
  stat_ellipse()

ggplot(spider_data, aes(x = NMDS1, y = NMDS2, 
                        color = sites)) + 
  geom_segment(aes(xend = oNMDS1, yend = oNMDS2)) +
  geom_point() +
  geom_point()  

NMDS结果评估

通常情况下我们可以根据应力值来判断排序模型的合理性,应力值(Stress)最好不要大于0.2。
此外,还可以通过goodness()和stressplot() {vegan}来评估NMDS拟合优度。

# Shepard图: 若R2越大,则NMDS结果越合理
stressplot(NMDS, main = "Shepard Plot")
gof <- goodness(NMDS1)
plot(NMDS, display = "sites", type = "t", main = "goodness of fit statistic")
points(NMDS, cex = gof * 200, col = "red")
上一篇 下一篇

猜你喜欢

热点阅读