R进行两因素重复测量方差分析并可视化（双组折线图）

2022-11-19 本文已影响0人欧阳松

在仙桃学术上的生信工具里面，有一个折线图的绘图工具，可以很快速便捷的得出结论并可视化结果，当然不是说这个功能有多强大，而是统计学方法非常专业。
比如用它自带的数据https://bioinfomatics.xiantao.love/biotools/data/demo/free/linePlot/%E6%8A%98%E7%BA%BF%E5%9B%BE.xlsx
通过无脑式的鼠标点击，可得到下面一系列的结果。

折线图
可以看到有很多结果，包括各种统计学描述、各种格式的图片，以及一个demo.R

比如统计描述

组别1	组别2	数目	最小值	最大值	中位数(Median)	四分位距(IQR)	下四分位	上四分位	均值(Mean)	标准差(SD)	标准误(SE)
Time1	Control	12	2.833	3.306	2.967	0.254	2.885	3.139	3.020	0.154	0.044
Time1	Intervene	12	2.624	3.418	3.074	0.297	2.876	3.173	3.032	0.258	0.074
Time2	Control	12	2.733	3.240	2.930	0.187	2.861	3.047	2.957	0.148	0.043
Time2	Intervene	12	3.419	4.318	4.082	0.388	3.824	4.212	4.005	0.285	0.082
Time3	Control	12	2.611	3.414	2.938	0.392	2.805	3.197	2.979	0.246	0.071
Time3	Intervene	12	5.799	6.285	6.065	0.185	5.916	6.101	6.030	0.139	0.040

又比如异常值分析、正态性检验(Shapiro-Wilk normality test)、方差齐性检验(Levene's test)、协方差同质假设(Homogeneity of covariances assumption/Box’s M-test)、球形假设(Mauchly's Test for Sphericity)、两因素单向重复测量数据方差分析（Two-way mixed ANOVA）、单独效应分析(Simple effect)、多重成对比较(Pairwise Comparisons of Estimated Marginal Means)等等专业术语，并且还给了详细的统计学结果。

image.png

这种统计学结果让大家很欣慰，那么这些结果都是如何计算出来的呢？？？

我们可以点开demo.R这个结果，读一读代码
https://bioinfomatics.xiantao.love/biotools/code/open/lineplot.R

现在我们复现一下：

加载需要的包并导入数据

library(ggplot2)
library(reshape2)
library(car)
library(rstatix)
## 导入数据，比如我们把下载的折线图数据保存在桌面
library(readxl)
data <- read_excel("~/Desktop/折线图.xlsx") ## mac系统代码
data # 显示结果

trt	Time1	Time2	Time3
Control	3.185968	2.875802	3.414224
Control	2.832632	2.862111	2.801333
Control	3.123533	2.984142	2.909648
Control	2.880927	3.146924	3.256431
Control	3.090936	2.732620	2.966887
Control	2.920949	2.865287	2.820161
Control	3.306013	2.856390	2.806807
Control	2.885842	3.002480	2.611145
Control	2.955919	3.239596	3.182478
Control	2.978572	2.818567	2.734901
Control	3.190647	3.042065	3.001664
Control	2.883867	3.063226	3.240626
Intervene	2.623973	3.771982	6.041055
Intervene	2.663926	3.841685	6.098856
Intervene	3.045286	4.301342	6.108293
Intervene	3.054712	4.065593	6.085687
Intervene	3.127857	4.097633	6.055522
Intervene	3.147686	4.134947	5.926182
Intervene	3.248095	3.419161	5.855298
Intervene	3.326086	4.182629	5.885426
Intervene	3.093890	4.318014	6.151859
Intervene	3.418347	4.303615	6.284906
Intervene	2.935396	3.986879	5.798966
Intervene	2.697790	3.640084	6.073529

新增一列id，id即为数字，用于后续分析，必不可少

data$id <- 1:nrow(data)

trt	Time1	Time2	Time3	id
Control	3.185968	2.875802	3.414224	1
Control	2.832632	2.862111	2.801333	2
Control	3.123533	2.984142	2.909648	3
Control	2.880927	3.146924	3.256431	4
Control	3.090936	2.732620	2.966887	5
Control	2.920949	2.865287	2.820161	6
Control	3.306013	2.856390	2.806807	7
Control	2.885842	3.002480	2.611145	8
Control	2.955919	3.239596	3.182478	9
Control	2.978572	2.818567	2.734901	10
Control	3.190647	3.042065	3.001664	11
Control	2.883867	3.063226	3.240626	12
Intervene	2.623973	3.771982	6.041055	13
Intervene	2.663926	3.841685	6.098856	14
Intervene	3.045286	4.301342	6.108293	15
Intervene	3.054712	4.065593	6.085687	16
Intervene	3.127857	4.097633	6.055522	17
Intervene	3.147686	4.134947	5.926182	18
Intervene	3.248095	3.419161	5.855298	19
Intervene	3.326086	4.182629	5.885426	20
Intervene	3.093890	4.318014	6.151859	21
Intervene	3.418347	4.303615	6.284906	22
Intervene	2.935396	3.986879	5.798966	23
Intervene	2.697790	3.640084	6.073529	24

将短数据转换为长数据

data2 <- gather(data, key = "x", value = "value", -trt, -id)

对各组进行统计学描述

data3 <- data2 %>% 
  group_by(trt, x) %>% 
  get_summary_stats(value)

trt	x	variable	n	min	max	median	q1	q3	iqr	mad	mean	sd	se	ci
Control	Time1	value	12	2.833	3.306	2.967	2.885	3.139	0.254	0.156	3.020	0.154	0.044	0.098
Control	Time2	value	12	2.733	3.240	2.930	2.861	3.047	0.187	0.137	2.957	0.148	0.043	0.094
Control	Time3	value	12	2.611	3.414	2.938	2.805	3.197	0.392	0.252	2.979	0.246	0.071	0.156
Intervene	Time1	value	12	2.624	3.418	3.074	2.876	3.173	0.297	0.232	3.032	0.258	0.074	0.164
Intervene	Time2	value	12	3.419	4.318	4.082	3.824	4.212	0.388	0.327	4.005	0.285	0.082	0.181
Intervene	Time3	value	12	5.799	6.285	6.065	5.916	6.101	0.185	0.097	6.030	0.139	0.040	0.088

批量运行正态性检验（Shapiro-Wilk normality test）

for(i in unique(data[,1])){
  data1 <- data[data[,1] == i,]
  print(lapply(data1[,-1], function(x) shapiro.test(x)))
}

$Time1

Shapiro-Wilk normality test

data: x

W = 0.91913, p-value = 0.2788

$Time2

Shapiro-Wilk normality test

data: x

W = 0.88011, p-value = 0.08792

$Time3

Shapiro-Wilk normality test

data: x

W = 0.73897, p-value = 0.002055

$id

Shapiro-Wilk normality test

data: x

W = 0.95933, p-value = 0.7742

批量运行方差齐性检验（Levene's Test）

for(i in unique(data2$x)){
  data1 <- data2[data2$x == i,]
  print(leveneTest(value~trt, data = data1))
}

Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 1.5617 0.2245
22
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 2.5864 0.122
22
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 3.8375 0.06291 .
22

Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1

两因素重复测量方差分析

anova_test(data = data2, dv = value, wid = id, within = x, between = trt)

ANOVA Table (type II tests)

$ANOVA
Effect DFn DFd F p p<.05 ges
1 trt 1 22 510.657 1.02e-16 * 0.918
2 x 2 44 391.826 9.19e-29 * 0.902
3 trt:x 2 44 407.706 4.01e-29 * 0.905

$Mauchly's Test for Sphericity
Effect W p p<.05
1 x 0.966 0.699
2 trt:x 0.966 0.699

$Sphericity Corrections
Effect GGe DF[GG] p[GG] p[GG]<.05 HFe DF[HF] p[HF] p[HF]<.05
1 x 0.968 1.94, 42.57 6.65e-28 * 1.059 2.12, 46.61 9.19e-29 *
2 trt:x 0.968 1.94, 42.57 2.98e-28 * 1.059 2.12, 46.61 4.01e-29 *

可视化结果

ggplot(data = data3, aes(x = x, y = mean, color = trt, group = trt)) +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), color = "black", width = 0.2) +
  geom_line() +
  geom_point() +
  theme_bw()

image.png

当然这个图跟原图不一样，并且没有两两比较，我们可以使用ggpubr和rstatix进行二次绘图。

关于rstatix进行统计学分析，其实可以Repeated Measures ANOVA in R: The Ultimate Guide - Datanovia
得到答案，具体的分析，我们暂不说了，只说如何进行统计学分析，并且自动两两比较

res.aov <- anova_test(data = data2, dv = value, wid = id, within = x, between = trt) 
# 组间两两比较
pwc <- data2 %>%
    group_by(x) %>%
    pairwise_t_test(
        value ~ trt, paired = TRUE,
        p.adjust.method = "bonferroni" ## 校正方法，包括 "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none".等
    )
pwc

x	.y.	group1	group2	n1	n2	statistic	df	p	p.adj	p.adj.signif
Time1	value	Control	Intervene	12	12	-0.1453235	11	8.87e-01	8.87e-01	ns
Time2	value	Control	Intervene	12	12	-12.3908123	11	1.00e-07	1.00e-07	****
Time3	value	Control	Intervene	12	12	-40.9352857	11	0.00e+00	0.00e+00	****

可以看到已经自动进行了两两比较，我们使用rstatix包的add_xy_position()函数可以添加两两比较列表和x轴y轴位置。

 pwc <- pwc %>% add_xy_position(x = "x")# 按时间分列
pwc

x	.y.	group1	group2	n1	n2	statistic	df	p	p.adj	p.adj.signif	y.position	groups	xmin	xmax
1	value	Control	Intervene	12	12	-0.1453235	11	8.87e-01	8.87e-01	ns	3.7834	Control , Intervene	0.8	1.2
2	value	Control	Intervene	12	12	-12.3908123	11	1.00e-07	1.00e-07	****	4.6834	Control , Intervene	1.8	2.2
3	value	Control	Intervene	12	12	-40.9352857	11	0.00e+00	0.00e+00	****	6.6504	Control , Intervene	2.8	3.2

可视化绘图

library(ggpubr)
ggline(data2,x='x',y='value',
color = 'trt', #按组配色
add = 'mean_sd', #添加均数标准差，也可以设置均数标准误，CI等。
palette = "aaas" # aaas杂志配色
)+stat_pvalue_manual(pwc, tip.length = 0, hide.ns = TRUE) + # 添加两两比较，隐藏无意义
    labs(
        subtitle = get_test_label(res.aov, detailed = TRUE), # 添加整体差异
        caption = get_pwc_label(pwc) # 右下角条件两两比较方法。
    )

image.png

当然，我们还可以继续修改图片，比如全部显示结果，取消显著性标记的下划线，或者显示具体的p值等

ggline(data2,x='x',y='value',
       color = 'trt', #按组配色
       add = 'mean_sd', #添加均数标准差，也可以设置均数标准误，CI等。
       palette = "aaas",# aaas杂志配色
ggtheme = theme_bw(), #背景
xlab = F,ylab = "Score", #xy轴名称
legend.title="Treatment" ,legend="top", #标签名称和位置
title = "Comparison of two groups at different times" #标题
)+stat_pvalue_manual(pwc, bracket.size = 0) + # 添加两两比较，显示所有结果
    labs(
        subtitle = get_test_label(res.aov, detailed = TRUE), # 添加整体差异
        caption = get_pwc_label(pwc) # 右下角显示事后两两比较方法。
    )

image.png

也可以显示数值

ggline(data2,x='x',y='value',
       color = 'trt', #按组配色
       add = 'mean_sd', #添加均数标准差，也可以设置均数标准误，CI等。
       palette = "aaas",# aaas杂志配色#背景
       xlab = F,ylab = "Score", #xy轴名称
       legend.title="Treatment" ,legend="right",
       title = "Comparison of two groups at different times" #标题
)+stat_pvalue_manual(pwc, bracket.size = 0,label = "p.adj") + # 添加两两比较，显示所有结果
    labs(
        subtitle = get_test_label(res.aov, detailed = TRUE), # 添加整体差异
        caption = get_pwc_label(pwc) # 右下角显示事后两两比较方法。
    )

image.png

R进行两因素重复测量方差分析并可视化（双组折线图）

加载需要的包并导入数据

新增一列id，id即为数字，用于后续分析，必不可少

将短数据转换为长数据

对各组进行统计学描述

批量运行正态性检验（Shapiro-Wilk normality test）

批量运行方差齐性检验（Levene's Test）

两因素重复测量方差分析

可视化结果

image.png

可视化绘图

猜你喜欢

热点阅读