R Usage Notes: ggplot2 & ggpubr for boxp

2018-07-12 GPZ_Lab

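In this note:

Adding significance labels to a boxplot with ggpubr's stat_compare_means()

Writing a function that picks the pairs whose p-value is below a given cutoff, and labelling only those pairs on the plot

Arranging multiple plots with ggpubr's ggarrange() and annotate_figure(), with a note on using aes_string()

Combining a list of plots, list(plot1, plot2, plot3...), into one figure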

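Suppose that, for the data at hand, you need to draw one boxplot per variable using a fixed grouping, add the result of a Kruskal-Wallis test across the groups, and add the post-hoc pairwise comparisons, keeping only the significant ones. After drawing several such plots, you then want to combine them into a single figure.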

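Taking the iris dataset as an example, the code below uses ggplot2 to draw a basic boxplot of the "Sepal.Width" variable and uses stat_compare_means() from the ggpubr package to add significance labels. Here labels are added for every combination of groups. combn() returns all unique pairs. Note that comparisons = expects a list() whose elements are pairs stored with c(). You can also specify the pairs yourself, for example comparisons = list(c("a", "b"), c("b", "c")).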

library(ggplot2)
library(ggpubr)  # stat_compare_means()
library(ggsci)   # scale_fill_aaas()

p <- ggplot(iris, aes(x = Species, y = Sepal.Width, fill = Species)) +
  geom_boxplot(position = position_dodge(0.8)) +
  geom_point(position = position_jitterdodge()) +
  scale_fill_aaas() +
  theme_classic(base_size = 16) +
  labs(x = "", y = "Sepal.Width") +
  stat_compare_means() +  # global test across all groups
  stat_compare_means(comparisons = combn(levels(iris$Species), 2, simplify = FALSE))  # every pair
p
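
To make the shape of the comparisons argument concrete, here is a small base-R sketch of what combn() returns for the three iris species. The names all_pairs and some_pairs are only illustrative and do not come from ggpubr.

```r
# All unordered pairs of species, one character vector per pair
all_pairs <- combn(levels(iris$Species), 2, simplify = FALSE)

length(all_pairs)   # three groups give choose(3, 2) = 3 pairs
str(all_pairs)      # a list of length-2 character vectors

# A hand-written subset has the same shape: a list of c("group1", "group2")
some_pairs <- list(c("setosa", "versicolor"), c("versicolor", "virginica"))
```

Either object can be passed as comparisons =, since both are plain lists of two-element character vectors.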

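In iris every pair of groups differs significantly. With many groups, however, the pairwise (post-hoc) comparisons that follow a Kruskal-Wallis (or ANOVA) test can put a pile of labels on the boxplot and squash the plot into something ugly. Besides, the main goal is to label only the significant pairs.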

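First, note that stat_compare_means() by default runs a Wilcoxon test for each pair in comparisons, so the same post-hoc p-values can be obtained from pairwise.wilcox.test() without adjustment: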

> test <- pairwise.wilcox.test(iris$Sepal.Width, iris$Species, p.adjust.method = "none")
> test

    Pairwise comparisons using Wilcoxon rank sum test 

data:  iris$Sepal.Width and iris$Species 

           setosa  versicolor
versicolor 2.1e-13 -         
virginica  7.1e-09 0.0046    

P value adjustment method: none
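
As a quick sanity check on the output above, the sketch below runs wilcox.test() by hand on one pair and compares it with the matching cell of the pairwise matrix. It uses base R only; the object names pw, two_groups and single are illustrative, and the warnings that the ties in iris trigger are silenced.

```r
# Unadjusted pairwise Wilcoxon rank sum tests, as in the output above
pw <- suppressWarnings(
  pairwise.wilcox.test(iris$Sepal.Width, iris$Species, p.adjust.method = "none")
)

# The same comparison done by hand for a single pair
two_groups <- droplevels(subset(iris, Species %in% c("versicolor", "virginica")))
single <- suppressWarnings(wilcox.test(Sepal.Width ~ Species, data = two_groups))

# Both routes should give the same p-value for versicolor vs virginica
all.equal(single$p.value, pw$p.value["virginica", "versicolor"])

# The global Kruskal-Wallis test across all three species
kruskal.test(Sepal.Width ~ Species, data = iris)$p.value
```

Switching p.adjust.method to something like "BH" or "bonferroni" changes the matrix but not the single-pair test, which is worth keeping in mind when choosing a cutoff.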

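So we can write a function that selects the pairs with p below a cutoff of your choice. To avoid an error when no p-value falls below the cutoff, it returns an empty list() in that case.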

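which_pair_to_use <- function(data, variable, group, cutoff = 0.1, p.adjust_method = "none") {
  # Pairwise Wilcoxon rank sum tests between all levels of `group`
  pair_p <- pairwise.wilcox.test(data[[variable]], data[[group]], p.adjust.method = p.adjust_method)
  # Row/column positions of the p-values below the cutoff (NA cells are skipped)
  ind <- as.matrix(which(pair_p$p.value < cutoff, arr.ind = TRUE))

  # No pair passes the cutoff: return an empty list instead of failing
  if (nrow(ind) == 0) {
    return(list())
  }

  compairs <- list()
  for (i in seq_len(nrow(ind))) {
    # Row and column names of the p-value matrix are the two group labels
    compairs[[i]] <- c(rownames(pair_p$p.value)[ind[i, "row"]], colnames(pair_p$p.value)[ind[i, "col"]])
  }
  return(compairs)
}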
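
To see what the helper above works with internally, the following base-R sketch walks through the same steps on Sepal.Width one at a time. The names p_mat, hits, kept_pairs and none are illustrative, and the cutoffs are chosen only to show both the matching and the empty case.

```r
# p-value matrix from the pairwise test (lower triangle filled, rest NA)
p_mat <- suppressWarnings(
  pairwise.wilcox.test(iris$Sepal.Width, iris$Species, p.adjust.method = "none")
)$p.value

# arr.ind = TRUE returns one (row, col) position per matching cell;
# NA cells never match, so the empty upper triangle is ignored
hits <- which(p_mat < 0.0005, arr.ind = TRUE)
hits

# Translate positions into group names: row name and column name of each hit
kept_pairs <- lapply(seq_len(nrow(hits)), function(i) {
  c(rownames(p_mat)[hits[i, "row"]], colnames(p_mat)[hits[i, "col"]])
})
kept_pairs

# With a far stricter cutoff nothing matches and there are zero rows to loop over
none <- which(p_mat < 1e-20, arr.ind = TRUE)
nrow(none)
```

Using seq_len(nrow(hits)) rather than 1:nrow(hits) means the zero-row case simply yields an empty list, without a separate branch.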

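Note that the variable names are passed to this function as character strings, so the arguments must be quoted. The cutoff is usually 0.05; because the p-values here are all very small, I use a smaller value as an example. To make drawing several plots easier, the plotting code can be wrapped in a simple function as well. Be aware that passing arguments through to aes() inside a function causes problems, because aes() expects bare column names rather than quoted strings. In that case use aes_string().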

fun_to_plot <- function(data, group, variable) {
  # group and variable are column names given as strings, hence aes_string()
  p <- ggplot(data, aes_string(x = group, y = variable, fill = group)) +
    geom_boxplot(position = position_dodge(0.8)) +
    geom_point(position = position_jitterdodge()) +
    scale_fill_aaas() +
    theme_classic(base_size = 16) +
    labs(x = "", y = variable) +
    stat_compare_means() +
    # label only the pairs that pass the cutoff
    stat_compare_means(comparisons = which_pair_to_use(data,
                                                       variable,
                                                       group,
                                                       cutoff = 0.0005))
  return(p)
}
fun_to_plot(iris, "Species", "Sepal.Width")
# draws the plot
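
One caveat for readers on recent ggplot2 versions: aes_string() has since been deprecated, and the commonly suggested replacement is a regular aes() call with the .data pronoun. The sketch below shows that variant under the assumption that only ggplot2 is loaded; fun_to_plot_data is a hypothetical name, and the ggpubr and ggsci layers are left out to keep it self-contained.

```r
library(ggplot2)

# Hypothetical variant of fun_to_plot(): column names still arrive as strings,
# but are looked up through the .data pronoun inside a regular aes() call
fun_to_plot_data <- function(data, group, variable) {
  ggplot(data, aes(x = .data[[group]], y = .data[[variable]], fill = .data[[group]])) +
    geom_boxplot(position = position_dodge(0.8)) +
    geom_point(position = position_jitterdodge()) +
    theme_classic(base_size = 16) +
    # set the axis and legend titles explicitly from the strings
    labs(x = "", y = variable, fill = group)
}

p <- fun_to_plot_data(iris, "Species", "Sepal.Width")
p
```

The stat_compare_means() layers from fun_to_plot() should be addable to this version in the same way, since they do not depend on how the aesthetics were specified.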

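Next, draw the Sepal width and length from iris in one figure, sharing a single legend, and give the combined figure an overall title. ggpubr's ggarrange() combines the two plots, and annotate_figure() decorates the combined figure.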

annotate_figure(
  ggarrange(fun_to_plot(iris,"Species", "Sepal.Width"),
            fun_to_plot(iris,"Species","Sepal.Length"),
            common.legend = TRUE, legend = "right"),
  top = text_grob("Sepal_species", size = 18)
  )

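Also, if a function has produced a list of plots, i.e. something of the form plots_list = list(plot1, plot2, plot3...), you can combine them with grid.arrange(grobs = plots_list, ncol = 7) from the gridExtra package. ncol = 7 puts 7 plots in each row.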
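
As a rough illustration of the list-of-plots case, the sketch below builds one simple boxplot per numeric iris column with lapply() and arranges them with gridExtra. The names variables and combined are illustrative, the plots are deliberately plainer than the ones above, and ncol = 2 is used only because there are four plots.

```r
library(ggplot2)
library(gridExtra)

# One boxplot per numeric iris column, collected in a plain list
variables <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
plots_list <- lapply(variables, function(v) {
  ggplot(iris, aes(x = Species, y = .data[[v]], fill = Species)) +
    geom_boxplot() +
    labs(x = "", y = v) +
    theme_classic()
})

# arrangeGrob() builds the combined layout without drawing it;
# grid.arrange(grobs = plots_list, ncol = 2) would build and draw in one step
combined <- arrangeGrob(grobs = plots_list, ncol = 2)
grid::grid.newpage()
grid::grid.draw(combined)
```

If you prefer to stay within ggpubr, ggarrange() also accepts a plotlist argument, which additionally allows common.legend = TRUE as in the two-plot example.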

Besides ggpubr, the ggsignif package can also add significance labels to plots. (Worth a look; I will add to this note if I end up using it.)

