
R Language Study Guide (8): Drawing Bar Charts with Significance Markers

2020-12-30  R语言数据分析指南

Preparing the data

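We use msleep, a sample dataset shipped with the ggplot2 package that records how long various mammals sleep. Let's load the packages and take a look at the data: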

library(tidyverse)
glimpse(msleep, width = 50)

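Because we want to compare sleep times across groups of animals, the variables of interest are vore and sleep_total. First we need to reshape the data to get the mean sleep time and its standard deviation for each group:
Note: the key functions used here are documented on the dplyr website.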

msleep %>% 
  group_by(vore) %>% summarise(mean_sleep = mean(sleep_total),
    sd_sleep   = sd(sleep_total)) 
# A tibble: 5 x 3
  vore    mean_sleep sd_sleep
  <chr>        <dbl>    <dbl>
1 carni        10.4      4.67
2 herbi         9.51     4.88
3 insecti      14.9      5.92
4 omni         10.9      2.95
5 NA           10.2      3.00

To get these numbers, group_by first splits the data by vore, and summarise then computes the statistics for each group.
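The same group_by and summarise pattern can carry extra per-group statistics. The sketch below is not part of the original article: the columns n and se_sleep (group size and standard error) and the name sleep_summary are additions made here for illustration, on the assumption that knowing the group sizes helps when reading the standard deviations.

```r
library(dplyr)
library(ggplot2)  # loaded here only for the msleep dataset

# Same grouping as above, plus group size and standard error.
# n and se_sleep are illustrative additions, not columns used in the article.
sleep_summary <- msleep %>%
  group_by(vore) %>%
  summarise(n          = n(),
            mean_sleep = mean(sleep_total),
            sd_sleep   = sd(sleep_total),
            se_sleep   = sd_sleep / sqrt(n))

sleep_summary
```

summarise evaluates its arguments in order, so se_sleep can reuse the sd_sleep and n columns created just before it.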

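The data is not quite ready yet, though. It contains a missing value (the NA group), and the diet labels are abbreviated.
The following code fixes both problems: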

# Summarise per diet group, drop the NA group, and spell out the labels
(sleep_mean_values <- msleep %>%
  group_by(vore) %>%
  summarise(mean_sleep = mean(sleep_total),
            sd_sleep   = sd(sleep_total)) %>%
  drop_na() %>%
  mutate(vore = case_when(vore == "insecti" ~ "insectivore",
                          vore == "omni" ~ "omnivore",
                          vore == "carni" ~ "carnivore",
                          vore == "herbi" ~ "herbivore") %>%
           as.factor() %>%
           fct_relevel("insectivore", "omnivore", "carnivore", "herbivore")))
# A tibble: 4 x 3
  vore        mean_sleep sd_sleep
  <fct>            <dbl>    <dbl>
1 carnivore        10.4      4.67
2 herbivore         9.51     4.88
3 insectivore      14.9      5.92
4 omnivore         10.9      2.95

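Here drop_na removes the row whose vore is NA. case_when then replaces the abbreviated labels with full names, and the result is converted to a factor. Finally, fct_relevel sets the order of the bars: insectivores first, then omnivores, then carnivores, and herbivores last. We choose this order because it follows the group means from highest to lowest. Note that the factor levels control the order of the bars in the plot, not the row order of the printed tibble.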
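Since the chosen order simply follows the group means, it could also be derived from the data instead of being typed by hand. The sketch below swaps fct_relevel for forcats::fct_reorder, which is an alternative and not what the article uses. toy_means is a hypothetical stand-in for sleep_mean_values, filled with the rounded means printed above.

```r
library(tibble)
library(dplyr)
library(forcats)

# Hypothetical stand-in for sleep_mean_values (rounded means from the output above)
toy_means <- tibble(
  vore       = c("carnivore", "herbivore", "insectivore", "omnivore"),
  mean_sleep = c(10.4, 9.51, 14.9, 10.9)
)

# Order the factor levels by mean_sleep, highest first
toy_means <- toy_means %>%
  mutate(vore = fct_reorder(vore, mean_sleep, .desc = TRUE))

levels(toy_means$vore)
# "insectivore" "omnivore" "carnivore" "herbivore"
```

The hand-written fct_relevel call keeps the order fixed even if the data changes, while fct_reorder recomputes it each time. Which one fits depends on whether the order is meant to be part of the design or a property of the data.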

Creating the first bar chart

sleep_mean_values %>% 
  ggplot(aes(vore, mean_sleep)) +
    geom_col(aes(fill = vore), color = "black", width = 0.85) +
    geom_errorbar(aes(ymin = mean_sleep - sd_sleep,
    ymax = mean_sleep + sd_sleep),
    color = "#22292F",width = .1)

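This plot is a good first step, but it still has several problems. The legend is redundant: the x-axis already tells us that the red bar is the insectivores. The axis labels are not meaningful either, so we need to rename them. There is no title, and no caption telling readers what the error bars represent. Finally, the bar colors can be misleading: why should insectivores be red? Let's fix these issues in the next version: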

sleep_mean_values %>% 
  ggplot(aes(vore, mean_sleep)) +
  geom_col(aes(fill = vore), color = "black", width = 0.85) +
  geom_errorbar(aes(ymin = mean_sleep - sd_sleep,
                    ymax = mean_sleep + sd_sleep),color = "#22292F",width = .1) +
  scale_fill_grey(start = 0.3) +
  scale_y_continuous(limits = c(0, 26), expand = c(0, 0)) +
  guides(fill = "none") + theme_minimal() +
  labs(x = "Vore",y = "Mean Sleep",title = "Mean Sleep in Different Animals",
  caption = "Error bars indicate standard deviations")

Let's look at these changes. The next version stores the plot as p1 and adds further theme adjustments:

p1 <- sleep_mean_values %>% 
  ggplot(aes(vore, mean_sleep)) +
  geom_col(aes(fill = vore), color = "black", width = 0.85) +
  geom_errorbar(aes(ymin = mean_sleep - sd_sleep,
                    ymax = mean_sleep + sd_sleep),color = "#22292F", width = .1) +
  scale_fill_grey(start = 0.3) +
  scale_y_continuous(limits = c(0, 26), expand = c(0, 0)) +
  guides(fill = "none") + theme_minimal() +
  labs(x = "Vore",y = "Mean Sleep",title = "Mean Sleep in Different Animals",
       caption = "Error bars indicate standard deviations") + 
  theme(plot.title = element_text(size = 20,face = "bold",
                                  margin = margin(b = 35)),
        plot.margin = unit(rep(1, 4), "cm"),
        axis.text = element_text(size = 12, color = "#22292F"),
        axis.title = element_text(size = 12, hjust = 1),
        axis.title.x = element_text(margin = margin(t = 12)),
        axis.title.y = element_text(margin = margin(r = 12)),
        axis.text.y = element_text(margin = margin(r = 5)),
        axis.text.x = element_text(margin = margin(t = 5)),
        plot.caption = element_text(size = 12, face = "italic",color = "#606F7B",
                                    margin = margin(t = 12)),
        axis.line = element_line(color = "#3D4852"),
        axis.ticks = element_line(color = "#3D4852"),
        panel.grid.major.y = element_line(color = "#DAE1E7"),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.y = element_blank())
p1

The theme() call adds axis lines and axis ticks.

Adding p-values as asterisks

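The ggpubr package can add significance results to a plot automatically. In practice, however, we often need to customize the plot, and doing that through such packages becomes cumbersome. So here we work directly in ggplot2.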

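The trick is to add two lines to the plot. Each line needs four points: lower left, upper left, upper right, and lower right. So we create a dataset with the following values: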

p_value_one <- tibble(
  x = c("insectivore", "insectivore", "omnivore", "omnivore"),
  y = c(22, 23, 23, 22))

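This line starts on the insectivore bar at y == 22, rises a little to 23, runs across to the omnivore bar at y == 23, and then drops back to 22. Let's define a second, hypothetical line: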

p_value_two <- tibble(
  x = c("omnivore", "omnivore", "herbivore", "herbivore"),
  y = c(16, 17, 17, 16))
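Both datasets follow the same four-point pattern, so the pattern could be wrapped in a small helper. This is only a sketch: the function name bracket and its arguments left, right, top and tip are hypothetical names introduced here, not part of the article or of any package.

```r
library(tibble)

# Hypothetical helper: build the four points of one significance bracket.
# left/right are the x categories, top is the height of the horizontal segment,
# tip is how far the two short vertical ends hang down.
bracket <- function(left, right, top, tip = 1) {
  tibble(
    x = c(left, left, right, right),
    y = c(top - tip, top, top, top - tip)
  )
}

# Reproduces the two datasets defined above
p_value_one <- bracket("insectivore", "omnivore", top = 23)
p_value_two <- bracket("omnivore", "herbivore", top = 17)
```

Keeping the height in a single argument makes it easier to move a bracket up or down when the bar heights or error bars change.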

To add these lines to the plot, call geom_line twice, once with each dataset. We also want an asterisk label centered above each line, which we add with annotate:

p1 + geom_line(data = p_value_one, aes(x = x, y = y, group = 1)) +
  geom_line(data = p_value_two, aes(x = x, y = y, group = 1)) +
  annotate("text", x = 1.5, y = 23.5, label = "***",size = 8, color = "#22292F") +
  annotate("text", x = 3, y = 17.5, label = "*",size = 8, color = "#22292F")

Each geom_line call is given its own dataset, and the aesthetic group = 1 makes it draw the points as a single connected line.
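The asterisks above are typed in by hand, and the article does not show where the p-values come from. As a rough sketch of one way to derive a label, the code below runs a Welch t-test on two diet groups and maps the p-value to asterisks. The helper p_to_stars, the cut-offs 0.001 / 0.01 / 0.05, and the choice of t.test are all assumptions made here, so the resulting label may differ from the one drawn in the plot.

```r
library(ggplot2)  # loaded here only for the msleep dataset

# Hypothetical helper: map a p-value to the usual asterisk notation.
# The cut-offs are a common convention, not taken from the article.
p_to_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 0.001, 0.01, 0.05, 1),
                   labels = c("***", "**", "*", "ns"),
                   include.lowest = TRUE))
}

# Welch two-sample t-test on the raw sleep times of two diet groups
two_groups <- subset(msleep, vore %in% c("insecti", "omni"))
test_one   <- t.test(sleep_total ~ vore, data = two_groups)

star_label <- p_to_stars(test_one$p.value)
star_label
```

The resulting string could then be passed as the label argument of annotate in place of the hard-coded asterisks. With several pairwise comparisons, a correction for multiple testing would usually be worth considering as well.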

https://mp.weixin.qq.com/s/W8jCZHdrf7TB2M0Khpu7dQ
