Faceting with ggplot2

2019-04-05  思考问题的熊

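This article is mainly translated and adapted from "Easy multi-panel plots in R using facet_wrap() and facet_grid() from ggplot2", with some of the code modified.

One of ggplot2's most powerful features is multi-panel plotting, commonly called faceting. With functions such as facet_wrap() and facet_grid(), a single plot can easily be split into several related plots. This article uses a concrete dataset to walk through the different ways of faceting in ggplot2 and the parameters involved.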

Data preparation

In honor of Captain Marvel and the upcoming Avengers: Endgame, we will use the Marvel characters dataset from Kaggle.

We will mainly use three of its variables: Year, SEX, and ALIGN (the three columns filtered in the code below).

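Before the analysis, we clean the data in a few steps: drop the records where any of these three variables is missing, give the variables simpler names, and, because the dataset contains so many characters, keep only those with more than 100 appearances.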

library(ggplot2)
library(dplyr)

marvel <- readr::read_csv("marvel-wikia-data.csv")

marvel <- filter(marvel, SEX != "", ALIGN != "", Year != "") %>% 
  filter(!is.na(APPEARANCES), APPEARANCES>100) %>% 
  mutate(SEX = stringr::str_replace(SEX, "Characters", "")) %>% 
  arrange(desc(APPEARANCES)) %>%
  rename(gender = SEX) %>% 
  rename_all(tolower)
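
One side effect of the cleaning step is worth spelling out: str_replace(SEX, "Characters", "") removes the word but leaves the space in front of it, which is why later code in this article matches values such as "Female " with a trailing space. Below is a minimal sketch on made-up values (the vector sex_raw is hypothetical, not taken from the dataset), with str_trim() shown as one possible way to drop the leftover space:

```r
library(stringr)

# Hypothetical raw values in the style of the SEX column
sex_raw <- c("Male Characters", "Female Characters")

# What the cleaning step above does: the space before "Characters" survives
sex_clean <- str_replace(sex_raw, "Characters", "")

# One possible alternative: also trim the leftover whitespace
sex_trimmed <- str_trim(sex_clean)

print(sex_clean)
print(sex_trimmed)
```

If you choose to trim at this stage, the later filter on gender has to match "Female" and "Male" without the trailing space.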

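Counting new characters by year

Throughout this article, each analysis starts from a count of characters grouped by year, in some cases with additional grouping variables. For this first plot we simply count by year.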

marvel_count <- count(marvel, year)
glimpse(marvel_count)
# glimpse() shows the number of observations and variables, plus each column's name, type, and as many values as fit on a line; it is similar to str().
## Observations: 57
## Variables: 2
## $ year <dbl> 1939, 1940, 1941, 1943, 1944, 1947, 1948, 1949, 1950, 195...
## $ n    <int> 3, 5, 4, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 7, 20, 36, 34, 21,...

First, draw a single plot made up of a line and points.

ggplot(data = marvel_count, aes(year, n)) +
    geom_line(color = "steelblue",size = 1) +
    geom_point(color="steelblue") + 
    theme_classic() +
    labs(title = "New Marvel characters by year",
         subtitle = "(limited to characters with more than 100 appearances)",
         y = "Count of new characters", x = "")

Faceting by character alignment with facet_wrap()

First, count by year and alignment.

marvel_count <- count(marvel, year, align)
glimpse(marvel_count)
## Observations: 114
## Variables: 3
## $ year  <dbl> 1939, 1939, 1940, 1940, 1941, 1941, 1943, 1944, 1947, 19...
## $ align <chr> "Good Characters", "Neutral Characters", "Bad Characters...
## $ n     <int> 2, 1, 1, 4, 1, 3, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 6, 4,...

Simply append + facet_wrap(~ align) to the plotting command above to draw a multi-panel plot faceted by alignment.

ggplot(data = marvel_count, aes(year, n)) +
    geom_line(color = "steelblue",size = 1) +
    geom_point(color="steelblue") + 
    theme_classic() +
    labs(title = "New Marvel characters by year",
         subtitle = "(limited to characters with more than 100 appearances)",
         y = "Count of new characters", x = "") +
    facet_wrap(~ align)

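This plot carries more information. For example, a large number of villains appeared in 1963 and 1964 and then tapered off, while good characters kept being added at a steady pace. With no layout specified, facet_wrap() chose here to show the three panels in a single row.

To facet on two variables with facet_wrap(), simply join them with +. In general, though, facet_grid() is recommended when you want better control over the layout.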

marvel_count <- count(marvel, year, align, gender)
ggplot(data = marvel_count, aes(year, n)) +
    geom_line(color = "steelblue",size = 1) +
    geom_point(color="steelblue") + 
    theme_classic() +
    labs(title = "New Marvel characters by year",
         subtitle = "(limited to characters with more than 100 appearances)",
         y = "Count of new characters", x = "") +
    facet_wrap(~ align + gender)
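
One concrete difference between the two functions is how they treat combinations that do not occur in the data. The sketch below uses a small made-up data frame (toy is hypothetical, not the Marvel data) in which the Neutral/Female combination is deliberately missing, and counts the panels each function lays out by inspecting the built plot:

```r
library(ggplot2)

# Hypothetical toy data: there are no Neutral/Female rows
toy <- data.frame(
  year   = c(1960, 1961, 1960, 1961, 1960, 1961),
  n      = c(1, 2, 3, 4, 5, 6),
  align  = c("Good", "Good", "Good", "Good", "Neutral", "Neutral"),
  gender = c("Female", "Female", "Male", "Male", "Male", "Male")
)

base <- ggplot(toy, aes(year, n)) + geom_line()

p_wrap <- base + facet_wrap(~ align + gender)  # one panel per combination present
p_grid <- base + facet_grid(align ~ gender)    # full rows-by-columns grid

# The layout table of a built plot has one row per panel
n_wrap <- nrow(ggplot_build(p_wrap)$layout$layout)
n_grid <- nrow(ggplot_build(p_grid)$layout$layout)

print(c(wrap = n_wrap, grid = n_grid))
```

In this sketch facet_wrap() lays out three panels, while facet_grid() lays out four and leaves the Neutral/Female panel empty, which keeps rows and columns aligned.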

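Plotting with facet_grid() by specifying rows and columns

facet_grid(row_variable ~ column_variable) lays out the panels by the given row and column variables, for example with align as the row variable and gender as the column variable.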

ggplot(data = marvel_count, aes(year, n)) +
    geom_line(color = "steelblue",size = 1) +
    geom_point(color="steelblue") + 
    theme_classic() +
    labs(title = "New Marvel characters by year",
         subtitle = "(limited to characters with more than 100 appearances)",
         y = "Count of new characters", x = "") +
    facet_grid(align ~ gender)

To leave out the row or the column variable, put a . in its place, as shown below:

ggplot(data = marvel_count, aes(year, n)) +
    geom_line(color = "steelblue",size = 1) +
    geom_point(color="steelblue") + 
    theme_classic() +
    labs(title = "New Marvel characters by year",
         subtitle = "(limited to characters with more than 100 appearances)",
         y = "Count of new characters", x = "") +
    facet_grid(. ~ gender)

ggplot(data = marvel_count, aes(year, n)) +
    geom_line(color = "steelblue",size = 1) +
    geom_point(color="steelblue") + 
    theme_classic() +
    labs(title = "New Marvel characters by year",
         subtitle = "(limited to characters with more than 100 appearances)",
         y = "Count of new characters", x = "") +
    facet_grid(align ~ .)
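
The formula with a . placeholder can also be written with the rows and cols arguments and vars(), which facet_grid() accepts in ggplot2 3.0.0 and later. The sketch below uses a made-up data frame (toy is hypothetical) and compares the panel layouts produced by the two spellings:

```r
library(ggplot2)

# Hypothetical toy data with two alignments
toy <- data.frame(
  year  = c(1960, 1961, 1960, 1961),
  n     = c(1, 2, 3, 4),
  align = c("Bad", "Bad", "Good", "Good")
)

base <- ggplot(toy, aes(year, n)) + geom_line()

p_formula <- base + facet_grid(align ~ .)           # formula with "." placeholder
p_vars    <- base + facet_grid(rows = vars(align))  # same facets, named argument

# Compare the row and column position of each panel
lay_formula <- ggplot_build(p_formula)$layout$layout
lay_vars    <- ggplot_build(p_vars)$layout$layout

print(lay_formula[, c("ROW", "COL")])
print(lay_vars[, c("ROW", "COL")])
```

Which spelling to use is mostly a matter of taste; the named arguments make it explicit which variable goes on the rows and which on the columns.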

Color sometimes works better

For time-series data, two lines in different colors are sometimes more effective than facets.

# Limit to male and female and change levels for drawing order
marvel_count <- filter(marvel_count, gender%in%c("Female ", "Male ")) %>% 
    mutate(gender = factor(gender, levels = c("Male ", "Female ")))

ggplot(data = marvel_count, aes(year, n, color = gender)) +
    geom_line(size = 1) +
    geom_point() + 
    theme_classic() +
    labs(title = "New Marvel characters by gender",
         subtitle = "(limited to characters with more than 100 appearances)",
         y = "Count of new characters", x = "")

Combining color and facets can also be an effective choice.

ggplot(data = marvel_count, aes(year, n, color = gender)) +
    geom_line(size = 1) +
    geom_point() + 
    theme_classic() +
    labs(title = "New Marvel characters by alignment & gender",
         subtitle = "(limited to characters with more than 100 appearances)",
         y = "Count of new characters", x = "")+ 
    facet_grid(. ~ align) 

Some commonly used parameters

The faceting functions share some parameters, though they are used slightly differently in each.

nrow or ncol

ggplot(data = marvel_count, aes(year, n)) +
  geom_line(color = "steelblue",size = 1) +
  geom_point(color = "steelblue") + 
  theme_classic() +
  facet_wrap(~ gender + align, nrow = 2) + 
  labs(title = "New Marvel characters by gender & alignment",
       subtitle = "(using nrow=2)",
       y = "Count of new characters", x = "")

ggplot(data = marvel_count, aes(year, n)) +
  geom_line(color = "steelblue", size = 1) +
  geom_point(color ="steelblue") + 
  theme_classic() +
  facet_wrap(~ gender + align, ncol = 6) + 
  labs(title = "New Marvel Characters by gender & alignment",
       subtitle = "(using ncol=6)", 
       y = "Count of new characters", x = "") +
  theme(
       axis.text.x = element_text(angle=50, hjust=1)
  )
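
In facet_wrap(), nrow and ncol fix the number of rows or columns and the panels are wrapped to fit. The sketch below uses a made-up data frame with six gender and alignment combinations (toy and the helper panel_grid() are hypothetical names) and reads the resulting grid dimensions from the built plot:

```r
library(ggplot2)

# Hypothetical toy data: 2 genders x 3 alignments = 6 panels
toy <- expand.grid(
  year   = c(1960, 1961),
  align  = c("Bad", "Good", "Neutral"),
  gender = c("Female", "Male"),
  stringsAsFactors = FALSE
)
toy$n <- seq_len(nrow(toy))

base <- ggplot(toy, aes(year, n)) + geom_line()

p_nrow <- base + facet_wrap(~ gender + align, nrow = 2)
p_ncol <- base + facet_wrap(~ gender + align, ncol = 6)

# Helper (hypothetical): number of rows and columns in the panel layout
panel_grid <- function(p) {
  lay <- ggplot_build(p)$layout$layout
  c(rows = max(lay$ROW), cols = max(lay$COL))
}

print(panel_grid(p_nrow))
print(panel_grid(p_ncol))
```

With six panels, nrow = 2 gives a 2 x 3 arrangement and ncol = 6 puts all panels in one row, which is why the second plot above rotates the x-axis labels to keep them readable.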

margins

marvel_count <- 
    mutate(marvel_count, align = stringr::str_replace(align, "Characters", ""))

ggplot(data = marvel_count, aes(year, n)) +
    geom_line(color = "steelblue", size = 1) +
    geom_point(color = "steelblue") + 
    theme_classic() +
    labs(title = "New Marvel characters by alignment & gender",
         subtitle = "(margins= TRUE)",
         y = "Count of new characters", x = "") + 
    facet_grid(align ~ gender, margins=TRUE) 
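
With margins = TRUE, facet_grid() adds an extra row and an extra column labelled "(all)" that pool the data across the other variable. The sketch below uses a made-up data frame (toy is hypothetical) with two alignments and two genders and counts the panels with and without margins:

```r
library(ggplot2)

# Hypothetical toy data: 2 alignments x 2 genders
toy <- expand.grid(
  year   = c(1960, 1961),
  align  = c("Bad", "Good"),
  gender = c("Female", "Male"),
  stringsAsFactors = FALSE
)
toy$n <- seq_len(nrow(toy))

base <- ggplot(toy, aes(year, n)) + geom_line()

p_plain   <- base + facet_grid(align ~ gender)
p_margins <- base + facet_grid(align ~ gender, margins = TRUE)

lay_plain   <- ggplot_build(p_plain)$layout$layout
lay_margins <- ggplot_build(p_margins)$layout$layout

# 2 x 2 panels without margins, 3 x 3 once the "(all)" row and column are added
print(nrow(lay_plain))
print(nrow(lay_margins))
print(unique(as.character(lay_margins$align)))
```

Note that the "(all)" panels pool the counts of several groups, so a line layer drawn there may need aggregating first to be meaningful.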

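Free, independent y-axes

Use scales = "free", scales = "free_x", or scales = "free_y" to let the axes vary between panels. Be aware, however, that such plots can mislead readers, because the same height no longer means the same value in every panel.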

ggplot(marvel_count, aes(year, n)) + 
    geom_line(color = "steelblue", size = 1) + 
    facet_wrap(~gender, scales = "free_y")+
    theme_classic() +
    labs(title = 'with "free" y axes' ,
         y = "Count of new Marvel characters")
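
To see what scales = "free_y" changes, the sketch below builds the same facets with fixed and free y scales on a made-up data frame (toy and the helper y_range() are hypothetical names) and reads the trained y range of each panel with layer_scales(). It reaches into the scale object's range field, so treat it as an inspection trick rather than a stable interface:

```r
library(ggplot2)

# Hypothetical toy data: the two groups differ by an order of magnitude
toy <- data.frame(
  year   = rep(1960:1962, times = 2),
  n      = c(1, 2, 3, 10, 20, 30),
  gender = rep(c("Female", "Male"), each = 3)
)

base <- ggplot(toy, aes(year, n)) + geom_line()

p_fixed <- base + facet_wrap(~ gender, nrow = 1)
p_free  <- base + facet_wrap(~ gender, nrow = 1, scales = "free_y")

# Helper (hypothetical): data range of the y scale for the panel in row 1, column `col`
y_range <- function(p, col) layer_scales(p, i = 1, j = col)$y$range$range

fixed_female <- y_range(p_fixed, 1)  # shared scale spans both groups
free_female  <- y_range(p_free, 1)   # scale trained on the Female panel only
free_male    <- y_range(p_free, 2)   # scale trained on the Male panel only

print(fixed_female)
print(free_female)
print(free_male)
```

With fixed scales both panels share the range 1 to 30; with free y scales the Female panel spans only 1 to 3, so its line fills the panel even though its values are ten times smaller than those in the Male panel.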

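space

In facet_grid(), space = "free" makes each panel's size proportional to the length of its axis range instead of giving all panels the same size, so it only has a visible effect when the scales are free as well.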

ggplot(data = marvel_count, aes(year, n)) +
    geom_line(color = "steelblue", size = 1) +
    geom_point(color = "steelblue") + 
    theme_classic() +
    labs(title = "New Marvel characters by alignment & gender",
         subtitle = '(space = "free")',
         y = "Count of new characters", x = "") + 
    facet_grid(align ~ gender, space="free", scales="free") 

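strip.position

In facet_wrap(), strip.position places the panel labels on any of the four sides: "top" (the default), "bottom", "left", or "right".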

ggplot(marvel_count, aes(year, n)) + 
  geom_line(color = "steelblue", size = 1) + 
  theme_classic() +
  facet_wrap(~gender, strip.position = "right") + 
  labs(title = 'strip.position = "right"',
       y = "Count of new Marvel characters")

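switch

In facet_grid(), the labels sit on the top and right by default; switch = "x" moves the top labels to the bottom, switch = "y" moves the right-hand labels to the left, and switch = "both" does both.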

ggplot(marvel_count, aes(year, n)) + 
    geom_line(color = "steelblue", size = 1) + 
    theme_classic() +
    facet_grid(~gender, switch = "x"  ) + 
    labs(title = 'switch = "x"',
         y = "Count of new Marvel characters")
