108. Graphics for communication: practice

2022-04-10 心惊梦醒

28.2.1 Exercises

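With labs() you can set the title, subtitle, caption (explanatory text), axis titles and legend title in one go. The part I had rarely done this way is the legend title, i.e. the colour argument; I no longer remember what I used before, but after this section labs() looks like the simplest option.
The geom_smooth() is somewhat misleading because hwy for large engines is skewed upwards, owing to the inclusion of lightweight sports cars with big engines. Use your modelling tools to fit and display a better model.
Take an exploratory graphic that you created in the last month and add informative titles to make it easier for others to understand.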

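Answer to exercise 2:
As the question says, the curve fitted by geom_smooth() starts to bend upwards once engine displacement exceeds 5 L, because at the same displacement a lighter car is more fuel efficient.
Background: 1) Lightweighting a car means reducing its curb weight as far as possible while preserving strength and safety, which improves performance, cuts fuel consumption and lowers exhaust emissions. 2) displ is the engine displacement in litres and hwy is the highway mileage in miles per gallon. In general, the larger the displacement, the more fuel is burned per 100 km and the smaller hwy is.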

library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) +  # do not draw the confidence band around the smooth
  labs(
    title = "Fuel efficiency generally decreases with engine size",
    subtitle = "Two seaters (sports cars) are an exception because of their light weight",
    caption = "Data from fueleconomy.gov",
    x = "Engine displacement (L)",
    y = "Highway fuel economy (mpg)",
    colour = "Car type") +
  # circle the two-seaters so that they stand out
  geom_point(data = subset(mpg, class == "2seater"), aes(displ, hwy), shape = 1, size = 3)
p

Figure: relationship between fuel economy and engine displacement

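I am not sure it is really better, but I tried the interaction between a continuous and a categorical variable that was covered in the modelling chapter: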

p1 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
#  geom_smooth(se = FALSE) +  # smooth without confidence band, disabled here
  labs(
    title = "Fuel efficiency generally decreases with engine size",
    subtitle = "Two seaters (sports cars) are an exception because of their light weight",
    caption = "Data from fueleconomy.gov",    
    x = "Engine displacement (L)",
    y = "Highway fuel economy (mpg)",
    colour = "Car type") + 
    geom_point(data=subset(mpg,class=="2seater"),aes(displ,hwy),shape=1,size=3)

# Modelling: mod1 is additive, mod2 adds the displ x class interaction
mod1 <- lm(hwy~displ+class,data=mpg)
grid <- modelr::data_grid(mpg,displ,class)
grid1 <- modelr::add_predictions(grid,mod1)
mod2<-lm(hwy~displ*class,data=mpg)
grid2<- modelr::add_predictions(grid,mod2)
# Residuals of both models in one data frame
residuals <- modelr::gather_residuals(mpg,mod1,mod2)
# Plotting
p2 <- p1 + geom_line(data=grid1,aes(x=displ,y=pred,color=class)) + labs(caption="mod1: hwy~displ+class")
p3 <- p1 + geom_line(data=grid2,aes(x=displ,y=pred,color=class)) + labs(caption="mod2: hwy~displ*class")
p4 <- ggplot(residuals, aes(displ, resid, colour = class)) + 
  geom_point() + 
  facet_grid(model ~ class) 
ggpubr::ggarrange(p1,p2,p3,p4,nrow=2,ncol=2)

Figure: the original plot, the mod1 and mod2 fits, and the residual plots
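Whether mod2 really beats mod1 can also be checked with numbers rather than by eye. The following is a minimal sketch that is not part of the original analysis: it refits the two models from above and compares their residual root-mean-square error and an F-test from anova(). The helper name rmse_of is made up for illustration.

```r
library(ggplot2)  # only needed for the mpg data set

# Same two models as above: additive vs. interaction
mod1 <- lm(hwy ~ displ + class, data = mpg)
mod2 <- lm(hwy ~ displ * class, data = mpg)

# Hypothetical helper: root-mean-square of the residuals
rmse_of <- function(mod) sqrt(mean(residuals(mod)^2))

rmse1 <- rmse_of(mod1)
rmse2 <- rmse_of(mod2)
c(mod1 = rmse1, mod2 = rmse2)

# F-test for nested models: does the interaction explain
# significantly more variance than the additive model?
anova(mod1, mod2)
```

Because mod1 is nested in mod2, the training RMSE of mod2 can never be larger than that of mod1, so the F-test (or a criterion such as AIC) is the more informative of the two comparisons.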

28.3.1 Exercises

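Use geom_text() to place text in the four corners of the plot.
Read the documentation for annotate() and use it to add a text label to a plot without creating a tibble.
How does geom_text() interact with faceting? How can you add a text label to a single facet? How can you put a different text label in each facet? (Hint: think about the underlying data.)
Which arguments to geom_label() control the appearance of the background box?
What are the four arguments to arrow()? How do they work? Create a series of plots that demonstrate the most important options.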

Answers:

geom_text() needs a tibble or data frame prepared first. Then adjust hjust and vjust, otherwise part of each label ends up outside the plotting area.

mtcars$cyl<-as.factor(mtcars$cyl)
(corners <- data.frame(x=c(-Inf,-Inf,Inf,Inf),y=c(Inf,-Inf,Inf,-Inf),label=c("left-top","left-bottom","right-top","right-bottom")))
# hjust/vjust are constants, so they are set outside aes()
ggplot(mtcars) + geom_point(aes(x=disp,y=mpg,color=cyl)) +
  geom_text(data=corners,aes(x=x,y=y,label=label),hjust="inward",vjust="inward")

Figure: text labels placed in the four corners of the plot using infinite positions

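Documentation of annotate():
annotate() creates an annotation layer that adds geoms to a plot. Unlike a typical geom function, the properties of these geoms are not mapped from variables in a data frame but passed in as vectors. This is useful for small annotations such as text labels, because there is no need to prepare a data frame or tibble as with geom_text(). It also suits data that already come as vectors, or that you do not want to put into a data frame for some reason. Layers created with this function do not affect the legend.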

annotate(
  geom,
  x = NULL,
  y = NULL,
  xmin = NULL,
  xmax = NULL,
  ymin = NULL,
  ymax = NULL,
  xend = NULL,
  yend = NULL,
  ...,
  na.rm = FALSE
)

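geom: the name of the geom, e.g. text, segment, rect. There ought to be a summary of all the geoms in ggplot2 somewhere; I will fill this gap when I have time.
x, y, xmin, xmax, ymin, ymax, xend, yend: positioning aesthetics; at least one of them must be specified. All position aesthetics are scaled, i.e. they expand the limits of the plot so that they stay visible; all other aesthetics are set to fixed values.
...: other arguments passed on to layer(), usually aesthetics set to a fixed value, e.g. color="red".
Positioning is what has always given me the most trouble with this function: data on different scales need different x and y, and finding a suitable pair for every data set (visible, yet not covering the underlying data) is tedious. Are there other solutions?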
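One way around hand-picking x and y, offered here as a suggestion rather than something taken from the annotate() documentation: the infinite positions from exercise 1 also work with annotate(), so a label can be pinned to a corner without knowing the data range. The label text and the name p_corner are made up for illustration.

```r
library(ggplot2)

p_corner <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # x = Inf, y = Inf pins the label to the top-right corner whatever the
  # axis limits are; hjust/vjust pull the text back inside the panel
  annotate("text", x = Inf, y = Inf, label = "Some text",
           hjust = "right", vjust = "top")
p_corner
```

This keeps the label visible on any scale, although it can still cover points that happen to sit in that corner.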

# This is the example given in the function's documentation
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p + annotate("text", x = 4, y = 25, label = "Some text")

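How geom_text() interacts with faceting falls into two cases. The first is labels for data points: however the plot is faceted, these labels follow their points, so a label appears in whichever facet its point is in. The second is text labels added separately, as in exercise 1. The last two questions can be answered as follows: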

# Prepare the label data
facet_text1 <- data.frame(x=c(Inf,Inf,-Inf),y=c(Inf,-Inf,Inf),label=c("text1","text2","text3"))

facet_text2 <- dplyr::tribble(
~x,~y,~label,~cyl,
Inf,Inf,"text1",4,
Inf,-Inf,"text2",6,
-Inf,Inf,"text3",8)
# cyl must hold the facet values that occur in mtcars (4, 6, 8), as a factor like mtcars$cyl
facet_text2$cyl <- as.factor(facet_text2$cyl)

p1 <- ggplot(mtcars) + geom_point(aes(x=disp,y=mpg,color=cyl)) + geom_text(data=facet_text1,aes(x=x,y=y,label=label,hjust="inward",vjust="inward")) +facet_grid(~cyl) + labs(title="same labels in each facet")

p2 <- ggplot(mtcars) + geom_point(aes(x=disp,y=mpg,color=cyl)) + geom_text(data=facet_text2,aes(x=x,y=y,label=label,hjust="inward",vjust="inward")) +facet_grid(~cyl) + labs(title="different labels in each facet")

p3 <- ggplot(mtcars) + geom_point(aes(x=disp,y=mpg,color=cyl)) + geom_text(data=facet_text2[1,],aes(x=x,y=y,label=label,hjust="inward",vjust="inward")) +facet_grid(~cyl) + labs(title="only one facet has a label")

ggpubr::ggarrange(p1,p2,p3,nrow=1)

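Summary: the label data must contain the same faceting variable as the plotted data, such as cyl in this example.

Figure: how to give different facets different labels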

# Check that this also works when faceting on two variables:
facet_text3 <- dplyr::tribble(
~x,~y,~label,~cyl,~am,
Inf,Inf,"text1",4,0,
Inf,-Inf,"text2",6,1,
-Inf,Inf,"text3",8,0)
# cyl must hold the facet values that occur in mtcars (4, 6, 8), as a factor like mtcars$cyl
facet_text3$cyl <- as.factor(facet_text3$cyl)

p4 <- ggplot(mtcars) + geom_point(aes(x=disp,y=mpg,color=cyl)) + geom_text(data=facet_text3,aes(x=x,y=y,label=label,hjust="inward",vjust="inward")) +facet_grid(am~cyl) + labs(title="different labels in each facet when two-dimension faceting")
p4

Figure: different labels in different facets when faceting on two variables

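Which geom_label() arguments control the appearance of the background box was covered several posts ago. To recap: label.padding controls the padding between the label text and the edge of the background box; label.r controls the radius of the box's rounded corners; label.size controls the width of the box's border line.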
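A minimal sketch of the three arguments in use, assuming a ggplot2 version in which geom_label() accepts label.size (newer releases may handle the border width differently). The label data frame lab and the values chosen are made up for illustration; fill, the box's background colour, is added as a fourth knob.

```r
library(ggplot2)

# Made-up label data for illustration
lab <- data.frame(wt = 4, mpg = 30, label = "A label")

p_label <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_label(
    data = lab, aes(label = label),
    label.padding = unit(0.6, "lines"),  # space between text and box edge
    label.r = unit(0.4, "lines"),        # radius of the rounded corners
    label.size = 1,                      # width of the border line (assumed still supported)
    fill = "lightyellow"                 # background colour of the box
  )
p_label
```

Changing one value at a time against the defaults makes the effect of each argument easy to see.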
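Documentation of arrow():
"Describe arrows to add to a line": the function draws nothing by itself. It only returns a description of an arrow, which has to be passed to a line-drawing function (one with a parameter that accepts an arrow description) to produce a line with an arrowhead.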

arrow(angle = 30, length = unit(0.25, "inches"),
      ends = "last", type = "open")

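angle: the angle of the arrowhead; the smaller the value, the narrower and pointier the head. It essentially describes the width of the arrowhead.
length: the length of the arrowhead, from tip to base.
ends: which end(s) of the line get an arrowhead: "last", "first" or "both".
type: whether the arrowhead is a closed triangle: "open" or "closed".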

# The return value is a list describing the properties of the arrow
> arrow()
$angle
[1] 30

$length
[1] 0.25inches

$ends
[1] 2

$type
[1] 1

attr(,"class")
[1] "arrow"

A series of plots demonstrating the different arguments:

# I draw the lines with geom_segment(), whose arrow argument accepts the return value of arrow()
segments <- dplyr::tribble(
~x,~y,~xend,~yend,~text,
1,1,2,1,'arrow(angle=30)',
1,2,2,2,'arrow(angle=60)',
3,1,4,1,'arrow(length=unit(0.25, "inches"))',
3,2,4,2,'arrow(length=unit(0.5, "inches"))',
5,1,6,1,'arrow(ends="last")',
5,2,6,2,'arrow(ends="first")',
5,1.5,6,1.5,'arrow(ends="both")',
7,1,8,1,'arrow(type="open")',
7,2,8,2,'arrow(type="closed")')
# A list holding the return values of arrow(), in the same order as the rows of segments
# I did not store the arrows in a data frame because they lose their attributes when taken out again; that cost me a lot of frustration.
arrows<-list(arrow(angle=30),arrow(angle=60),arrow(length=unit(0.25, "inches")),arrow(length=unit(0.5, "inches")),arrow(ends="last"),arrow(ends="first"),arrow(ends="both"),arrow(type="open"),arrow(type="closed"))

p <- ggplot(segments)
for(i in seq_along(arrows)){
    df <- segments[i,]
    arr <- arrows[[i]]  # named arr so that it does not mask the arrow() function
    # one layer per segment; in ggplot2 >= 3.4.0 use linewidth=1 instead of size=1
    p <- p + geom_segment(data=df,aes(x=x,y=y,xend=xend,yend=yend),size=1,arrow=arr)
}
p + geom_text(data=segments,aes(x=x,y=y,label=text,hjust="left",vjust="bottom"),nudge_y = 0.1)

Figure: the arguments of `arrow()`
