R for Data Science, Chapter 5: 蜂

2019-12-20  寂静之巅

R for Data Science, Chapter 5

library(tidyverse)

Visualizing the distribution of a variable. For a categorical variable, use a bar chart.
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
diamonds %>%
  count(cut)
For a continuous variable, use a histogram.
ggplot(data = diamonds) +
  geom_histogram(mapping = aes(x = carat), binwidth = 0.5)
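The bar heights in a histogram can also be computed by hand, which helps when checking what the plot shows. A minimal sketch, assuming the same `diamonds` data and a bin width of 0.5; `bin_counts` is a name introduced here for illustration:

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set and cut_width()

# Bin carat into intervals of width 0.5 and count the rows in each bin.
# These counts correspond to the bar heights of a histogram with binwidth = 0.5.
bin_counts <- diamonds %>%
  count(cut_width(carat, 0.5))
bin_counts
```

Every diamond falls into exactly one bin, so the counts add up to the number of rows in `diamonds`.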
smaller <- diamonds %>%
  filter(carat < 3)
ggplot(data = smaller, mapping = aes(x = carat)) +
  geom_histogram(binwidth = 0.1)
ggplot(data = smaller, mapping = aes(x = carat, color = cut)) +
  geom_histogram(binwidth = 0.1)
ggplot(data = smaller, mapping = aes(x = carat, color = cut)) +
  geom_freqpoly(binwidth = 0.1)

ggplot(data = smaller, mapping = aes(x = carat)) +
  geom_histogram(binwidth = 0.1)
ggplot(data = faithful, mapping = aes(x = eruptions)) +
  geom_histogram(binwidth = 0.25)

5.3 Unusual values

ggplot(diamonds) +
  geom_histogram(mapping = aes(x = y), binwidth = 0.5)
ggplot(diamonds) +
  geom_histogram(mapping = aes(x = y), binwidth = 0.5) +
  coord_cartesian(ylim = c(0,50))
unusual <- diamonds %>%
  filter(y < 3 | y > 20) %>%
  arrange(y)
unusual
ggplot(unusual) +
  geom_histogram(mapping = aes(x = y), binwidth = 0.5) +
  coord_cartesian(ylim = c(0,50))
How many diamonds are 0.99 carat, and how many are 1 carat? What explains the result?
m0.99 <- diamonds %>%
  filter(carat == 0.99)
m0.99
m1 <- diamonds %>%
  filter(carat == 1)
m1
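The two filters above return the full rows for each weight. To answer the question directly, the rows can be counted in one step. A minimal sketch, assuming the same `diamonds` data; `carat_counts` is a name introduced here for illustration:

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# Keep only diamonds weighing 0.99 to 1 carat, then count rows per weight.
# carat is recorded to two decimals, so only 0.99 and 1 can appear.
carat_counts <- diamonds %>%
  filter(carat >= 0.99, carat <= 1) %>%
  count(carat)
carat_counts
```

There are far fewer 0.99-carat diamonds than 1-carat diamonds. A common reading is that stones just under 1 carat tend to be cut or rounded up to reach the 1-carat mark, but the data alone cannot confirm this.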
diamonds2 <- diamonds %>% 
  filter(between(y, 3, 20)) %>%
  arrange(y)
diamonds2
diamonds3 <- diamonds %>%
  mutate(y = ifelse(y < 3 | y > 20, NA, y))
diamonds3
ggplot(data = diamonds3, mapping = aes(x = x, y = y)) +
  geom_point()
#> Warning message:
#> Removed 9 rows containing missing values (geom_point).
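The warning is ggplot2 reporting that it dropped the rows where `y` was set to `NA`. If that is expected, the warning can be suppressed with `na.rm = TRUE`. A minimal sketch that rebuilds `diamonds3` so it runs on its own; `p` is a name introduced here for illustration:

```r
library(dplyr)
library(ggplot2)

# Replace implausible y values with NA instead of dropping whole rows.
diamonds3 <- diamonds %>%
  mutate(y = ifelse(y < 3 | y > 20, NA, y))

# na.rm = TRUE tells geom_point() to drop missing values without a warning.
p <- ggplot(data = diamonds3, mapping = aes(x = x, y = y)) +
  geom_point(na.rm = TRUE)
p

# Number of rows affected; this matches the count in the warning message.
sum(is.na(diamonds3$y))
```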
nycflights13::flights %>% 
  mutate(
    cancelled = is.na(dep_time),
    sched_hour = sched_dep_time %/% 100,
    sched_min = sched_dep_time %% 100,
    sched_dep_time = sched_hour + sched_min / 60 
  ) %>%
  ggplot(mapping = aes(sched_dep_time)) + 
  geom_freqpoly(
    mapping = aes(color = cancelled),
    binwidth = 1/4)

Covariation is the tendency for the values of two or more variables to vary together in a related way.

ggplot(data = diamonds, mapping = aes(x = price)) +
  geom_freqpoly(mapping = aes(color = cut), binwidth = 500)
ggplot(data = diamonds, mapping = aes(x = price, y = ..density..)) +
  geom_freqpoly(mapping = aes(color = cut), binwidth = 500)
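The `..density..` notation was deprecated in ggplot2 3.4.0; `after_stat(density)`, available since 3.3.0, is its replacement. A sketch of the same plot in the newer form, assuming ggplot2 3.3.0 or later is installed; `p` is a name introduced here for illustration:

```r
library(ggplot2)

# after_stat(density) maps y to the density computed by the binning stat,
# so the area under each cut's frequency polygon is 1.
p <- ggplot(data = diamonds, mapping = aes(x = price, y = after_stat(density))) +
  geom_freqpoly(mapping = aes(color = cut), binwidth = 500)
p
```

Standardizing to density makes the cuts comparable even though their counts differ widely.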

A boxplot visualizes the distribution of a continuous variable within each level of a categorical variable.

ggplot(data = diamonds, mapping = aes(x = cut, y = price)) +
  geom_boxplot(mapping = aes(color = cut))
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot()
ggplot(data = mpg, mapping = aes(x = reorder(class, hwy, FUN = median), y = hwy)) +
  geom_boxplot() 
ggplot(data = mpg, mapping = aes(x = reorder(class, hwy, FUN = median), y = hwy)) +
  geom_boxplot() +
  coord_flip()
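To see the order that `reorder(class, hwy, FUN = median)` produces, the per-class medians can be computed directly. A minimal sketch, assuming the same `mpg` data; `class_medians` is a name introduced here for illustration:

```r
library(dplyr)
library(ggplot2)  # provides the mpg data set

# Median highway mileage per vehicle class, sorted from lowest to highest.
class_medians <- mpg %>%
  group_by(class) %>%
  summarise(median_hwy = median(hwy)) %>%
  arrange(median_hwy)
class_medians

# The factor levels returned by reorder() are sorted by increasing median,
# which is the left-to-right order of the boxes in the reordered plot.
levels(reorder(mpg$class, mpg$hwy, FUN = median))
```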
These are personal study notes. They are rough and not very detailed, so please go easy on them.