ggplot | Visualizing data distributions
2023-01-01
生命数据科学
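In bioinformatics data analysis, knowing how each sample's data are distributed helps in choosing an analysis pipeline and methods. How to draw distribution plots that are more intuitive and effective is therefore worth some thought.
1. Required packages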
library(ggdist)
library(tidyquant)
library(tidyverse)
library(ggsci)
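2. Conventional plots
Box plots and violin plots are the most common ways to draw a data distribution.
2.1 Example data
The example data is the mpg data set that ships with the ggplot2 package. The plots use the categorical variable cyl and the continuous variable cty.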
> head(mpg)
# A tibble: 6 × 11
  manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
  <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>
1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compact
2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compact
3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compact
4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compact
5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compact
6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compact
2.2 Box plot
The box plot is the simplest option: it shows the outliers, the quartiles, and the interquartile range of the data.
# Box plot
library(ggplot2)
library(ggsci)
p = ggplot(mpg, aes(x=factor(cyl), y=cty,fill=factor(cyl))) +
geom_boxplot()+
scale_fill_lancet()
p
# fill colors come from the ggsci package (Lancet palette)
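To make the quantities a box plot encodes concrete, the sketch below computes them by hand for cty within each cyl group. The helper name summarise_box is made up for this illustration, and the 1.5 × IQR rule for flagging outliers is the default convention of geom_boxplot(). Treat the numbers as a cross-check on the plot rather than as part of the original workflow.

```r
# Hypothetical helper (not part of ggplot2): quartiles, IQR and the number of
# points outside the 1.5 * IQR fences for one numeric vector.
summarise_box <- function(v) {
  q <- unname(quantile(v, probs = c(0.25, 0.5, 0.75)))
  iqr <- q[3] - q[1]
  lower_fence <- q[1] - 1.5 * iqr
  upper_fence <- q[3] + 1.5 * iqr
  c(q1 = q[1], median = q[2], q3 = q[3], iqr = iqr,
    n_outliers = sum(v < lower_fence | v > upper_fence))
}

# One row per cylinder count, using the mpg data shipped with ggplot2
box_stats <- t(sapply(split(ggplot2::mpg$cty, ggplot2::mpg$cyl), summarise_box))
box_stats
```

Groups with a non-zero n_outliers are the ones where the box plot draws isolated points beyond the whiskers.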
2.3 Violin plot
Compared with a box plot, a violin plot shows one more piece of information: the density of the data.
# Same data
# Violin plot
library(ggplot2)
library(ggsci)
p = ggplot(mpg, aes(x=factor(cyl), y=cty,fill=factor(cyl))) +
geom_violin()+
scale_fill_lancet()
p
# fill colors again come from the ggsci package
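The density that a violin plot adds is a kernel density estimate of the data, mirrored around the group's axis. As a rough sketch of what is being drawn, base R's density() can produce such an estimate for one group. The variable names are illustrative, and geom_violin() applies its own defaults (scaling, trimming to the data range), so the outline will not match point for point.

```r
# cty values for the 4-cylinder cars in the mpg data shipped with ggplot2
cty_4cyl <- ggplot2::mpg$cty[ggplot2::mpg$cyl == 4]

# Gaussian kernel density estimate evaluated on a grid of points
dens <- density(cty_4cyl)

# Trapezoidal rule: the area under a density curve should be close to 1
area <- sum(diff(dens$x) * (head(dens$y, -1) + tail(dens$y, -1)) / 2)
area

# Where the estimated density peaks, i.e. the widest part of the violin
peak_at <- dens$x[which.max(dens$y)]
peak_at
```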
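3. Raincloud plot
Let's look at the final result first.
[Figure: the finished raincloud plot]
The plot is made up of three parts:
- the cloud (a half-violin density)
- the box plot
- the rain (a dot plot of the individual observations)
3.1 Draw the cloud first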
mpg %>%
filter(cyl %in% c(4,6,8)) %>%
ggplot(aes(x = factor(cyl), y = cty, fill = factor(cyl),color = factor(cyl))) +
# add half-violin from {ggdist} package
ggdist::stat_halfeye(
## custom bandwidth
adjust = 0.5,
## move geom to the right
justification = -.2,
## remove slab interval
.width = 0,
point_colour = NA
)
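The adjust argument scales the kernel bandwidth: values below 1 give a wigglier cloud, values above 1 a smoother one. The sketch below uses base R's density() as a stand-in to show the effect. ggdist may select its default bandwidth with a different rule, so only the multiplicative role of adjust carries over, not the exact numbers.

```r
cty_8cyl <- ggplot2::mpg$cty[ggplot2::mpg$cyl == 8]

dens_default <- density(cty_8cyl)                # adjust = 1
dens_narrow  <- density(cty_8cyl, adjust = 0.5)  # half the bandwidth

# The bandwidth actually used is the selected bandwidth times `adjust`
c(default = dens_default$bw, narrow = dens_narrow$bw)
```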
3.2 Cloud + box plot
mpg %>%
filter(cyl %in% c(4,6,8)) %>%
ggplot(aes(x = factor(cyl), y = cty, fill = factor(cyl),color = factor(cyl))) +
# add half-violin from {ggdist} package
ggdist::stat_halfeye(
## custom bandwidth
adjust = 0.5,
## move geom to the right
justification = -.2,
## remove slab interval
.width = 0,
point_colour = NA
) +
geom_boxplot(
width = .15,
## remove outliers
outlier.color = NA,
alpha = 0.5
)
3.3 Cloud + rain + box plot
mpg %>%
filter(cyl %in% c(4,6,8)) %>%
ggplot(aes(x = factor(cyl), y = cty, fill = factor(cyl),color = factor(cyl))) +
# add half-violin from {ggdist} package
ggdist::stat_halfeye(
## custom bandwidth
adjust = 0.5,
## move geom to the right
justification = -.2,
## remove slab interval
.width = 0,
point_colour = NA
) +
geom_boxplot(
width = .15,
## remove outliers
outlier.color = NA,
alpha = 0.5
) +
# Add dot plots from {ggdist} package
ggdist::stat_dots(
## orientation to the left
side = "left",
## move geom to the left
justification = 1.1,
## adjust grouping (binning) of observations
binwidth = .25
)
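The rain shows one dot per observation. Since cty is recorded as whole numbers, many cars share the same value, which is why the dots pile up into stacks. A quick count, as a sketch with illustrative variable names, shows roughly how many dots to expect at each value:

```r
# Same subset as in the plots: cars with 4, 6 or 8 cylinders
cars <- subset(ggplot2::mpg, cyl %in% c(4, 6, 8))

# Rows: cylinder count; columns: distinct cty values; cells: number of cars
stack_heights <- table(cyl = cars$cyl, cty = cars$cty)
stack_heights
```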
3.4 The orientation is off, so flip it
mpg %>%
filter(cyl %in% c(4,6,8)) %>%
ggplot(aes(x = factor(cyl), y = cty, fill = factor(cyl),color = factor(cyl))) +
# add half-violin from {ggdist} package
ggdist::stat_halfeye(
## custom bandwidth
adjust = 0.5,
## move geom to the right
justification = -.2,
## remove slab interval
.width = 0,
point_colour = NA
) +
geom_boxplot(
width = .15,
## remove outliers
outlier.color = NA,
alpha = 0.5
) +
# Add dot plots from {ggdist} package
ggdist::stat_dots(
## orientation to the left
side = "left",
## move geom to the left
justification = 1.1,
## adjust grouping (binning) of observations
binwidth = .25
) +
coord_flip()
3.5 Add some color
mpg %>%
filter(cyl %in% c(4,6,8)) %>%
ggplot(aes(x = factor(cyl), y = cty, fill = factor(cyl),color = factor(cyl))) +
# add half-violin from {ggdist} package
ggdist::stat_halfeye(
## custom bandwidth
adjust = 0.5,
## move geom to the right
justification = -.2,
## remove slab interval
.width = 0,
point_colour = NA
) +
geom_boxplot(
width = .15,
## remove outliers
outlier.color = NA,
alpha = 0.5
) +
# Add dot plots from {ggdist} package
ggdist::stat_dots(
## orientation to the left
side = "left",
## move geom to the left
justification = 1.1,
## adjust grouping (binning) of observations
binwidth = .25
) +
coord_flip() +
# Adjust theme
scale_fill_lancet() +
scale_color_lancet()
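To see which colors scale_fill_lancet() and scale_color_lancet() will assign, the palette can be inspected directly. This sketch assumes the ggsci package is installed and uses its pal_lancet() generator. Three colors are requested because the plot has three cylinder groups.

```r
library(ggsci)

# pal_lancet() returns a function; calling it with n yields n hex color codes
lancet_cols <- pal_lancet()(3)
lancet_cols
```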
3.6 Remove the background
mpg %>%
filter(cyl %in% c(4,6,8)) %>%
ggplot(aes(x = factor(cyl), y = cty, fill = factor(cyl),color = factor(cyl))) +
# add half-violin from {ggdist} package
ggdist::stat_halfeye(
## custom bandwidth
adjust = 0.5,
## move geom to the right
justification = -.2,
## remove slab interval
.width = 0,
point_colour = NA
) +
geom_boxplot(
width = .15,
## remove outliers
outlier.color = NA,
alpha = 0.5
) +
# Add dot plots from {ggdist} package
ggdist::stat_dots(
## orientation to the left
side = "left",
## move geom to the left
justification = 1.1,
## adjust grouping (binning) of observations
binwidth = .25
) +
coord_flip() +
# Adjust theme
scale_fill_lancet() +
scale_color_lancet() +
# theme_classic() is a complete theme; a preceding theme_bw() would simply be overridden
theme_classic()
3.7 Adjust the titles
p <- mpg %>%
filter(cyl %in% c(4,6,8)) %>%
ggplot(aes(x = factor(cyl), y = cty, fill = factor(cyl),color = factor(cyl))) +
# add half-violin from {ggdist} package
ggdist::stat_halfeye(
## custom bandwidth
adjust = 0.5,
## move geom to the right
justification = -.2,
## remove slab interval
.width = 0,
point_colour = NA
) +
geom_boxplot(
width = .15,
## remove outliers
outlier.color = NA,
alpha = 0.5
) +
# Add dot plots from {ggdist} package
ggdist::stat_dots(
## orientation to the left
side = "left",
## move geom to the left
justification = 1.1,
## adjust grouping (binning) of observations
binwidth = .25
) +
# Adjust theme
scale_fill_lancet() +
scale_color_lancet() +
# theme_classic() is a complete theme; a preceding theme_bw() would simply be overridden
theme_classic() +
labs(title = "Raincloud_plot",
x = "cyl",
fill = "cyl_Type", color = "cyl_Type") +
coord_flip()
p
That's basically it. Finally, save the plot:
ggsave(filename = "raincloud_plot.jpg", plot = p, height = 4, width = 5)
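JPEG is a lossy raster format, so for print or slides a vector format is often a safer choice. As a sketch, assuming a stand-in plot object (p_demo, not the raincloud plot itself) and a temporary directory as the destination, the same ggsave() call can write a PDF by changing only the file extension:

```r
library(ggplot2)

# Stand-in plot for illustration; substitute the raincloud plot object `p`
p_demo <- ggplot(mpg, aes(x = factor(cyl), y = cty)) +
  geom_boxplot()

# ggsave() picks the graphics device from the file extension
out_file <- file.path(tempdir(), "raincloud_plot.pdf")
ggsave(filename = out_file, plot = p_demo, width = 5, height = 4, units = "in")

file.exists(out_file)
```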
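4. Summary
A sword's edge comes from grinding, and the plum blossom's fragrance from the bitter cold.
Plotting works the same way: a simple plot takes three lines of code, but a polished, good-looking one always takes more work.
So why am I so quick at plotting?
Because I have a ggplot2 cheat sheet.
It covers the syntax for nearly every common plot type, so most plotting needs can be handled with ease.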