Histograms You Can Learn in One Go, Smooth and Rounded

2020-10-03  小洁忘了怎么分身

0. Preface

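It's the holidays. Doudou went to Hailing Island, and Huahua went back to her hometown in Shandong more than ten days ago. I stayed home working through StatQuest. The first lesson is on histograms, a simple and comfortable place to start, so I went ahead and drew the plots as well.

A histogram plus a density plot works, and a histogram plus a distribution curve works too~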

1. Preparing the R packages and data

rm(list=ls())     # clear the workspace
library(ggplot2)
library(dplyr)
set.seed(1001)    # fix the random seed so the simulated data are reproducible
dat = data.frame(length1 = rnorm(2000,500,60),
                 length2 = c(rnorm(1000,500,60),rnorm(1000,800,60)),
                 group = rep(c("A","B"),each = 1000))
head(dat)
##    length1  length2 group
## 1 631.3189 504.8925     A
## 2 489.3472 587.9790     A
## 3 488.8835 529.2282     A
## 4 349.6078 453.0826     A
## 5 466.5613 524.2960     A
## 6 491.3864 467.0581     A

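This generates two columns of data. length1 is a single normally distributed sample with mean 500 and standard deviation 60. length2 is made of two normally distributed samples of 1000 values each, with means 500 and 800 (both with standard deviation 60), labelled A and B in the group column.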
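As a quick sanity check on the simulated data (a sketch, not part of the original post), the sample means and standard deviations can be compared with the parameters passed to rnorm. The data frame is rebuilt here so the block runs on its own; `stats_by_group` is a name introduced only for illustration.

```r
# Rebuild the simulated data exactly as above so this block is self-contained
set.seed(1001)
dat <- data.frame(length1 = rnorm(2000, 500, 60),
                  length2 = c(rnorm(1000, 500, 60), rnorm(1000, 800, 60)),
                  group = rep(c("A", "B"), each = 1000))

# length1 is one normal sample, so mean and sd should be close to 500 and 60
mean(dat$length1)
sd(dat$length1)

# length2 is summarised per group (means expected near 500 and 800)
stats_by_group <- aggregate(length2 ~ group, data = dat,
                            FUN = function(x) c(mean = mean(x), sd = sd(x)))
stats_by_group
```

The values will not be exactly 500, 800 and 60, since they are estimates from random samples.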

Overlaying a density plot on a histogram can be done with both base R and ggplot2.

1. Base R

1.1. Histogram + density plot

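# freq=FALSE puts the y axis on the density scale, so the density line fits on the same plot
hist(dat$length1,freq=FALSE,ylim = c(0,0.007),breaks = 30)
lines(density(dat$length1))   # kernel density estimate of length1
hist(dat$length2,freq=FALSE,ylim = c(0,0.007),breaks = 30)
lines(density(dat$length2))   # kernel density estimate of the pooled length2 values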
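The reason freq=FALSE matters can be checked numerically. The following is a minimal sketch (the names `bar_area` and `kde_area` are introduced here for illustration): on the density scale the bar areas of the histogram add up to 1, and the area under the kernel density estimate is also close to 1, which is why the two can share a y axis.

```r
set.seed(1001)
length1 <- rnorm(2000, 500, 60)

# plot = FALSE returns the histogram object without drawing it
h <- hist(length1, breaks = 30, plot = FALSE)

# Bar area = density * bin width; the areas of all bars add up to 1
bar_area <- h$density * diff(h$breaks)
sum(bar_area)

# The kernel density estimate is on the same scale: its area is close to 1
d <- density(length1)
kde_area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)  # trapezoid rule
kde_area
```

With the default freq=TRUE the bars show counts instead, and a density line drawn on top would be squashed flat against the x axis.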

1.2. Histogram + distribution curve

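# Theoretical normal densities evaluated on the grid n = 1, 2, ..., 1000
dat2 = data.frame(d1 = dnorm(1:1000,500,60),   # curve for length1
                  d2 = dnorm(1:1000,500,60),   # first peak of length2 (same values as d1)
                  d3 = dnorm(1:1000,800,60),   # second peak of length2
                  n = 1:1000)

hist(dat$length1,freq=FALSE,ylim = c(0,0.007),breaks = 30)
lines(dat2$n,dat2$d1)   # pass x explicitly instead of relying on the row index
hist(dat$length2,freq=FALSE,ylim = c(0,0.007),breaks = 30)
lines(dat2$n,dat2$d2)
lines(dat2$n,dat2$d3)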

2. ggplot2

2.1. Histogram + density plot

# Single peak: histogram on the density scale plus a kernel density line
# after_stat(density) replaces the older ..density.. notation (ggplot2 >= 3.3.0)
ggplot(dat, aes(x = length1)) +
  geom_histogram(aes(y = after_stat(density)),color = "grey",fill = "grey",alpha = 0.7)+
  geom_density(color = "grey")+
  theme_bw()
# Two peaks: mean of each group, used for the vertical lines
mes = group_by(dat,group) %>% summarise(mean = mean(length2)) 
ggplot(dat, aes(x = length2,group = group)) +
  geom_histogram(aes(y = after_stat(density),fill = group,
                     color = group),alpha = 0.2,bins = 25)+
  geom_density(aes(y = after_stat(density),color = group))+
  geom_vline(data = mes,aes(xintercept = mean,color = group),lty =4)+
  scale_color_manual(values = c('#D0505D','#6194A7'))+
  scale_fill_manual(values = c('#D0505D','#6194A7'))+
  theme_bw()

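The two-peak plot can be drawn by group, with a separate density line for each group. While I was at it I also changed the colors and marked the mean of each group with a vertical line. Looks nice!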
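For readers who want to see what the grouped plot is built from, here is a base-R sketch of the same two ingredients: one mean and one kernel density estimate per group. The names `group_means`, `dens_by_group` and `peak_x` are introduced here for illustration and are not part of the original code.

```r
# Rebuild the simulated data so this block is self-contained
set.seed(1001)
dat <- data.frame(length1 = rnorm(2000, 500, 60),
                  length2 = c(rnorm(1000, 500, 60), rnorm(1000, 800, 60)),
                  group = rep(c("A", "B"), each = 1000))

# Group means, the base-R counterpart of group_by() + summarise()
group_means <- tapply(dat$length2, dat$group, mean)

# One kernel density estimate per group
dens_by_group <- lapply(split(dat$length2, dat$group), density)

# x position of each density peak, to compare with the group means
peak_x <- sapply(dens_by_group, function(d) d$x[which.max(d$y)])

rbind(mean = group_means, peak = peak_x)
```

For roughly symmetric data like this, each peak should sit close to the mean line of its group.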

2.2. Histogram + distribution curve

# Single peak: histogram on the density scale plus the theoretical normal curve d1
# after_stat(density) replaces the older ..density.. notation (ggplot2 >= 3.3.0)
ggplot(dat, aes(x = length1)) +
  geom_histogram(aes(y = after_stat(density)),color = "grey",fill = "grey",alpha = 0.7)+
  geom_line(color = "grey",data = dat2,aes(x = n,y = d1))+
  theme_bw()+
  xlim(c(300,750))
# Two peaks: mean of each group, used for the vertical lines
mes = group_by(dat,group) %>% summarise(mean = mean(length2)) 
ggplot(dat) +
  geom_histogram(aes(y = after_stat(density),fill = group,
                     x = length2,
                     color = group),alpha = 0.2,bins = 25)+
  geom_line(data = dat2,aes(x = n,y = d2),color = '#D0505D')+
  geom_line(data = dat2,aes(x = n,y = d3),color = '#6194A7')+
  geom_vline(data = mes,aes(xintercept = mean,color = group),lty =4)+
  scale_color_manual(values = c('#D0505D','#6194A7'))+
  scale_fill_manual(values = c('#D0505D','#6194A7'))+
  theme_bw()+
  xlim(c(260,1000))

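At a glance, the distribution-curve plots for length2 from base R and ggplot2 look different. My first guess was that the bins (the bars) are not the same width, but the more likely cause is normalization: base hist() pools all 2000 values, so each peak has an area of 0.5 and is only about half as tall as the dnorm curve, while ggplot2 computes the density separately for each group, so each peak has an area of 1 and matches its curve. I'm leaving the plots as they are.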
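A minimal sketch for checking where the difference comes from, assuming it is the normalization rather than the bin width: in base R the histogram of length2 pools all 2000 values, so each normal curve should be weighted by the share of the data it describes. The mixing weight `w` and the grid `x` are names introduced here for illustration.

```r
# Rebuild the simulated data so this block is self-contained
set.seed(1001)
dat <- data.frame(length1 = rnorm(2000, 500, 60),
                  length2 = c(rnorm(1000, 500, 60), rnorm(1000, 800, 60)),
                  group = rep(c("A", "B"), each = 1000))

w <- 0.5                      # assumed mixing weight: 1000 of the 2000 values per group
x <- seq(200, 1100, by = 1)   # illustrative grid covering both peaks

# Density of the pooled data: weighted sum of the two normal curves
mix <- w * dnorm(x, 500, 60) + (1 - w) * dnorm(x, 800, 60)

h <- hist(dat$length2, freq = FALSE, ylim = c(0, 0.007), breaks = 30)
lines(x, mix)                          # weighted curve, on the scale of the pooled histogram
lines(x, dnorm(x, 500, 60), lty = 2)   # unweighted curve, about twice as tall as the bars
```

If the solid line follows the bars and the dashed line overshoots them, the mismatch is a matter of scaling and not of bin width.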
