R-如何计算样本序列的置信区间并绘制带有置信区间的Barplot
2019-12-27 本文已影响0人
TroyShen
目录
- 0.问题引入
- 1.示例数据
- 2.计算样本数据序列的置信区间
- 3.样本数据置信区间可视化数据框构建
- 4.样本数据置信区间可视化(图1)
- 5.在Barplot 中增加文字说明(图2)
- 6.总结
- 7.本篇所使用的软件包(没有的需要用install.packages('')进行安装)
- 8.致谢
0. 问题引入
在我们处理地理/实验样本数据的时候,可能会需要对于典型点/典型样本的序列进行统计分析,体现在一定显著性水平下典型点/典型样本的统计特征,这时候我们面临以下两个技术细节:
- 如何计算样本序列的置信区间?
- 如何将样本的置信区间进行可视化?
1. 示例数据
本实验示例数据通过以下代码随机生成,可视化结果可能与下文示例图有出入。
x = LETTERS[1:10]
y = runif(1000, -10,10)
colnames(y) = x
head(y)
A B C D
[1,] 4.524860 -7.3445746 -6.439900 9.1539736
[2,] 5.822885 -4.3644429 -7.866430 -0.3942861
[3,] -2.204920 -7.4995754 -7.518962 4.4767627
[4,] -4.915803 -0.4365128 7.768294 2.4638002
[5,] -1.154929 -4.4947919 -8.646688 3.0954898
[6,] -9.154364 5.4484339 -9.566622 -3.9255011
E F G
[1,] 3.9060027 0.42257072 -2.48072582
[2,] -3.0508460 0.08481233 3.92705319
[3,] -9.7084660 -1.63433896 9.47799297
[4,] 0.5258416 -5.27407052 0.03590268
[5,] -0.9525900 -8.55815396 1.03945867
[6,] -0.1960090 -1.94827378 -8.86757475
H I J
[1,] 8.917407 -8.9758134 -1.422388
[2,] -7.832482 -8.5410536 9.173685
[3,] -8.502113 2.1070542 -5.903512
[4,] 4.375888 0.5945806 -2.485896
[5,] -9.202834 -8.4865285 6.049102
[6,] 6.166935 0.3131027 -6.399058
2. 计算样本数据序列的置信区间
默认显著性水平p < 0.05
y = matrix(y, ncol = 10)
y_statistic = apply(y,2,CI)
head(y_statistic)
[,1] [,2] [,3]
upper 1.0263200 0.8966174 1.1068900
mean -0.1453326 -0.3028532 -0.1041544
lower -1.3169851 -1.5023239 -1.3151989
[,4] [,5] [,6]
upper 0.9076051 1.1043008 1.12495923
mean -0.2275783 -0.0576591 0.08110512
lower -1.3627616 -1.2196190 -0.96274899
[,7] [,8] [,9]
upper 0.5812071 1.4890171 0.9575675
mean -0.5852506 0.3316813 -0.2092299
lower -1.7517082 -0.8256545 -1.3760272
[,10]
upper 1.5280800
mean 0.2473222
lower -1.0334356
3. 样本数据置信区间可视化数据框构建
ymin = y_statistic[1,]
ymean = y_statistic[2,]
ymax = y_statistic[3,]
pl_df = data.frame(x = x, y = ymean, ymin = ymin,ymax =ymax)
4. 样本数据置信区间可视化(图1)
setwd('L:\\JianShu\\20191227')
p = ggplot()+
geom_bar(data = pl_df, aes(x = x, y = y, fill = x),
position = 'dodge', stat = 'identity', width = 0.5)+
geom_errorbar(data = pl_df,aes(x = x,ymin = ymin, ymax = ymax),
width = 0.2,
linetype = "solid",
position = position_dodge(width = 0.5),
color="black", size=0.8)+
theme_bw()+
theme(
axis.text = element_text(face = 'bold',color = 'black',size = 12,hjust = 0.5),
axis.title = element_text(face = 'bold',color = 'black',size = 14,hjust = 0.5),
legend.text = element_text(face = 'bold',color = 'black',size = 12,hjust = 0.5),
legend.title = element_blank(),
legend.position = 'bottom',
legend.direction = 'horizontal',
legend.key.height=unit(0.1,"inches"),
legend.key.width=unit(0.5,"inches")
)+
xlab('Sample Index')+
ylab('Test Value')
png('plot1.png',
height = 10,
width = 20,
units = 'cm',
res = 800)
print(p)
dev.off()
图1 样本数据置信区间可视化
根据图1, 我们可以看到A-J 样本的置信区间, 但是然后呢?是不是有点像 “多喝热水,少熬夜,早点睡。。”之类的有用的废话一样,完全从图1中看不出任何定量的信息?!
原因是啥呢,因为图中信息太少了,我们需要加文字进去进行补充说明如何做请看第5节**
5. 在Barplot 中增加文字说明(图2)
如图2,这样就好多了嘛
p2 = ggplot()+
geom_bar(data = pl_df, aes(x = x, y = y, fill = x),
position = 'dodge', stat = 'identity', width = 0.5)+
geom_errorbar(data = pl_df,aes(x = x,ymin = ymin, ymax = ymax),
width = 0.2,
linetype = "solid",
position = position_dodge(width = 0.5),
color="black", size=0.8)+
geom_text(data = pl_df, aes(x = x,y = y, label=round(y,2)),
position = position_dodge(width = 1), vjust = 0.5,hjust = -0.3,size = 4)+
geom_text(data = pl_df, aes(x = x,y = ymin, label=round(ymin,2)),
position = position_dodge(width = 1), vjust = 0.5,hjust = -0.3,size = 4)+
geom_text(data = pl_df, aes(x = x,y = ymax, label=round(ymax,2)),
position = position_dodge(width = 1), vjust = 0.5,hjust = -0.3,size = 4)+
theme_bw()+
theme(
axis.text = element_text(face = 'bold',color = 'black',size = 12,hjust = 0.5),
axis.title = element_text(face = 'bold',color = 'black',size = 14,hjust = 0.5),
legend.text = element_text(face = 'bold',color = 'black',size = 12,hjust = 0.5),
legend.title = element_blank(),
legend.position = 'bottom',
legend.direction = 'horizontal',
legend.key.height=unit(0.1,"inches"),
legend.key.width=unit(0.5,"inches"),
text = element_text(face = 'bold',color = 'black',size = 12,hjust = 0.5)
)+
xlab('Sample Index')+
ylab('Test Value')
png('plot2.png',
height = 10,
width = 20,
units = 'cm',
res = 800)
print(p2)
dev.off()
图2 在Barplot 中增加文字说明
6. 总结
本篇主要解决了以下两个问题:
- 如何计算样本序列的置信区间?
- 如何将样本的置信区间进行可视化?
7. 本篇所使用的软件包(没有的需要用install.packages('')进行安装)
library('ggplot2')
library('Rmisc')
8. 致谢
感谢大家的持续关注,小编会继续努力,持续更新下去的!
大家如果觉得有用,还麻烦大家转发点赞加关注哈,也可以扩散到朋友圈,多谢大家啦~
大家如果在使用本文代码的过程有遇到问题的,可以留言评论,也可以私信我哈~~
小编联系方式