石头剪刀布机器学习与计算机视觉R语言与统计分析

R-如何计算样本序列的置信区间并绘制带有置信区间的Barplot

2019-12-27  本文已影响0人  TroyShen

目录


0. 问题引入

在我们处理地理/实验样本数据的时候,可能会需要对于典型点/典型样本的序列进行统计分析,体现在一定显著性水平下典型点/典型样本的统计特征,这时候我们面临以下两个技术细节:

  1. 如何计算样本序列的置信区间?
  2. 如何将样本的置信区间进行可视化?

1. 示例数据

本实验示例数据通过以下代码随机生成,可视化结果可能与下文示例图有出入。

x = LETTERS[1:10]
y = runif(1000, -10,10)
colnames(y) = x
 head(y)
             A          B         C          D
[1,]  4.524860 -7.3445746 -6.439900  9.1539736
[2,]  5.822885 -4.3644429 -7.866430 -0.3942861
[3,] -2.204920 -7.4995754 -7.518962  4.4767627
[4,] -4.915803 -0.4365128  7.768294  2.4638002
[5,] -1.154929 -4.4947919 -8.646688  3.0954898
[6,] -9.154364  5.4484339 -9.566622 -3.9255011
              E           F           G
[1,]  3.9060027  0.42257072 -2.48072582
[2,] -3.0508460  0.08481233  3.92705319
[3,] -9.7084660 -1.63433896  9.47799297
[4,]  0.5258416 -5.27407052  0.03590268
[5,] -0.9525900 -8.55815396  1.03945867
[6,] -0.1960090 -1.94827378 -8.86757475
             H          I         J
[1,]  8.917407 -8.9758134 -1.422388
[2,] -7.832482 -8.5410536  9.173685
[3,] -8.502113  2.1070542 -5.903512
[4,]  4.375888  0.5945806 -2.485896
[5,] -9.202834 -8.4865285  6.049102
[6,]  6.166935  0.3131027 -6.399058

2. 计算样本数据序列的置信区间

默认显著性水平p < 0.05

y = matrix(y, ncol = 10)
y_statistic = apply(y,2,CI)
head(y_statistic)
            [,1]       [,2]       [,3]
upper  1.0263200  0.8966174  1.1068900
mean  -0.1453326 -0.3028532 -0.1041544
lower -1.3169851 -1.5023239 -1.3151989
            [,4]       [,5]        [,6]
upper  0.9076051  1.1043008  1.12495923
mean  -0.2275783 -0.0576591  0.08110512
lower -1.3627616 -1.2196190 -0.96274899
            [,7]       [,8]       [,9]
upper  0.5812071  1.4890171  0.9575675
mean  -0.5852506  0.3316813 -0.2092299
lower -1.7517082 -0.8256545 -1.3760272
           [,10]
upper  1.5280800
mean   0.2473222
lower -1.0334356

3. 样本数据置信区间可视化数据框构建

ymin = y_statistic[1,]
ymean = y_statistic[2,]
ymax = y_statistic[3,]

pl_df = data.frame(x = x, y = ymean, ymin = ymin,ymax  =ymax)

4. 样本数据置信区间可视化(图1)

setwd('L:\\JianShu\\20191227')
p = ggplot()+
  geom_bar(data = pl_df, aes(x = x, y = y, fill = x),
           position = 'dodge',  stat = 'identity', width = 0.5)+
  geom_errorbar(data = pl_df,aes(x = x,ymin = ymin, ymax = ymax),
                width = 0.2,
                linetype = "solid",
                position = position_dodge(width = 0.5),
                color="black", size=0.8)+
  theme_bw()+
  theme(
    axis.text = element_text(face = 'bold',color = 'black',size = 12,hjust = 0.5),
    axis.title = element_text(face = 'bold',color = 'black',size = 14,hjust = 0.5),
    legend.text = element_text(face = 'bold',color = 'black',size = 12,hjust = 0.5),
    legend.title = element_blank(),
    legend.position = 'bottom',
    legend.direction = 'horizontal',
    legend.key.height=unit(0.1,"inches"),
    legend.key.width=unit(0.5,"inches")
  )+
  xlab('Sample Index')+
  ylab('Test Value')
  

png('plot1.png',
    height = 10,
    width = 20,
    units = 'cm',
    res = 800)
print(p)
dev.off()
图1 样本数据置信区间可视化

根据图1, 我们可以看到A-J 样本的置信区间, 但是然后呢?是不是有点像 “多喝热水,少熬夜,早点睡。。”之类的有用的废话一样,完全从图1中看不出任何定量的信息?!

原因是啥呢,因为图中信息太少了,我们需要加文字进去进行补充说明如何做请看第5节**


5. 在Barplot 中增加文字说明(图2)

如图2,这样就好多了嘛

p2 = ggplot()+
  geom_bar(data = pl_df, aes(x = x, y = y, fill = x),
           position = 'dodge',  stat = 'identity', width = 0.5)+
  geom_errorbar(data = pl_df,aes(x = x,ymin = ymin, ymax = ymax),
                width = 0.2,
                linetype = "solid",
                position = position_dodge(width = 0.5),
                color="black", size=0.8)+
  geom_text(data = pl_df, aes(x = x,y = y, label=round(y,2)),
          position = position_dodge(width = 1), vjust = 0.5,hjust = -0.3,size = 4)+
  geom_text(data = pl_df, aes(x = x,y = ymin, label=round(ymin,2)),
            position = position_dodge(width = 1), vjust = 0.5,hjust = -0.3,size = 4)+
  geom_text(data = pl_df, aes(x = x,y = ymax, label=round(ymax,2)),
            position = position_dodge(width = 1), vjust = 0.5,hjust = -0.3,size = 4)+
  theme_bw()+
  theme(
    axis.text = element_text(face = 'bold',color = 'black',size = 12,hjust = 0.5),
    axis.title = element_text(face = 'bold',color = 'black',size = 14,hjust = 0.5),
    legend.text = element_text(face = 'bold',color = 'black',size = 12,hjust = 0.5),
    legend.title = element_blank(),
    legend.position = 'bottom',
    legend.direction = 'horizontal',
    legend.key.height=unit(0.1,"inches"),
    legend.key.width=unit(0.5,"inches"),
    text = element_text(face = 'bold',color = 'black',size = 12,hjust = 0.5)
  )+
  xlab('Sample Index')+
  ylab('Test Value')
  

png('plot2.png',
    height = 10,
    width = 20,
    units = 'cm',
    res = 800)
print(p2)
dev.off()
图2 在Barplot 中增加文字说明

6. 总结

本篇主要解决了以下两个问题:

  1. 如何计算样本序列的置信区间?
  2. 如何将样本的置信区间进行可视化?

7. 本篇所使用的软件包(没有的需要用install.packages('')进行安装)

library('ggplot2')
library('Rmisc')

8. 致谢

感谢大家的持续关注,小编会继续努力,持续更新下去的!

大家如果觉得有用,还麻烦大家转发点赞加关注哈,也可以扩散到朋友圈,多谢大家啦~

大家如果在使用本文代码的过程有遇到问题的,可以留言评论,也可以私信我哈~~


小编联系方式
上一篇下一篇

猜你喜欢

热点阅读