
Analysis of Variance (ANOVA) in R: Worked Examples

2021-04-30  Cache_wood

At its core, analysis of variance studies the effect of a categorical variable on a numeric variable.

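Total error (SST) = within-group error (SSE) + between-group error (SSA)

The within-group error is the error sum of squares; the between-group error is the treatment sum of squares.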

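With k groups, n_i observations in group i, group mean \bar{x}_i and grand mean \bar{\bar{x}} (k = 4 in the first example below):
SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{\bar{x}})^2\\
SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2\\
SSA = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(\bar{x}_i-\bar{\bar{x}})^2 = \sum_{i=1}^{k}n_i(\bar{x}_i-\bar{\bar{x}})^2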
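As a quick numerical check of this decomposition, the sketch below computes the three sums of squares directly from their definitions, using the complaint counts from the first example later in this article. The variable names (`x`, `g`, `grand_mean`, `group_mean`) and the English group labels are chosen here for illustration and are not from the original.

```r
# Complaint counts (same numbers as the article's first example).
# The English group labels are stand-ins for the original Chinese ones.
x <- c(57, 66, 49, 40, 34, 53, 44,
       68, 39, 29, 45, 56, 51,
       31, 49, 21, 34, 40,
       44, 51, 65, 77, 58)
g <- rep(c("retail", "travel", "airline", "appliance"), c(7, 6, 5, 5))

grand_mean <- mean(x)     # overall mean of all observations
group_mean <- ave(x, g)   # for each observation, the mean of its own group

SST <- sum((x - grand_mean)^2)           # total sum of squares
SSE <- sum((x - group_mean)^2)           # within-group (error) sum of squares
SSA <- sum((group_mean - grand_mean)^2)  # between-group (treatment) sum of squares

print(c(SST = SST, SSE = SSE, SSA = SSA))
print(all.equal(SST, SSE + SSA))         # checks SST = SSE + SSA
```

Summing `(group_mean - grand_mean)^2` over every observation corresponds to the double-sum form of SSA; it counts each group's squared deviation n_i times, which is why it equals the weighted single sum.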

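Construct the F statistic, where k is the number of groups and n is the total number of observations:
F = \frac{\frac{SSA}{k-1}}{\frac{SSE}{n-k}}\sim F(k-1,n-k)
Define the mean squares
MSA =\frac{SSA}{k-1}\\
MSE = \frac{SSE}{n-k}
so that F = \frac{MSA}{MSE}. The F(k-1,n-k) distribution holds under the null hypothesis that all group means are equal.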
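The formulas above translate almost line for line into R. The following is a minimal sketch, not the article's own code: `f_test` is a hypothetical helper introduced here, and the inputs are the rounded sums of squares that appear in the first `aov()` output further down (SSA = 1457, SSE = 2708, with k = 4 groups and n = 23 observations). Because the inputs are rounded, the result may differ from the printed table in the last digit.

```r
# Hypothetical helper (not from the article): mean squares, F statistic
# and p-value from the two sums of squares, k groups and n observations.
f_test <- function(ssa, sse, k, n) {
  msa <- ssa / (k - 1)   # mean square between groups, MSA
  mse <- sse / (n - k)   # mean square within groups, MSE
  f   <- msa / mse       # F = MSA / MSE
  # Upper-tail probability of the F(k - 1, n - k) distribution
  p   <- pf(f, df1 = k - 1, df2 = n - k, lower.tail = FALSE)
  list(msa = msa, mse = mse, f = f, p = p)
}

# Rounded values taken from the first example's ANOVA table
res <- f_test(ssa = 1457, sse = 2708, k = 4, n = 23)
print(res)
```

A large F means the variation between group means is large relative to the variation inside the groups, so the p-value is taken from the upper tail only.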

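# 行业 (industry): 零售业 = retail, 旅游业 = travel, 航空公司 = airlines, 家电制造业 = home appliance manufacturing
# vol: number of complaints received; the group sizes are 7, 6, 5 and 5
eg<-data.frame(行业=rep(c('零售业','旅游业','航空公司','家电制造业'),c(7,6,5,5)),
               vol = c(57,66,49,40,34,53,44,
                       68,39,29,45,56,51,
                       31,49,21,34,40,
                       44,51,65,77,58))
# One-way ANOVA of complaint counts by industry
fit<-aov(vol~行业,data=eg)
summary(fit)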
            Df Sum Sq Mean Sq F value Pr(>F)  
行业         3   1457   485.5   3.407 0.0388 *
Residuals   19   2708   142.5                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

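Interpreting the results:
SSA = 1457, Df = 3, mean square MSA = 485.5
SSE = 2708, Df = 19, mean square MSE = 142.5
The F test statistic is 3.407
p-value = 0.0388 < 0.05
So at the 0.05 significance level we reject the hypothesis that the mean number of complaints is the same in every industry, and conclude that industry affects the number of complaints.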

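# Second example: vol grouped by manager (m1, m2, m3), with 5, 7 and 6 observations
eg<-data.frame(manager=rep(c('m1','m2','m3'),c(5,7,6)),
               vol = c(7,7,8,7,9,
                       8,9,8,10,9,10,8,
                       5,6,5,7,4,8))
# One-way ANOVA of vol by manager
fit<-aov(vol~manager,data = eg)
summary(fit)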
            Df Sum Sq Mean Sq F value   Pr(>F)    
manager      2  29.61  14.805   11.76 0.000849 ***
Residuals   15  18.89   1.259                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
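Here the p-value is 0.000849 < 0.05, so by the same reasoning as in the first example the hypothesis of equal means across the three managers is rejected. Rather than reading the numbers off the printed table, they can also be pulled out of the fitted object. The sketch below shows one way to do that; the names `tab`, `f_value`, `p_value` and `alpha` are introduced here for illustration, and the 0.05 significance level is an assumption carried over from the first example.

```r
# Data from the article's second example
eg <- data.frame(manager = rep(c("m1", "m2", "m3"), c(5, 7, 6)),
                 vol = c(7, 7, 8, 7, 9,
                         8, 9, 8, 10, 9, 10, 8,
                         5, 6, 5, 7, 4, 8))
fit <- aov(vol ~ manager, data = eg)

# summary() of an aov fit is a list; its first element is the ANOVA table
tab <- summary(fit)[[1]]
f_value <- tab[["F value"]][1]   # row 1 is the manager effect
p_value <- tab[["Pr(>F)"]][1]

alpha <- 0.05                    # assumed significance level
print(c(F = f_value, p = p_value))
print(p_value < alpha)           # TRUE: reject the hypothesis of equal group means
```

Indexing the table by column name rather than by position keeps the code readable and makes it clear which statistic is being used in the decision.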