各类统计方法R语言实现(一)

2020-04-08  本文已影响0人  拾光_2020

在临床研究中,统计是非常重要的一关,而在R语言中,可以轻松实现大多数统计方法。因此,今天小编将开启一个全新的系列推文:利用R语言实现各类常用统计方法,希望能对大家有所帮助。由于统计涵盖范围非常广泛,若有描述不当之处恳请各位小伙伴多多指正,谢谢!

描述性统计分析

此处选用的数据集是R语言自带的mtcar数据集,每一行为一种车的类型,如马自达RX4等;每一列为车的一种参数,mpg表示每加仑油行驶英里数,cyl表示气缸数(可分为4,5,6),hp马力,wt车重,am变速箱类型(0表示自动挡,1表示手动档),以此类推具体说明如下:

[, 1] mpg Miles/(US) gallon [, 2] cyl Number of cylinders [, 3] disp Displacement (cu.in.) [, 4] hp Gross horsepower [, 5] drat Rear axle ratio [, 6] wt Weight (1000 lbs) [, 7] qsec 1/4 mile time [, 8] vs Engine (0 = V-shaped, 1 = straight) [, 9] am Transmission (0 = automatic, 1 = manual) [,10] gear Number of forward gears [,11] carb Number of carburetors

data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
#计算描述统计分析
#计算描述统计分析
summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

可以看到结果中数值型变量会给出最大值、最小值、四分位数、中位数、均值,但明显此处的cyl,vs,am,gear,carb均为二分类或有序多分类变量,应该计数而非当作数值型变量处理,因此代码修改如下:

mtcars$cyl<-as.factor(mtcars$cyl)
mtcars$vs<-as.factor(mtcars$vs)
mtcars$am<-as.factor(mtcars$am)
mtcars$gear<-as.factor(mtcars$gear)
mtcars$carb<-as.factor(mtcars$carb)

#计算描述统计分析
summary(mtcars)
##       mpg        cyl         disp             hp             drat      
##  Min.   :10.40   4:11   Min.   : 71.1   Min.   : 52.0   Min.   :2.760  
##  1st Qu.:15.43   6: 7   1st Qu.:120.8   1st Qu.: 96.5   1st Qu.:3.080  
##  Median :19.20   8:14   Median :196.3   Median :123.0   Median :3.695  
##  Mean   :20.09          Mean   :230.7   Mean   :146.7   Mean   :3.597  
##  3rd Qu.:22.80          3rd Qu.:326.0   3rd Qu.:180.0   3rd Qu.:3.920  
##  Max.   :33.90          Max.   :472.0   Max.   :335.0   Max.   :4.930  
##        wt             qsec       vs     am     gear   carb  
##  Min.   :1.513   Min.   :14.50   0:18   0:19   3:15   1: 7  
##  1st Qu.:2.581   1st Qu.:16.89   1:14   1:13   4:12   2:10  
##  Median :3.325   Median :17.71                 5: 5   3: 3  
##  Mean   :3.217   Mean   :17.85                        4:10  
##  3rd Qu.:3.610   3rd Qu.:18.90                        6: 1  
##  Max.   :5.424   Max.   :22.90                        8: 1

可以看到结果中数值型变量会给出最大值、最小值、四分位数、中位数、均值,而因子型变量则会给出计数结果

也可以使用以下代码得到更多结果

# 计算均值、中位数、方差、标准差、变异系数、标准误
mystats<-function(x,na.omit=FALSE){
  if(na.omit)
      x<-x[!is.na(x)]
  mean<-mean(x)
  median<-mean(x)
  var<-var(x)
  sd<-sd(x)
  cv=100*sd(x)/mean(x)
  se=sd(x)/sqrt(length(x))
  return(c(mean=mean,median=median,var=var,sd=sd,cv=cv,se=se))
}
sapply(mtcars[,c("mpg","disp","hp")], mystats)
##              mpg        disp         hp
## mean   20.090625   230.72188  146.68750
## median 20.090625   230.72188  146.68750
## var    36.324103 15360.79983 4700.86694
## sd      6.026948   123.93869   68.56287
## cv     29.998808    53.71779   46.74077
## se      1.065424    21.90947   12.12032

四格表资料统计检验

注意以下要求 (1)所有的理论数T≥5并且总样本量n≥40,用Pearson卡方进行检验。 (2)如果理论数T<5但T≥1,并且1≥40,用连续性校正的卡方进行检验。 (3)如果有理论数T<1或n<40,则用Fisher’s检验。

tabl<-table(mtcars$vs,mtcars$am)
tabl
##    
##      0  1
##   0 12  6
##   1  7  7
#计算理论频数

T = function(x){
A = matrix(0,nrow(x),ncol(x))
for(i in 1:nrow(x)){
for(j in 1:ncol(x)) A[i,j] = sum(x[i,])*sum(x[,j])/sum(x)
}
A
}
T(tabl)
##         [,1]   [,2]
## [1,] 10.6875 7.3125
## [2,]  8.3125 5.6875
#卡方检验
chisq.test(tabl, correct = FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  tabl
## X-squared = 0.90688, df = 1, p-value = 0.3409
#连续矫正卡方检验
chisq.test(tabl, correct = TRUE)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tabl
## X-squared = 0.34754, df = 1, p-value = 0.5555
#Fisher’s检验
fisher.test(tabl)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  tabl
## p-value = 0.4727
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##   0.3825342 10.5916087
## sample estimates:
## odds ratio 
##   1.956055

相关性分析

#皮尔逊相关系数
cor(mtcars$mpg,mtcars$wt,method = "pearson")
## [1] -0.8676594
#斯皮尔曼相关系数
cor(mtcars$mpg,mtcars$wt,method = "spearman")
## [1] -0.886422
#kendall相关系数
cor(mtcars$mpg,mtcars$wt,method = "kendall")
## [1] -0.7278321
#相关系数的统计检验
#皮尔逊相关系数
cor.test(mtcars$mpg,mtcars$wt,method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  mtcars$mpg and mtcars$wt
## t = -9.559, df = 30, p-value = 1.294e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.9338264 -0.7440872
## sample estimates:
##        cor 
## -0.8676594
#斯皮尔曼相关系数
cor.test(mtcars$mpg,mtcars$wt,method = "spearman")
## Warning in cor.test.default(mtcars$mpg, mtcars$wt, method = "spearman"): Cannot
## compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  mtcars$mpg and mtcars$wt
## S = 10292, p-value = 1.488e-11
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## -0.886422
#kendall相关系数
cor.test(mtcars$mpg,mtcars$wt,method = "kendall")
## Warning in cor.test.default(mtcars$mpg, mtcars$wt, method = "kendall"): Cannot
## compute exact p-value with ties
## 
##  Kendall's rank correlation tau
## 
## data:  mtcars$mpg and mtcars$wt
## z = -5.7981, p-value = 6.706e-09
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##        tau 
## -0.7278321

好了,今天的R语言实现统计方法系列推文暂时告一段落,我们下次再见吧!

上一篇 下一篇

猜你喜欢

热点阅读