R: the Wilcoxon signed-rank test and the Hodges-Lehmann estimator
2022-03-25
Cache_wood
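One-sample signed-rank test
To assess how much spam email affects the work of senior decision-makers at large companies, a website surveyed the CEOs of 19 large companies and recorded how many spam emails arrived in each mailbox per day, obtaining the following data (unit: emails):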
310 350 370 377 389 400 415 425 440 295
325 296 250 340 298 365 375 360 385
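On average, does the center of the spam email counts exceed 320? That is, test H0: center = 320 against the one-sided alternative H1: center > 320.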
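# Note: the value 425 from the table above is missing from this vector, so n = 18 here;
# the results printed below were computed from these 18 values.
spammail <- c(310, 350, 370, 377, 389, 400, 415, 440, 295,
              325, 296, 250, 340, 298, 365, 375, 360, 385)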
# Density-scaled histogram with a kernel density estimate overlaid
hist(spammail, freq = FALSE, breaks = (max(spammail) - min(spammail)) / 25.12, xlim = c(200, 500))
lines(density(spammail), col = 2, lwd = 2)
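n <- length(spammail)
spammail1 <- spammail - 320              # differences from the hypothesized center
absrank <- rank(abs(spammail1))          # ranks of the absolute differences
signrank <- sign(spammail1) * absrank    # signed ranks
w0 <- sum(signrank[signrank > 0])        # W+: sum of the positive ranks
w0
# Normal approximation with continuity correction
z0 <- (w0 - n * (n + 1) / 4 - 1 / 2) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
pval <- 1 - pnorm(z0, 0, 1)              # one-sided (upper-tail) p-value
pval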
[1] 0.01049487
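For reference, the hand computation above is the usual large-sample normal approximation to the signed-rank statistic with a continuity correction. Under the null hypothesis, $W^+$ (the sum of the positive ranks) has mean $n(n+1)/4$ and variance $n(n+1)(2n+1)/24$. The value $W^+ = 139$ below is not printed in the source; it was worked out by hand from the 18 values in the code, so treat the arithmetic as a check rather than as original output:

```latex
z_0 = \frac{W^+ - \frac{n(n+1)}{4} - \frac{1}{2}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}
    = \frac{139 - 85.5 - 0.5}{\sqrt{527.25}} \approx 2.308,
\qquad
p = 1 - \Phi(z_0) \approx 0.0105 .
```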
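# Built-in Wilcoxon signed-rank test of the same one-sided hypothesis
wilcox.test(spammail - 320, alternative = 'greater')
# Sign test for comparison: number of observations above 320 out of n
binom.test(sum(spammail > 320), length(spammail), 0.5, alternative = 'greater')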
Exact binomial test
data: sum(spammail > 320) and length(spammail)
number of successes = 13, number of trials = 18, p-value = 0.04813
alternative hypothesis: true probability of success is greater than 0.5
95 percent confidence interval:
0.5021718 1.0000000
sample estimates:
probability of success
0.7222222
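The sign test p-value reported above can also be reproduced directly from the binomial distribution. The following is a minimal sketch, assuming the same 18 values used in the code above; the names `n_pos` and `p_sign` are introduced here for illustration only.

```r
# Sign test by hand: under H0 the number of observations above 320
# follows Binomial(n, 0.5). The values are copied from the code above.
spammail <- c(310, 350, 370, 377, 389, 400, 415, 440, 295,
              325, 296, 250, 340, 298, 365, 375, 360, 385)
n_pos <- sum(spammail > 320)   # observations above the hypothesized center
n <- length(spammail)
# Upper-tail probability P(X >= n_pos) for X ~ Binomial(n, 0.5)
p_sign <- pbinom(n_pos - 1, n, 0.5, lower.tail = FALSE)
print(c(successes = n_pos, trials = n, p.value = round(p_sign, 5)))
```

The sign test uses only the signs of the differences, whereas the signed-rank test also uses their magnitudes. That is consistent with the smaller p-value obtained from the signed-rank calculation earlier.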
Wilcoxon test for paired data
example1
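The data below show, for 12 patients with arthritis, the relief time provided by each of two analgesic drugs. Is there evidence that one drug provides longer relief than the other?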
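drugA <- c(2, 3.6, 2.6, 2.6, 7.3, 3.4, 14.9, 6.6, 2.3, 2, 6.8, 8.5)
drugB <- c(3.5, 5.7, 2.9, 2.4, 9.9, 3.3, 16.7, 6, 3.8, 4, 9.1, 20.9)
# Two-sided signed-rank test on the paired differences drugA - drugB
wilcox.test(drugA, drugB, paired = TRUE)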
Wilcoxon signed rank test with continuity correction
data: drugA and drugB
V = 7, p-value = 0.01344
alternative hypothesis: true location shift is not equal to 0
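The paired test is just the one-sample signed-rank test applied to the within-patient differences. The sketch below recomputes the reported statistic by hand; the variable names `d`, `r` and `V` are illustrative and not part of the original code.

```r
drugA <- c(2, 3.6, 2.6, 2.6, 7.3, 3.4, 14.9, 6.6, 2.3, 2, 6.8, 8.5)
drugB <- c(3.5, 5.7, 2.9, 2.4, 9.9, 3.3, 16.7, 6, 3.8, 4, 9.1, 20.9)
d <- drugA - drugB
d <- d[d != 0]              # zero differences would be dropped (none occur here)
r <- rank(abs(d))           # tied absolute differences get average ranks
V <- sum(r[d > 0])          # sum of the ranks of the positive differences
V
```

Only three differences are positive, and they have the three smallest-but-one absolute values (ranks 1, 2 and 4), which gives the V = 7 shown in the output above.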
example2
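The built-in R data set immer (MASS package) records barley yields at the same locations in 1931 and 1932. At the $\alpha = 0.05$ level, answer: do the 1931 and 1932 barley yields have the same population distribution?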
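The call for this example is not shown in the source. Judging from the `data:` line of the output, it was presumably a paired `wilcox.test` on the `Y1` and `Y2` columns, along the following lines. The `exact = FALSE` argument is an addition made here: it requests the normal approximation explicitly, which is what the "continuity correction" header in the output indicates was used.

```r
library(MASS)   # provides the immer data set

# Y1 and Y2 hold the 1931 and 1932 yields; the observations are paired
res <- wilcox.test(immer$Y1, immer$Y2, paired = TRUE, exact = FALSE)
res
```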
Wilcoxon signed rank test with continuity correction
data: immer$Y1 and immer$Y2
V = 368.5, p-value = 0.005318
alternative hypothesis: true location shift is not equal to 0
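Walsh averages
Suppose $X_1, X_2, \dots, X_n$ is a simple random sample. Averaging every pair of observations, $(X_i + X_j)/2$ with $i \le j$, gives a new data set of length $n(n+1)/2$, called the Walsh averages (pairwise averages).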
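Hodges-Lehmann estimator
Suppose $X_1, X_2, \dots, X_n$ are independent and identically distributed with distribution $F(x - \theta)$. If $F$ is symmetric, define the median of the Walsh averages as
$$\hat{\theta} = \operatorname{median}\left\{ \frac{X_i + X_j}{2},\ i \le j,\ i, j = 1, 2, \dots, n \right\},$$
and take it as the Hodges-Lehmann estimator of $\theta$.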
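meat <- c(62, 70, 74, 75, 77, 80, 83, 85, 88)
walsh <- NULL
# Note: the loops visit all ordered pairs (i, j), so this yields n^2 = 81 values,
# with every pair i != j counted twice, rather than the n(n+1)/2 = 45 values with i <= j.
for (i in 1:length(meat)) {
  for (j in 1:length(meat)) {
    walsh <- c(walsh, (meat[i] + meat[j]) / 2)
  }
}
walsh
median(walsh)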
> walsh
[1] 62.0 66.0 68.0 68.5 69.5 71.0 72.5 73.5
[9] 75.0 66.0 70.0 72.0 72.5 73.5 75.0 76.5
[17] 77.5 79.0 68.0 72.0 74.0 74.5 75.5 77.0
[25] 78.5 79.5 81.0 68.5 72.5 74.5 75.0 76.0
[33] 77.5 79.0 80.0 81.5 69.5 73.5 75.5 76.0
[41] 77.0 78.5 80.0 81.0 82.5 71.0 75.0 77.0
[49] 77.5 78.5 80.0 81.5 82.5 84.0 72.5 76.5
[57] 78.5 79.0 80.0 81.5 83.0 84.0 85.5 73.5
[65] 77.5 79.5 80.0 81.0 82.5 84.0 85.0 86.5
[73] 75.0 79.0 81.0 81.5 82.5 84.0 85.5 86.5
[81] 88.0
> median(walsh)
[1] 77.5
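The definition of the estimator restricts the pairs to $i \le j$. The sketch below is one vectorized way to build exactly those Walsh averages, using `outer` and `upper.tri`; the name `pair_means` is introduced here for illustration.

```r
meat <- c(62, 70, 74, 75, 77, 80, 83, 85, 88)
pair_means <- outer(meat, meat, "+") / 2                  # n x n matrix of (X_i + X_j) / 2
walsh <- pair_means[upper.tri(pair_means, diag = TRUE)]   # keep the entries with i <= j
length(walsh)    # n(n+1)/2 = 45 for n = 9
median(walsh)    # Hodges-Lehmann estimate
```

For this sample the median of the 45 averages is also 77.5, so it agrees with the value printed above. That agreement is not guaranteed in general, because enumerating all ordered pairs counts each off-diagonal pair twice.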
example3
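The Boston housing data (boston.txt) contain housing information for 506 households in different areas of Boston, including the structural, environmental and educational factors that determine house prices. The data set has 506 observations and 14 variables; two of the variables (CHAS, RAD) are categorical, and the rest are continuous numeric variables.
Computing the medians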
library(MASS)   # provides the Boston data frame
head(Boston)
# Median of each of the 14 columns
med <- NULL
for (i in 1:length(Boston)) {
  med <- c(med, median(Boston[, i]))
}
med
> med
[1] 0.25651 0.00000 9.69000 0.00000
[5] 0.53800 6.20850 77.50000 3.20745
[9] 5.00000 330.00000 19.05000 391.44000
[13] 11.36000 21.20000
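The source shows only the sample medians for this example. As a hedged illustration of how the Hodges-Lehmann estimator from the previous section could be applied to the same columns, the sketch below uses a hypothetical helper `hl_estimate` (not part of the original post). The estimator assumes a symmetric distribution, so its values for skewed or categorical columns such as CHAS and RAD should be read with caution.

```r
library(MASS)   # provides the Boston data frame

# Hypothetical helper (not from the source): median of the Walsh averages with i <= j
hl_estimate <- function(x) {
  pair_means <- outer(x, x, "+") / 2
  median(pair_means[upper.tri(pair_means, diag = TRUE)])
}

hl <- sapply(Boston, hl_estimate)   # one estimate per column
med <- sapply(Boston, median)       # sample medians for comparison
round(cbind(median = med, hodges_lehmann = hl), 3)
```

Comparing the two columns of the resulting table gives a quick impression of which variables are far from symmetric: for a symmetric distribution the two location estimates should be close.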
Plotting the distributions
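# Histograms of columns 1-8 on a 2 x 4 grid, titled by variable name
par(mfrow = c(2, 4))
for (i in 1:8) {
  hist(Boston[, i], main = names(Boston)[i], xlab = names(Boston)[i])
  #lines(density(Boston[,i]),col=2,lwd=2)
}
# Histograms of the remaining columns (9-14) on a 2 x 3 grid
par(mfrow = c(2, 3))
for (i in 9:length(Boston)) {
  hist(Boston[, i], main = names(Boston)[i], xlab = names(Boston)[i])
  #lines(density(Boston[,i]),col=2,lwd=2)
}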