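Biostatistics (9) R Examples: Negative Binomial and Geometric Distributions
Two similar examples are used to distinguish the negative binomial distribution from the geometric distribution and to introduce the corresponding R functions, dnbinom() and dgeom(). The plotting steps also show how to draw a bar plot, overlay different data sets in one figure, and add a legend.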
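As a quick orientation before the examples: the geometric distribution is the special case of the negative binomial distribution in which only one success is required. The sketch below checks this numerically. The probability p = 0.4 and the range 0:20 are illustrative values chosen for this sketch, and in both functions x counts the failures before the required success(es), not the total number of trials.

```r
# Illustrative values (assumptions for this sketch)
p <- 0.4
x <- 0:20

# Probability of x failures before the first success
geom_probs <- dgeom(x, prob = p)

# Negative binomial with size = 1: x failures before the 1st success
nbinom_probs <- dnbinom(x, size = 1, prob = p)

# TRUE when the two vectors agree up to floating-point tolerance
same <- isTRUE(all.equal(geom_probs, nbinom_probs))
print(same)
```

With size greater than 1, dnbinom() describes the wait for several successes, which is the situation in the first example.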
Negative binomial distribution
You are assigned to survey people with the family name “Cao”. Today you visit a small village in central Henan Province. According to the previous population census, 1/4 of the households here have the family name “Cao”. Suppose these households are randomly distributed across the residential area.
(1) How many households do you expect to visit until you have found 3 households named “Cao”? You then find that people with the family name “Song” are also useful to your research. Based on the census record that 10 percent of households are named “Song”, you conduct another survey.
(2) How many households do you expect to visit until you have found 3 households named “Song”?
(3) Compare the variances of the two surveys and state whether the expected value for “Song” is reliable.
(4) Finally, overlay two bar plots that show the probability of each required number of visits in one chart. Choose suitable colors and axis limits, and add a legend and a title. (Hint: some of the following functions are useful for your homework: dgeom(), dnbinom(), barplot(,add=T), rgb(,alpha=))
A:
The variance of the number of visits in the “Song” survey is much higher than in the “Cao” survey. Thus, we suggest that the expected value for the “Song” survey is NOT reliable.
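Parts (1) to (3) can be checked with a few lines of R. This is a minimal sketch that assumes the standard formulas for the number of visits N needed to reach r successes with success probability p, namely E[N] = r/p and Var[N] = r(1-p)/p^2. The helper name nb_visits is hypothetical, and the truncation point 2000 in the numerical cross-check is an arbitrary choice that leaves a negligible tail.

```r
# Hypothetical helper: mean and variance of the total number of visits
# needed to find r target households when each visit succeeds with prob p
nb_visits <- function(r, p) {
  c(mean = r / p, variance = r * (1 - p) / p^2)
}

cao  <- nb_visits(3, 0.25)  # mean 12, variance 36
song <- nb_visits(3, 0.10)  # mean 30, variance 270
print(rbind(Cao = cao, Song = song))

# Cross-check the "Cao" mean against dnbinom(): x is the number of
# failures, so the total number of visits is x + 3
x <- 0:2000
cao_mean_check <- sum((x + 3) * dnbinom(x, size = 3, prob = 0.25))
print(cao_mean_check)
```

Under these formulas the expected number of visits is 12 for “Cao” and 30 for “Song”, with variances of 36 and 270 respectively.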
(4)
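# Generate x, the number of non-target households visited (failures), from 0 to 60
x<-c(0:60)
# Draw the bar plot for "Cao" with the main title "The Probability of Required Visit Times", y-axis label "Probability" and x-axis label "Visit times"; bars are grey. names.arg = x+3 labels each bar with the total number of visits (x failures plus 3 successes)
barplot(dnbinom(x,3,0.25),ylim=c(0,0.1),main = "The Probability of Required Visit Times",ylab = "Probability",xlab = "Visit times",names.arg = x+3,col = "grey")
# dnbinom() gives the negative binomial probabilities; the "Song" bars use the color rgb(0,0.5,0.1) with alpha 0.7, and add=T draws them on the existing plot instead of creating a new one
barplot(dnbinom(x,3,0.1),col = rgb(0,0.5,0.1,alpha = 0.7),add = T)
# Add a legend at x=55, y=0.09; pch=15 draws a filled square, colored grey and rgb(0,0.5,0.1,alpha = 0.7) respectively
legend(55,0.09,c("Cao","Song"),pch = 15,col = c("grey",rgb(0,0.5,0.1,alpha = 0.7)))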
Figure1.png
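One way to judge whether the range 0:60 used for x is a suitable axis limit is to check how much probability mass it covers. The sketch below uses pnbinom(), the cumulative counterpart of dnbinom(). The variable names are illustrative, and what counts as enough coverage is a judgment call rather than a fixed rule.

```r
# Largest number of failures shown in the plot
x_max <- 60

# Cumulative probability of at most x_max failures before the 3rd success
cover_cao  <- pnbinom(x_max, size = 3, prob = 0.25)
cover_song <- pnbinom(x_max, size = 3, prob = 0.10)

print(round(c(Cao = cover_cao, Song = cover_song), 3))
```

The “Song” distribution has the longer tail, so its coverage is the lower of the two; extending x would show more of that tail at the cost of narrower bars.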
Geometric distribution
You are assigned to survey people with the family name “Cao”. Today you visit a small village in central Henan Province. According to the previous population census, 2/5 of the households here have the family name “Cao”. Suppose these households are randomly distributed across the residential area.
(1) How many households do you expect to visit until you find one household named “Cao”? You then find that people with the family name “Song” are also useful to your research. Based on the census record that 15 percent of households are named “Song”, you conduct another survey.
(2) How many households do you expect to visit until you find one household named “Song”?
(3) Compare the variances of the two surveys and state whether the expected value for “Song” is reliable.
(4) Finally, overlay two bar plots that show the probability of each required number of visits in one chart. Choose suitable colors and axis limits, and add a legend and a title. (Hint: some of the following functions are useful for your homework: dgeom(), dnbinom(), barplot(,add=T), rgb(,alpha=))
A:
The variance of the number of visits in the “Song” survey is much higher than in the “Cao” survey. Thus, we suggest that the expected value for the “Song” survey is NOT reliable.
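The same check works for the geometric case. This is a minimal sketch that assumes the standard formulas for the number of visits N until the first success with success probability p, namely E[N] = 1/p and Var[N] = (1-p)/p^2. The helper name geom_visits is hypothetical, and the truncation point 1000 in the cross-check is an arbitrary choice that leaves a negligible tail.

```r
# Hypothetical helper: mean and variance of the total number of visits
# until the first target household is found
geom_visits <- function(p) {
  c(mean = 1 / p, variance = (1 - p) / p^2)
}

cao  <- geom_visits(0.40)  # mean 2.5, variance 3.75
song <- geom_visits(0.15)  # mean about 6.67, variance about 37.78
print(rbind(Cao = cao, Song = song))

# Cross-check the "Cao" mean against dgeom(): x is the number of
# failures, so the total number of visits is x + 1
x <- 0:1000
cao_mean_check <- sum((x + 1) * dgeom(x, prob = 0.4))
print(cao_mean_check)
```

Under these formulas the variance for “Song” is roughly ten times that for “Cao”, which is the comparison the answer to (3) relies on.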
(4)
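# x is the number of non-target households visited before the first success, from 0 to 20
x<-c(0:20)
# "Cao" (p=0.4) in grey; names.arg = x+1 labels each bar with the total number of visits
barplot(dgeom(x,0.4), main = "The Probability of Required Visit Times",ylab = "Probability",xlab = "Visit times",names.arg = x+1,col = "grey")
barplot(dgeom(x,0.15),col = rgb(0,0.5,0.1,alpha = 0.7),add = T)
legend(15,0.4,c("Cao","Song"),pch = 15,col = c("grey",rgb(0,0.5,0.1,alpha = 0.7)))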
Figure2.png