Topic 1. SCI 文章中 Meta 分析之 metafo
最近搞了一下 Meta 分析,蛮有意思的,掌握了这些技能未来发文章不用愁,Meta 分析也是医学上很重要的方法之一,这期基于现有数据,分析几个经典的图形,包括森林图(forest)、漏洞图(funnel)、星状图(radial)、拉贝图(labbe)、以及Q-Q正态分位图(Q-Q normal)
01 Meta 分析的概念
1976年,Glass首次将这一概念命名为 Meta-analysis (荟萃分析)。并定义为一种对不同研究结果进行收集、合并及统计分析的方法。这种方法逐渐发展为一门新兴学科——“循证医学”的主要内容和研究手段。荟萃分析的主要目的是将以往的研究成果更为客观地综合反映出来。研究者并不进行原始的研究,而是将研究以获得的结果进行综合分析。
02 输入数据模式
单个变量
多分类变量
连续性变量
还可以检测模型系数并获得可信区间,以及对参数进行精准检验如置换检验(permutation)。
程序包安装,如下:
install.packages("metafor")library("metafor")
读取软件包数据,来自13项检验卡介苗抗结核效果的研究的结果,meta分析的目的是检查卡介苗预防结核病的总体有效性,并检查可能影响效果大小的缓减剂。该数据集已经在一些出版物中被用来说明元分析方法,如下:
data("dat.bcg")print(dat.bcg, row.names = FALSE)trial author year tpos tneg cpos cneg ablat alloc1 Aronson 1948 4 119 11 128 44 random2 Ferguson & Simes 1949 6 300 29 274 55 random3 Rosenthal et al 1960 3 228 11 209 42 random4 Hart & Sutherland 1977 62 13536 248 12619 52 random5 Frimodt-Moller et al 1973 33 5036 47 5761 13 alternate6 Stein & Aronson 1953 180 1361 372 1079 44 alternate7 Vandiviere et al 1973 8 2537 10 619 19 random8 TPT Madras 1980 505 87886 499 87892 13 random9 Coetzee & Berjak 1968 29 7470 45 7232 27 random10 Rosenthal et al 1961 17 1699 65 1600 42 systematic11 Comstock et al 1974 186 50448 141 27197 18 systematic12 Comstock & Webster 1969 5 2493 3 2338 33 systematic13 Comstock et al 1976 27 16886 29 17825 33 systematic#######数据说明?dat.bcgFormatThe data frame contains the following columns:trial numeric trial numberauthor character author(s)year numeric publication yeartpos numeric number of TB positive cases in the treated (vaccinated) grouptneg numeric number of TB negative cases in the treated (vaccinated) groupcpos numeric number of TB positive cases in the control (non-vaccinated) groupcneg numeric number of TB negative cases in the control (non-vaccinated) groupablat numeric absolute latitude of the study location (in degrees)alloc character method of treatment allocation (random, alternate, or systematic assignment)
这13项研究以下列表格提供数据细节,如下:
TB positiveTB negativevaccinated grouptpostnegcontrol groupcposcneg
03 实例解析
计算log相对风险和相应的抽样方差,如下:
dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos,di = cneg, data = dat.bcg, append = TRUE)print(dat[,-c(4:7)], row.names = FALSEtrial author year ablat alloc yi vi1 Aronson 1948 44 random -0.8893 0.32562 Ferguson & Simes 1949 55 random -1.5854 0.19463 Rosenthal et al 1960 42 random -1.3481 0.41544 Hart & Sutherland 1977 52 random -1.4416 0.02005 Frimodt-Moller et al 1973 13 alternate -0.2175 0.05126 Stein & Aronson 1953 44 alternate -0.7861 0.00697 Vandiviere et al 1973 19 random -1.6209 0.22308 TPT Madras 1980 13 random 0.0120 0.00409 Coetzee & Berjak 1968 27 random -0.4694 0.056410 Rosenthal et al 1961 42 systematic -1.3713 0.073011 Comstock et al 1974 18 systematic -0.3394 0.012412 Comstock & Webster 1969 33 systematic 0.4459 0.532513 Comstock et al 1976 33 systematic -0.0173 0.0714
演示公式界面,如下:
k <- length(dat.bcg$trial)dat.fm <- data.frame(study = factor(rep(1:k, each = 4)))dat.fm$grp <- factor(rep(c("T", "T", "C", "C"), k), levels = c("T", "C"))dat.fm$out <- factor(rep(c("+", "-", "+", "-"), k), levels = c("+", "-"))dat.fm$freq <- with(dat.bcg, c(rbind(tpos, tneg, cpos, cneg)))dat.fmstudy grp out freq1 1 T + 42 1 T - 1193 1 C + 114 1 C - 1285 2 T + 66 2 T - 3007 2 C + 298 2 C - 2749 3 T + 310 3 T - 22811 3 C + 1112 3 C - 20913 4 T + 6214 4 T - 1353615 4 C + 24816 4 C - 1261917 5 T + 3318 5 T - 503619 5 C + 4720 5 C - 576121 6 T + 18022 6 T - 136123 6 C + 37224 6 C - 107925 7 T + 826 7 T - 253727 7 C + 1028 7 C - 61929 8 T + 50530 8 T - 8788631 8 C + 49932 8 C - 8789233 9 T + 2934 9 T - 747035 9 C + 4536 9 C - 723237 10 T + 1738 10 T - 169939 10 C + 6540 10 C - 160041 11 T + 18642 11 T - 5044843 11 C + 14144 11 C - 2719745 12 T + 546 12 T - 249347 12 C + 348 12 C - 233849 13 T + 2750 13 T - 1688651 13 C + 2952 13 C - 17825
随机效应模型,如下:
#默认模型为REML模型Random-effect model using Restricted maximum likelihood estimatorres <- rma(yi, vi, data = dat)resRandom-Effects Model (k = 13; tau^2 estimator: REML)tau^2 (estimated amount of total heterogeneity): 0.3132 (SE = 0.1664)tau (square root of estimated tau^2 value): 0.5597I^2 (total heterogeneity / total variability): 92.22%H^2 (total variability / sampling variability): 12.86Test for Heterogeneity:Q(df = 12) = 152.2330, p-val < .0001Model Results:estimate se zval pval ci.lb ci.ub-0.7145 0.1798 -3.9744 <.0001 -1.0669 -0.3622 ***---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1res <- rma(ai = tpos, bi = tneg, ci = cpos,di = cneg, data = dat, measure = "RR")resRandom-Effects Model (k = 13; tau^2 estimator: REML)tau^2 (estimated amount of total heterogeneity): 0.3132 (SE = 0.1664)tau (square root of estimated tau^2 value): 0.5597I^2 (total heterogeneity / total variability): 92.22%H^2 (total variability / sampling variability): 12.86Test for Heterogeneity:Q(df = 12) = 152.2330, p-val < .0001Model Results:estimate se zval pval ci.lb ci.ub-0.7145 0.1798 -3.9744 <.0001 -1.0669 -0.3622 ***---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
异质性的置信区间可用metafor中confint()可以求异质性系数(tau^2,tau,I^2(%),H^2)的置信区间,如下:
### confidence intervals for tau^2, I^2, and H^2confint(res)estimate ci.lb ci.ubtau^2 0.3132 0.1197 1.1115tau 0.5597 0.3460 1.0543I^2(%) 92.2214 81.9177 97.6781H^2 12.8558 5.5303 43.0680# 估算效应量的预测/可信度区间predict(res, digits = 3)pred se ci.lb ci.ub pi.lb pi.ub-0.715 0.180 -1.067 -0.362 -1.867 0.438
forest 森林图
森林图是以统计指标和统计分析方法为基础,用数值运算结果绘制出的图型。它在平面直角坐标系中,以一条垂直的无效线(横坐标刻度为1或0)为中心,用平行于横轴的多条线段描述了每个被纳入研究的效应量和可信区间,用一个棱形(或其它图形)描述了多个研究合并的效应量及可信区间。它非常简单和直观地描述了Meta分析的统计结果,是Meta分析中最常用的结果表达形式。如下:
### forst plotforest(res, slab = paste(dat$author, dat$year, sep = ", "),xlim = c(-16, 6), at = log(c(.05, .25, 1, 4)), atransf = exp,ilab = cbind(dat$tpos, dat$tneg, dat$cpos, dat$cneg),ilab.xpos = c(-9.5, -8, -6, -4.5), cex = .75)op <- par(cex = .75, font = 2)text(c(-9.5, -8, -6, -4.5), 15, c("TB+", "TB-", "TB+", "TB-"))text(c(-8.75, -5.25), 16, c("Vaccinated", "Control"))text(-16, 15, "Author(s) and Year", pos = 4)text(6, 15, "Relative Risk [95% CI]", pos = 2)par(op)
混合效应模型,如下:
### mixed-effects model with absolute latitude and publication year as moderatorsres <- rma(yi, vi, mods = cbind(ablat, year), data = dat)resMixed-Effects Model (k = 13; tau^2 estimator: REML)tau^2 (estimated amount of residual heterogeneity): 0.1108 (SE = 0.0845)tau (square root of estimated tau^2 value): 0.3328I^2 (residual heterogeneity / unaccounted variability): 71.98%H^2 (unaccounted variability / sampling variability): 3.57R^2 (amount of heterogeneity accounted for): 64.63%Test for Residual Heterogeneity:QE(df = 10) = 28.3251, p-val = 0.0016Test of Moderators (coefficients 2:3):QM(df = 2) = 12.2043, p-val = 0.0022Model Results:estimate se zval pval ci.lb ci.ubintrcpt -3.5455 29.0959 -0.1219 0.9030 -60.5724 53.4814ablat -0.0280 0.0102 -2.7371 0.0062 -0.0481 -0.0080 **year 0.0019 0.0147 0.1299 0.8966 -0.0269 0.0307---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1### predicted average relative risks for 10, 20, 30, 40, and 50 degrees### absolute latitude holding publication year constant at 1970predict(res, newmods = cbind(seq(from = 10, to = 60, by = 10), 1970),transf = exp, addx = TRUE)### plot of the predicted average relative risk as a function of absolute latitudepreds <- predict(res, newmods = cbind(0:60, 1970), transf = exp)wi <- 1/sqrt(dat$vi)size <- 0.5 + 3 * (wi - min(wi))/(max(wi) - min(wi))plot(dat$ablat, exp(dat$yi), pch = 19, cex = size,xlab = "Absolute Latitude", ylab = "Relative Risk",las = 1, bty = "l", log = "y")lines(0:60, preds$pred)lines(0:60, preds$ci.lb, lty = "dashed")lines(0:60, preds$ci.ub, lty = "dashed")abline(h = 1, lty = "dotted")
funnel 漏斗图
漏斗图是由Light等在1984年提出,一般以单个研究的效应量为横坐标,样本含量为纵坐标做的散点图。效应量可以为RR、OR和死亡比或者其对数值等。理论上讲,被纳入Meta分析的各独立研究效应的点估计,在平面坐标系中的集合应为一个倒置的漏斗形,因此称为漏斗图。
样本量小,研究精度低,分布在漏斗图的底部,向周围分散;
样本量大,研究精度高,分布在漏斗图的顶部,向中间集中。
实际使用时,做Meta分析的研究个数较少时不宜做漏斗图,一般推荐Meta分析的研究个数在10个及以上才需做漏斗图。漏斗图主要用于观察某个系统评价或Meta分析结果是否存在偏倚,如发表偏倚或其他偏倚,如下图:
### funnel plotsres <- rma(yi, vi, data = dat)funnel(res, main = "Random-Effects Model")res <- rma(yi, vi, mods = cbind(ablat), data = dat)funnel(res, main = "Mixed-Effects Model")#######Begger's检验ranktest(res)Rank Correlation Test for Funnel Plot AsymmetryKendall's tau = 0.0256, p = 0.9524##########Egger's检验regtest(res)Regression Test for Funnel Plot AsymmetryModel: mixed-effects meta-regression modelPredictor: standard errorTest for Funnel Plot Asymmetry: z = -0.4623, p = 0.6439
radial 星状图
主要反映各研究的异质性,从而发现异质性的点。弧线对应的效应评估大小分布。如下:
### radial plotsres <- rma(yi, vi, data = dat, method = "REML")radial(res, main = "Random-Effects Model")res <- rma(yi, vi, data = dat, method = "ML")radial(res, main = "Mixed-Effects Model")
Q-Q分位图
绘制预测结果,观察是否在置信区间内部,如下:
### qq-normal plotsres <- rma(yi, vi, data = dat)qqnorm(res, main = "Random-Effects Model")res <- rma(yi, vi, mods = cbind(ablat), data = dat)qqnorm(res, main = "Mixed-Effects Model")
labbe 拉贝图
该函数计算两组的臂水平结果(例如,治疗和控制),并将它们相互比较。特别是功能块的原始比例两组互相在分析风险差异,比例在分析的日志(日志)风险比率,日志赔率在分析优势比(日志),平方根反正弦转换比例在分析平方根反正弦转换风险差异,分析发病率差异时用原始发病率,分析发病率比时用发病率的对数(log),分析发病率差异时用发病率的平方根转换,如下:
res <- rma(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)### default plotlabbe(res)### funnel plot with risk values on the x- and y-axis and add gridlabbe(res, transf=exp, grid=TRUE)# }
04 分析结果解读
可以看到I^2为71.98%,属于高度异质性,可采用随机效应模型。异质性低的时候可以采用固定效应模型和随机效应模型,结果差别不大,但高异质性只能选择随机效应模型,否则会使结果外推性受到约束。此处选择随机效应模型是出于保守情况考虑。
random-effect model是基于跨研究间存在异质性的假设,该合并模型承认研究间异质性的存在,但是不对异质性加以处理;如果纳入合并的研究间存在异质性,尽管未达到我们常规设定的I^2>50%,但是在用 fixed-effect model合并时,默认运算直接忽略这一部分异质性的存在,这样合并的结果会造成假阳性误差,而选用random-effect model合并时,尽管不处理异质性,但是其默认运算承认异质性的存在,合并结果更可信!
从漏斗图以及Begg's检验及Egger's 检验的结果可以看出,P值都是大于0.05的,也就意味着没有发表偏倚。
Meta 分析同样是发文章的一种选择,搞清这些图解以及内部的方法,至于SCI能发到多少分,还需要看我们研究的课题的新颖程度,而方法也就是这些,表现形式基本都有,只要您有好的数据,顺利发表个SCI高分还是指日可待的,下期在聊聊其他软件!
关注公众号,扫码进群,免费解答,后期会有免费直播教程,敬请期待!
Reference:
Viechtbauer, W. (2010). Conducting Meta-Analyses in R with the metafor Package. Journal of Statistical Software, 36(3), 1–48.
Colditz, G. A., Brewer, T. F., Berkey, C. S., Wilson, M. E., Burdick, E., Fineberg, H. V., & Mosteller, F. (1994). Efficacy of BCG vaccine in the prevention of tuberculosis: Meta-analysis of the published literature. Journal of the American Medical Association, 271(9), 698–702.
Berkey, C. S., Hoaglin, D. C., Mosteller, F., & Colditz, G. A. (1995). A random-effects regression model for meta-analysis. Statistics in Medicine, 14(4), 395–411.
van Houwelingen, H. C., Arends, L. R., & Stijnen, T. (2002). Advanced methods in meta-analysis: Multivariate approach and meta-regression. Statistics in Medicine, 21(4), 589–624.
Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48.
公众号:桓峰基因 生物信息分析,SCI文章撰写及生物信息基础知识学习:R语言学习,perl基础编程,linux系统命令,Python遇见更好的你 44篇原创内容 -->
本文使用 文章同步助手 同步