Topic 1. SCI 文章中 Meta 分析之 metafo

2022-02-27 本文已影响0人桓峰基因

最近搞了一下 Meta 分析，蛮有意思的，掌握了这些技能未来发文章不用愁，Meta 分析也是医学上很重要的方法之一，这期基于现有数据，分析几个经典的图形，包括森林图（forest）、漏洞图（funnel）、星状图（radial）、拉贝图（labbe）、以及Q-Q正态分位图（Q-Q normal）

01 Meta 分析的概念

1976年，Glass首次将这一概念命名为 Meta-analysis (荟萃分析)。并定义为一种对不同研究结果进行收集、合并及统计分析的方法。这种方法逐渐发展为一门新兴学科——“循证医学”的主要内容和研究手段。荟萃分析的主要目的是将以往的研究成果更为客观地综合反映出来。研究者并不进行原始的研究，而是将研究以获得的结果进行综合分析。

02 输入数据模式

单个变量

多分类变量

连续性变量

还可以检测模型系数并获得可信区间，以及对参数进行精准检验如置换检验（permutation）。

程序包安装，如下：

install.packages("metafor")library("metafor")

读取软件包数据，来自13项检验卡介苗抗结核效果的研究的结果，meta分析的目的是检查卡介苗预防结核病的总体有效性，并检查可能影响效果大小的缓减剂。该数据集已经在一些出版物中被用来说明元分析方法，如下：

data("dat.bcg")print(dat.bcg, row.names = FALSE)trial               author year tpos  tneg cpos  cneg ablat      alloc     1              Aronson 1948    4   119   11   128    44     random     2     Ferguson & Simes 1949    6   300   29   274    55     random     3      Rosenthal et al 1960    3   228   11   209    42     random     4    Hart & Sutherland 1977   62 13536  248 12619    52     random     5 Frimodt-Moller et al 1973   33  5036   47  5761    13  alternate     6      Stein & Aronson 1953  180  1361  372  1079    44  alternate     7     Vandiviere et al 1973    8  2537   10   619    19     random     8           TPT Madras 1980  505 87886  499 87892    13     random     9     Coetzee & Berjak 1968   29  7470   45  7232    27     random    10      Rosenthal et al 1961   17  1699   65  1600    42 systematic    11       Comstock et al 1974  186 50448  141 27197    18 systematic    12   Comstock & Webster 1969    5  2493    3  2338    33 systematic    13       Comstock et al 1976   27 16886   29 17825    33 systematic#######数据说明?dat.bcg  FormatThe data frame contains the following columns:trial  numeric  trial numberauthor  character  author(s)year  numeric  publication yeartpos  numeric  number of TB positive cases in the treated (vaccinated) grouptneg  numeric  number of TB negative cases in the treated (vaccinated) groupcpos  numeric  number of TB positive cases in the control (non-vaccinated) groupcneg  numeric  number of TB negative cases in the control (non-vaccinated) groupablat  numeric  absolute latitude of the study location (in degrees)alloc  character  method of treatment allocation (random, alternate, or systematic assignment)

这13项研究以下列表格提供数据细节，如下：

TB positiveTB negativevaccinated grouptpostnegcontrol groupcposcneg

03 实例解析

计算log相对风险和相应的抽样方差，如下：

dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos,               di = cneg, data = dat.bcg, append = TRUE)print(dat[,-c(4:7)], row.names = FALSEtrial               author year ablat      alloc      yi     vi      1              Aronson 1948    44     random -0.8893 0.3256      2     Ferguson & Simes 1949    55     random -1.5854 0.1946      3      Rosenthal et al 1960    42     random -1.3481 0.4154      4    Hart & Sutherland 1977    52     random -1.4416 0.0200      5 Frimodt-Moller et al 1973    13  alternate -0.2175 0.0512      6      Stein & Aronson 1953    44  alternate -0.7861 0.0069      7     Vandiviere et al 1973    19     random -1.6209 0.2230      8           TPT Madras 1980    13     random  0.0120 0.0040      9     Coetzee & Berjak 1968    27     random -0.4694 0.0564     10      Rosenthal et al 1961    42 systematic -1.3713 0.0730     11       Comstock et al 1974    18 systematic -0.3394 0.0124     12   Comstock & Webster 1969    33 systematic  0.4459 0.5325     13       Comstock et al 1976    33 systematic -0.0173 0.0714

演示公式界面，如下：

k <- length(dat.bcg$trial)dat.fm <- data.frame(study = factor(rep(1:k, each = 4)))dat.fm$grp <- factor(rep(c("T", "T", "C", "C"), k), levels = c("T", "C"))dat.fm$out <- factor(rep(c("+", "-", "+", "-"), k), levels = c("+", "-"))dat.fm$freq <- with(dat.bcg, c(rbind(tpos, tneg, cpos, cneg)))dat.fm   study grp out  freq1      1   T   +     42      1   T   -   1193      1   C   +    114      1   C   -   1285      2   T   +     66      2   T   -   3007      2   C   +    298      2   C   -   2749      3   T   +     310     3   T   -   22811     3   C   +    1112     3   C   -   20913     4   T   +    6214     4   T   - 1353615     4   C   +   24816     4   C   - 1261917     5   T   +    3318     5   T   -  503619     5   C   +    4720     5   C   -  576121     6   T   +   18022     6   T   -  136123     6   C   +   37224     6   C   -  107925     7   T   +     826     7   T   -  253727     7   C   +    1028     7   C   -   61929     8   T   +   50530     8   T   - 8788631     8   C   +   49932     8   C   - 8789233     9   T   +    2934     9   T   -  747035     9   C   +    4536     9   C   -  723237    10   T   +    1738    10   T   -  169939    10   C   +    6540    10   C   -  160041    11   T   +   18642    11   T   - 5044843    11   C   +   14144    11   C   - 2719745    12   T   +     546    12   T   -  249347    12   C   +     348    12   C   -  233849    13   T   +    2750    13   T   - 1688651    13   C   +    2952    13   C   - 17825

随机效应模型，如下：

#默认模型为REML模型Random-effect model using Restricted maximum likelihood estimatorres <- rma(yi, vi, data = dat)resRandom-Effects Model (k = 13; tau^2 estimator: REML)tau^2 (estimated amount of total heterogeneity): 0.3132 (SE = 0.1664)tau (square root of estimated tau^2 value):      0.5597I^2 (total heterogeneity / total variability):   92.22%H^2 (total variability / sampling variability):  12.86Test for Heterogeneity:Q(df = 12) = 152.2330, p-val < .0001Model Results:estimate      se     zval    pval    ci.lb    ci.ub  -0.7145  0.1798  -3.9744  <.0001  -1.0669  -0.3622  *** ---Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1res <- rma(ai = tpos, bi = tneg, ci = cpos,            di = cneg, data = dat, measure = "RR")resRandom-Effects Model (k = 13; tau^2 estimator: REML)tau^2 (estimated amount of total heterogeneity): 0.3132 (SE = 0.1664)tau (square root of estimated tau^2 value):      0.5597I^2 (total heterogeneity / total variability):   92.22%H^2 (total variability / sampling variability):  12.86Test for Heterogeneity:Q(df = 12) = 152.2330, p-val < .0001Model Results:estimate      se     zval    pval    ci.lb    ci.ub  -0.7145  0.1798  -3.9744  <.0001  -1.0669  -0.3622  *** ---Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

异质性的置信区间可用metafor中confint（）可以求异质性系数（tau^2,tau,I^2(%),H^2)的置信区间，如下：

### confidence intervals for tau^2, I^2, and H^2confint(res)     estimate   ci.lb   ci.ub tau^2    0.3132  0.1197  1.1115 tau      0.5597  0.3460  1.0543 I^2(%)  92.2214 81.9177 97.6781 H^2     12.8558  5.5303 43.0680 # 估算效应量的预测/可信度区间predict(res, digits = 3) pred    se  ci.lb  ci.ub  pi.lb pi.ub  -0.715 0.180 -1.067 -0.362 -1.867 0.438

forest 森林图

森林图是以统计指标和统计分析方法为基础，用数值运算结果绘制出的图型。它在平面直角坐标系中，以一条垂直的无效线(横坐标刻度为1或0)为中心，用平行于横轴的多条线段描述了每个被纳入研究的效应量和可信区间，用一个棱形(或其它图形)描述了多个研究合并的效应量及可信区间。它非常简单和直观地描述了Meta分析的统计结果，是Meta分析中最常用的结果表达形式。如下：

### forst plotforest(res, slab = paste(dat$author, dat$year, sep = ", "),        xlim = c(-16, 6), at = log(c(.05, .25, 1, 4)), atransf = exp,       ilab = cbind(dat$tpos, dat$tneg, dat$cpos, dat$cneg),        ilab.xpos = c(-9.5, -8, -6, -4.5), cex = .75)op <- par(cex = .75, font = 2)text(c(-9.5, -8, -6, -4.5), 15, c("TB+", "TB-", "TB+", "TB-"))text(c(-8.75, -5.25),       16, c("Vaccinated", "Control"))text(-16,                   15, "Author(s) and Year",     pos = 4)text(6,                     15, "Relative Risk [95% CI]", pos = 2)par(op)

混合效应模型，如下：

### mixed-effects model with absolute latitude and publication year as moderatorsres <- rma(yi, vi, mods = cbind(ablat, year), data = dat)resMixed-Effects Model (k = 13; tau^2 estimator: REML)tau^2 (estimated amount of residual heterogeneity):     0.1108 (SE = 0.0845)tau (square root of estimated tau^2 value):             0.3328I^2 (residual heterogeneity / unaccounted variability): 71.98%H^2 (unaccounted variability / sampling variability):   3.57R^2 (amount of heterogeneity accounted for):            64.63%Test for Residual Heterogeneity:QE(df = 10) = 28.3251, p-val = 0.0016Test of Moderators (coefficients 2:3):QM(df = 2) = 12.2043, p-val = 0.0022Model Results:         estimate       se     zval    pval     ci.lb    ci.ub intrcpt   -3.5455  29.0959  -0.1219  0.9030  -60.5724  53.4814     ablat     -0.0280   0.0102  -2.7371  0.0062   -0.0481  -0.0080  ** year       0.0019   0.0147   0.1299  0.8966   -0.0269   0.0307     ---Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1### predicted average relative risks for 10, 20, 30, 40, and 50 degrees ### absolute latitude holding publication year constant at 1970predict(res, newmods = cbind(seq(from = 10, to = 60, by = 10), 1970),         transf = exp, addx = TRUE)### plot of the predicted average relative risk as a function of absolute latitudepreds <- predict(res, newmods = cbind(0:60, 1970), transf = exp)wi    <- 1/sqrt(dat$vi)size  <- 0.5 + 3 * (wi - min(wi))/(max(wi) - min(wi))plot(dat$ablat, exp(dat$yi), pch = 19, cex = size,      xlab = "Absolute Latitude", ylab = "Relative Risk",      las = 1, bty = "l", log = "y")lines(0:60, preds$pred)lines(0:60, preds$ci.lb, lty = "dashed")lines(0:60, preds$ci.ub, lty = "dashed")abline(h = 1, lty = "dotted")

funnel 漏斗图

漏斗图是由Light等在1984年提出，一般以单个研究的效应量为横坐标，样本含量为纵坐标做的散点图。效应量可以为RR、OR和死亡比或者其对数值等。理论上讲，被纳入Meta分析的各独立研究效应的点估计，在平面坐标系中的集合应为一个倒置的漏斗形，因此称为漏斗图。

样本量小，研究精度低，分布在漏斗图的底部，向周围分散；

样本量大，研究精度高，分布在漏斗图的顶部，向中间集中。

实际使用时，做Meta分析的研究个数较少时不宜做漏斗图，一般推荐Meta分析的研究个数在10个及以上才需做漏斗图。漏斗图主要用于观察某个系统评价或Meta分析结果是否存在偏倚，如发表偏倚或其他偏倚，如下图：

### funnel plotsres <- rma(yi, vi, data = dat)funnel(res, main = "Random-Effects Model")res <- rma(yi, vi, mods = cbind(ablat), data = dat)funnel(res, main = "Mixed-Effects Model")#######Begger's检验ranktest(res)Rank Correlation Test for Funnel Plot AsymmetryKendall's tau = 0.0256, p = 0.9524##########Egger's检验regtest(res)Regression Test for Funnel Plot AsymmetryModel:     mixed-effects meta-regression modelPredictor: standard errorTest for Funnel Plot Asymmetry: z = -0.4623, p = 0.6439

radial 星状图

主要反映各研究的异质性，从而发现异质性的点。弧线对应的效应评估大小分布。如下：

### radial plotsres <- rma(yi, vi, data = dat, method = "REML")radial(res, main = "Random-Effects Model")res <- rma(yi, vi, data = dat, method = "ML")radial(res, main = "Mixed-Effects Model")

Q-Q分位图

绘制预测结果，观察是否在置信区间内部，如下：

### qq-normal plotsres <- rma(yi, vi, data = dat)qqnorm(res, main = "Random-Effects Model")res <- rma(yi, vi, mods = cbind(ablat), data = dat)qqnorm(res, main = "Mixed-Effects Model")

labbe 拉贝图

该函数计算两组的臂水平结果(例如，治疗和控制)，并将它们相互比较。特别是功能块的原始比例两组互相在分析风险差异,比例在分析的日志(日志)风险比率,日志赔率在分析优势比(日志),平方根反正弦转换比例在分析平方根反正弦转换风险差异,分析发病率差异时用原始发病率，分析发病率比时用发病率的对数(log)，分析发病率差异时用发病率的平方根转换，如下：

res <- rma(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)### default plotlabbe(res)### funnel plot with risk values on the x- and y-axis and add gridlabbe(res, transf=exp, grid=TRUE)# }

04 分析结果解读

可以看到I^2为71.98%，属于高度异质性，可采用随机效应模型。异质性低的时候可以采用固定效应模型和随机效应模型，结果差别不大，但高异质性只能选择随机效应模型，否则会使结果外推性受到约束。此处选择随机效应模型是出于保守情况考虑。

random-effect model是基于跨研究间存在异质性的假设，该合并模型承认研究间异质性的存在，但是不对异质性加以处理；如果纳入合并的研究间存在异质性，尽管未达到我们常规设定的I^2>50%，但是在用 fixed-effect model合并时，默认运算直接忽略这一部分异质性的存在，这样合并的结果会造成假阳性误差，而选用random-effect model合并时，尽管不处理异质性，但是其默认运算承认异质性的存在，合并结果更可信！

从漏斗图以及Begg's检验及Egger's 检验的结果可以看出，P值都是大于0.05的，也就意味着没有发表偏倚。

Meta 分析同样是发文章的一种选择，搞清这些图解以及内部的方法，至于SCI能发到多少分，还需要看我们研究的课题的新颖程度，而方法也就是这些，表现形式基本都有，只要您有好的数据，顺利发表个SCI高分还是指日可待的，下期在聊聊其他软件！

关注公众号，扫码进群，免费解答，后期会有免费直播教程，敬请期待！

Reference:

Viechtbauer, W. (2010). Conducting Meta-Analyses in R with the metafor Package. Journal of Statistical Software, 36(3), 1–48.

Colditz, G. A., Brewer, T. F., Berkey, C. S., Wilson, M. E., Burdick, E., Fineberg, H. V., & Mosteller, F. (1994). Efficacy of BCG vaccine in the prevention of tuberculosis: Meta-analysis of the published literature. Journal of the American Medical Association, 271(9), 698–702.

Berkey, C. S., Hoaglin, D. C., Mosteller, F., & Colditz, G. A. (1995). A random-effects regression model for meta-analysis. Statistics in Medicine, 14(4), 395–411.

van Houwelingen, H. C., Arends, L. R., & Stijnen, T. (2002). Advanced methods in meta-analysis: Multivariate approach and meta-regression. Statistics in Medicine, 21(4), 589–624.

Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48.

公众号：桓峰基因 生物信息分析，SCI文章撰写及生物信息基础知识学习：R语言学习，perl基础编程，linux系统命令，Python遇见更好的你 44篇原创内容 -->

本文使用文章同步助手同步

Topic 1. SCI 文章中 Meta 分析之 metafo

异质性的置信区间可用metafor中confint（）可以求异质性系数（tau^2,tau,I^2(%),H^2)的置信区间，如下：

猜你喜欢

热点阅读