Making Table 1: lessons learned (with Stata and R)
2019-12-04
liang_rujiang
INTRODUCTION
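Table 1 describes the basic characteristics of the study participants and appears in most studies. Its required part is descriptive statistics (central tendency, dispersion, counts, and percentages). Its optional part is a comparison between groups (t-test, F-test). Stata has many packages that help with this, but their output still differs somewhat from the tables typical of medical papers and needs manual adjustment.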
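In Stata:
- Continuous variables, no grouping (you only need each variable's overall distribution): the simple case. `tabstat varlist, c(s) s(mean sd)` is enough.
- Continuous variables, grouped: the same command produces the descriptive part; just add the `by()` option.
- Categorical variables, no grouping: `tab1 varlist`.
- Categorical variables, grouped: `tab1 varlist if grpvar == 1`, and so on for each group.
- Continuous variables, t-test or F-test: run them one variable at a time in a `foreach` loop (for example `ttest var, by(grpvar)` or `oneway var grpvar`).
- Categorical variables, chi-square or Fisher's exact test: a `foreach` loop over `tab` with the appropriate options (for example `tab var grpvar, col chi2 exact`).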
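For comparison, here is a minimal base-R sketch of the same checklist. It uses the built-in `mtcars` data with `am` (automatic vs manual) as the grouping variable, purely for illustration. The rule "use Fisher's exact test when any expected count is below 5" is a common convention that I am assuming here, not part of the workflow above.

```r
# Minimal base-R sketch of the descriptive checklist.
# Illustrative data: built-in mtcars, grouped by am (not real study data).
data(mtcars)
grp <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))
cont_vars <- c("mpg", "wt", "hp")

# Continuous, no grouping: overall mean and SD (cf. tabstat varlist, s(mean sd))
overall <- data.frame(mean = sapply(mtcars[cont_vars], mean),
                      sd   = sapply(mtcars[cont_vars], sd))

# Continuous, grouped: mean and SD per group (cf. adding by())
means <- aggregate(mtcars[cont_vars], by = list(group = grp), FUN = mean)
sds   <- aggregate(mtcars[cont_vars], by = list(group = grp), FUN = sd)

# Categorical, grouped: counts and column percentages (cf. tab cyl am, col)
counts   <- table(cyl = mtcars$cyl, group = grp)
percents <- 100 * prop.table(counts, margin = 2)

# Assumed rule: chi-square unless any expected count is below 5, then Fisher
expected <- suppressWarnings(chisq.test(counts))$expected
cat_test <- if (any(expected < 5)) fisher.test(counts) else chisq.test(counts)

print(overall)
print(means)
print(sds)
print(round(percents, 1))
cat(cat_test$method, "p =", format.pval(cat_test$p.value, digits = 3), "\n")
```

With only 32 cars, some expected counts in the `cyl` by `am` table fall below 5, so the sketch switches to Fisher's exact test.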
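Doing all of this by hand in Stata is tedious, and copying results into a table invites mistakes. iebaltab helps to some extent; run the iebaltab examples further down to see what it produces. First install the package with `ssc install ietoolkit`.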
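Some observations:
- iebaltab cannot display the t-statistic. Oddly enough, neither can R's tableone package. Yet some medical papers, and basic research close to medicine, often need to report it.
- It does not work for categorical variables: it gives neither descriptive statistics nor chi-square/Fisher results for them (R's tableone handles chi-square, and Fisher's exact test through the `exact` argument of `print()`).
- It does not distinguish normal from non-normal continuous variables, and it does not check the equal-variance assumption. (tableone handles the distribution part, but you have to look at the distributions yourself and tell the function which variables should get nonparametric tests; the test itself can also be chosen. My own approach is to pick out the continuous variables of interest, reshape them to long format, and draw a faceted ggplot to inspect them.)
- tableone handles all of these at once: with or without grouping, descriptives and tests, categorical and continuous variables, and normal versus non-normal continuous variables.
- With tableone, categorical variables must be stored in the data frame either as factors or as numeric codes. In the numeric case you must declare them in the function call (the `factorVars` argument of `CreateTableOne()`), otherwise they are treated as continuous.
Overall, in my experience R's tableone package is the more powerful tool.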
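On the distribution and equal-variance points above, a rough sketch of automating the choice of test follows. This swaps in a Shapiro-Wilk check per group with a 0.05 cut-off, which is my assumption and not the visual (faceted plot) check described above; using a normality test as a gatekeeper is a debated shortcut, so treat it as a first pass only. Welch's t-test (R's default in `t.test`) avoids the equal-variance assumption. Data: the built-in `mtcars`, grouped by `am`, for illustration only.

```r
# Sketch: per variable, choose Welch t-test vs Wilcoxon rank-sum test
# from a Shapiro-Wilk check within each group (assumed rule, alpha = 0.05).
data(mtcars)
grp <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))
vars <- c("mpg", "wt", "hp", "disp")

compare_groups <- function(x, g, alpha = 0.05) {
  # Shapiro-Wilk p-value within each group
  sw_p <- tapply(x, g, function(v) shapiro.test(v)$p.value)
  nonnormal <- any(sw_p < alpha)
  test <- if (nonnormal) {
    wilcox.test(x ~ g, exact = FALSE)  # rank-based, no normality assumption
  } else {
    t.test(x ~ g)                      # Welch: does not assume equal variances
  }
  data.frame(nonnormal = nonnormal, method = test$method, p = test$p.value)
}

results <- do.call(rbind, lapply(vars, function(v) {
  cbind(variable = v, compare_groups(mtcars[[v]], grp))
}))
print(results)
```

Variables flagged as non-normal here are the ones you would pass to `nonnormal =` in tableone's `print()`.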
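At first I had found no good way to display the t-statistic and asked readers to share one in the comments. Update: I found one myself (below); install the package first with `ssc install asdoc`. (Update, 4:30 a.m. on the 14th: t2docx also looks handy, but it only runs on Stata 15 or later. I use 14.1 and could not test it, so I won't cover it here.)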
```stata
* Write a header row, then one row of t-values per variable, with asdoc
sysuse auto, clear
asdoc, row(t-value)
foreach i of varlist price-weight {
    ttest `i', by(foreign)
    asdoc, row(`r(t)')
}
```
A complete and reasonably tidy example:
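```stata
sysuse auto, clear
* Delete any earlier output so the table starts fresh
cap rm Myfile.doc
* Means and SDs of price through weight by foreign, 3 decimals
asdoc tabstat price-weight, by(foreign) stat(mean sd) dec(3)
* Header row, then the t-value and p-value for each variable
asdoc, row(t-value, p-value)
foreach i of varlist price-weight {
    ttest `i', by(foreign)
    asdoc, row(`r(t)', `r(p)') dec(3)
}
```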
The output, after a little manual editing:

[Image: the resulting asdoc table]
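Since tableone does not report t-statistics, here is a rough R counterpart of the asdoc loop above: run `t.test()` per variable and collect the group means, t, df, and p into one data frame you can export. It uses `var.equal = TRUE` to mirror Stata's default pooled-variance `ttest`; the `mtcars` data and variable choice are only for illustration.

```r
# Sketch: a table of t-statistics in R (illustrative data: mtcars by am).
# var.equal = TRUE mirrors Stata's default pooled-variance ttest.
data(mtcars)
grp <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))
vars <- c("mpg", "wt", "hp")

t_table <- do.call(rbind, lapply(vars, function(v) {
  tt <- t.test(mtcars[[v]] ~ grp, var.equal = TRUE)
  data.frame(variable       = v,
             mean_automatic = unname(tt$estimate[1]),
             mean_manual    = unname(tt$estimate[2]),
             t              = unname(tt$statistic),
             df             = unname(tt$parameter),
             p              = tt$p.value)
}))
print(format(t_table, digits = 3))
```

As in Stata, the difference is first group minus second group, so a negative t means the first group (automatic) has the lower mean.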
EXAMPLES WITH IEBALTAB IN STATA
```stata
* Written for the 2019 release of ietoolkit; later releases reworked several
* iebaltab options, so check -help iebaltab- for your version.
set more off
sysuse auto, clear
des
* fmiss is not an official Stata command (install it first if you use it);
* the built-in -misstable summarize- also lists missing values
fmiss
* rep78 is the only variable in auto.dta with missing values
drop rep78
des
iebaltab price headroom length, grpvar(foreign) save(temp) replace onerow pt std format(%7.2f)
* onerow: displays the number of observations in one additional row at the
*   bottom of the table, if each group has the same number of observations
*   for all variables in balancevarlist.
* pttest (pt): shows p-values instead of differences in means between the
*   groups in the t-test columns.
* stdev (std): displays standard deviations in parentheses instead of
*   standard errors.
gen grp = mod(_n, 3)
tab1 grp
iebaltab price headroom length, grpvar(grp) save(temp) replace pt std onerow
iebaltab price headroom length, grpvar(grp) save(temp) replace pt std onerow co(1)
* control(groupcode): one group is tested against all other groups in
*   t-tests and F-tests. Default is all groups against each other.
iebaltab price headroom length, grpvar(grp) save(temp) replace pt std onerow co(1) ftest
* ftest: adds a row with an F-test of joint significance, i.e. whether the
*   balance variables together predict group membership (see -help iebaltab-).
iebaltab price headroom length, grpvar(grp) save(temp) replace pt std onerow co(1) feqt pf
* feqtest (feqt): adds, for each variable, an F-test of equal means across
*   all groups. With pftest (pf) the p-values are shown; with feqt alone,
*   the F-statistics.
/* a template example: replace the folder and variable names with your own */
global project_folder "C:/Users/project/baseline/results"
iebaltab outcome_variable, grpvar(treatment_variable) save("$project_folder/balancetable.xlsx")
```
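For readers who want the same per-variable F-test outside Stata, here is a sketch of what `feqt pf` reports: a one-way ANOVA F-test of equal means across three groups, built the same way as `gen grp = mod(_n, 3)`. It is meant to be similar in spirit to iebaltab's feqtest, not an exact replica of its output, and it does not reproduce the joint `ftest`. Data: `mtcars`, for illustration only.

```r
# Sketch: per-variable F-test of equal means across three groups
# (illustrative data: mtcars; variables chosen for illustration).
data(mtcars)
# Mimic Stata's gen grp = mod(_n, 3): rows cycle through groups 1, 2, 0
grp3 <- factor(seq_len(nrow(mtcars)) %% 3)
vars <- c("mpg", "wt", "hp")

f_table <- do.call(rbind, lapply(vars, function(v) {
  # var.equal = TRUE gives the classic one-way ANOVA F-test
  fit <- oneway.test(mtcars[[v]] ~ grp3, var.equal = TRUE)
  data.frame(variable = v,
             F_stat   = unname(fit$statistic),
             df1      = unname(fit$parameter[1]),
             df2      = unname(fit$parameter[2]),
             p        = fit$p.value)
}))
print(f_table)
```

The same F-statistic comes out of `anova(lm(mtcars$mpg ~ grp3))`, which is a quick way to check the result.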
EXAMPLES WITH TABLEONE IN R
```r
# Packages: dplyr (%>%, filter, summarise), tidyr (gather), purrr (discard, map)
library(dplyr)
library(tidyr)
library(purrr)

data <- row_data  # row_data: my own data frame (not included here)

# Eyeball the distribution of each continuous variable
data$age %>% hist
data$for_duration %>% hist
data$for_income %>% hist
data$sbp %>% hist
data$dbp %>% hist

# Whole sample: mean/SD for roughly normal variables,
# median/IQR for the skewed ones (income, duration)
data %>% summarise(age_mean = mean(age),
                   age_sd = sd(age),
                   income_median = median(for_income),
                   income_iqr = IQR(for_income),
                   duration_median = median(for_duration),
                   duration_iqr = IQR(for_duration),
                   sbp_mean = mean(sbp),
                   sbp_sd = sd(sbp),
                   dbp_mean = mean(dbp),
                   dbp_sd = sd(dbp)) %>%
  gather()

# Counts (element 1) and proportions (element 2) of one categorical variable
cat_des <- function(df, chr) {
  out <- vector("list", length = 2)
  out[[1]] <- table(df[[chr]])
  out[[2]] <- prop.table(table(df[[chr]]))
  out
}

# Proportions for every non-numeric column
data %>%
  discard(is.numeric) %>%
  names() %>%
  map(~cat_des(data, .)) %>%
  map(2)
cat_des(data, "bloodlevel")
cat_des(data, "adherence")
# -------------------------------------------------------------------------
# Same descriptives within the non-adherent group
data %>% filter(adherence == "nonad") %>%
  summarise(age_mean = mean(age),
            age_sd = sd(age),
            income_median = median(for_income),
            income_iqr = IQR(for_income),
            duration_median = median(for_duration),
            duration_iqr = IQR(for_duration),
            sbp_mean = mean(sbp),
            sbp_sd = sd(sbp),
            dbp_mean = mean(dbp),
            dbp_sd = sd(dbp)) %>%
  gather()
data %>%
  discard(is.numeric) %>%
  names() %>%
  map(~cat_des(filter(data, adherence == "nonad"), .)) %>%
  map(2)
cat_des(filter(data, adherence == "nonad"), "bloodlevel")
# VERY IMPORTANT 02May2019 ------------------------------------------------
library(tableone)
vars <- setdiff(names(data), "adherence")
vars  # inspect, then list the characteristics in the order you want
vars <- c("age", "gender", "education", "urbanity", "for_income",
          "t", "for_duration", "cliniccheck", "sbp", "dbp", "diabete")
# Table stratified by adherence, with tests
tableone <- CreateTableOne(data = data, vars = vars, strata = "adherence")
print(tableone, nonnormal = c("for_income", "for_duration"),
      explain = TRUE, showAllLevels = TRUE, catDigits = 2, quote = TRUE)
# Whole-sample table (no strata, so no tests)
CreateTableOne(data = data, vars = vars) %>% print(
  nonnormal = c("for_income", "for_duration"),
  showAllLevels = TRUE, catDigits = 2, quote = TRUE)
```
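To make the `nonnormal` behaviour concrete, here is a small base-R sketch that builds roughly the layout `print(tableone, nonnormal = ...)` shows: mean (SD) for normal variables, median [Q1, Q3] for non-normal ones, and n (%) for each level of a categorical variable, overall and by group. It is not tableone and omits the p-value column; `mtcars`, the grouping by `am`, and the helper names (`cont_row`, `cat_rows`) are my own illustrative choices.

```r
# Sketch of a Table 1 layout in base R (illustrative data: mtcars by am).
data(mtcars)
grp <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))

fmt_mean_sd <- function(x) sprintf("%.2f (%.2f)", mean(x), sd(x))
fmt_median_iqr <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  sprintf("%.2f [%.2f, %.2f]", q[2], q[1], q[3])
}

# One row per continuous variable: overall column plus one column per group
cont_row <- function(v, nonnormal = FALSE) {
  f <- if (nonnormal) fmt_median_iqr else fmt_mean_sd
  label <- paste0(v, if (nonnormal) " (median [IQR])" else " (mean (SD))")
  c(variable = label, overall = f(mtcars[[v]]),
    vapply(split(mtcars[[v]], grp), f, character(1)))
}

# One row per level of a categorical variable: n (%)
cat_rows <- function(v) {
  x <- factor(mtcars[[v]])
  rows <- lapply(levels(x), function(l) {
    pct <- function(y) sprintf("%d (%.1f)", sum(y == l), 100 * mean(y == l))
    c(variable = paste0(v, " = ", l), overall = pct(x),
      vapply(split(x, grp), pct, character(1)))
  })
  do.call(rbind, rows)
}

n_row <- c(variable = "n", overall = as.character(nrow(mtcars)),
           vapply(split(grp, grp), function(y) as.character(length(y)), character(1)))

table1 <- rbind(n_row,
                cont_row("mpg"),
                cont_row("hp", nonnormal = TRUE),
                cat_rows("cyl"))
rownames(table1) <- NULL
print(table1, quote = FALSE)
```

In practice tableone does all of this (plus the tests) in two calls; the sketch only shows what the `nonnormal` switch changes in the display.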