Building a baseline table with the R package tableone
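Overall this is fairly simple. First, pay attention to the variable type: is it a continuous variable or a categorical variable? Second, check each variable's distribution: whether a continuous variable is approximately normally distributed, and whether the sample size (or cell count) is too small. Choose the test method accordingly.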
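The decision described above can be sketched as a small helper. `choose_test()` is hypothetical and not part of tableone. It assumes two rules of thumb: a Shapiro-Wilk p-value below 0.05 in any group means "not normal", and any expected cell count below 5 calls for an exact test. In tableone itself you make these choices by hand through the `nonnormal` and `exact` arguments of `print()`.

```r
# Hypothetical helper, not part of tableone: suggest a test from the
# variable type, per-group normality and expected cell counts.
choose_test <- function(x, group, alpha = 0.05, min_expected = 5) {
  if (is.numeric(x)) {
    # Shapiro-Wilk test within each group (needs 3 to 5000 values per group)
    p_norm <- tapply(x, group, function(v) shapiro.test(v)$p.value)
    if (any(p_norm < alpha)) "kruskal.test" else "oneway.test"
  } else {
    # Expected counts under independence, as computed by chisq.test()
    tab <- table(x, group)
    expected <- suppressWarnings(chisq.test(tab)$expected)
    if (any(expected < min_expected)) "fisher.test" else "chisq.test"
  }
}

# Deterministic example data: two groups of 50
group    <- rep(c("group1", "group2"), each = 50)
normal_x <- rep(qnorm(ppoints(50)), 2)  # exact normal quantiles
skewed_x <- rep(qexp(ppoints(50)), 2)   # right-skewed (exponential) quantiles
common   <- rep(c("a", "b"), 50)        # 25 of each level per group
rare     <- c(rep("a", 48), rep("b", 2), rep("a", 47), rep("b", 3))

choose_test(normal_x, group)  # "oneway.test"
choose_test(skewed_x, group)  # "kruskal.test"
choose_test(common, group)    # "chisq.test"
choose_test(rare, group)      # "fisher.test"
```

The thresholds are conventions. Looking at histograms or Q-Q plots is often a better guide to normality than a single test, especially with large samples.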
As the output table at the end shows, when there are two or more groups, p-values for the group comparison are printed along with the table (let's not argue here about whether hypothesis testing belongs in Table 1 of an RCT). Very small p-values are shown with a less-than sign, e.g. <0.001. By default, chisq.test() is used for categorical variables (with continuity correction) and oneway.test() for continuous variables (assuming equal variances, i.e., regular ANOVA). A two-group ANOVA is equivalent to a Student's t-test.
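Both default tests can be checked directly in base R. The first part uses simulated data, which is an assumption and not the data above. It shows that a two-group `oneway.test(..., var.equal = TRUE)` gives the same p-value as a Student's t-test, and that F = t². The second part runs `chisq.test()` with and without Yates' continuity correction on the Sex counts shown in the output table (Male 76 vs 57, Female 24 vs 43).

```r
set.seed(42)
sim <- data.frame(
  y = c(rnorm(30, mean = 50, sd = 10), rnorm(30, mean = 55, sd = 10)),
  g = rep(c("group1", "group2"), each = 30)
)

# Regular one-way ANOVA (equal variances), the tableone default
anova_res <- oneway.test(y ~ g, data = sim, var.equal = TRUE)
# Student's two-sample t-test (equal variances)
t_res <- t.test(y ~ g, data = sim, var.equal = TRUE)

c(anova = anova_res$p.value, t_test = t_res$p.value)  # same p-value
unname(anova_res$statistic - t_res$statistic^2)       # ~0, i.e. F = t^2

# Sex counts by group (rows Male, Female) taken from the output table
sex_tab <- matrix(c(76, 24, 57, 43), nrow = 2,
                  dimnames = list(Sex = c("Male", "Female"),
                                  group = c("group1", "group2")))
chisq.test(sex_tab)                   # Yates' correction (default for 2x2)
chisq.test(sex_tab, correct = FALSE)  # without the correction
```

On these counts the corrected statistic is about 7.27 and the uncorrected one about 8.10. The correction makes the test more conservative.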
You may be worried about non-normal continuous variables and small cell counts in a categorical variable (Sex in the code below). In that case, use the nonnormal argument together with the exact (test) argument of the print() method. kruskal.test() is then used for the continuous variables listed in nonnormal, and fisher.test() for the categorical variables listed in exact. In the two-group case, kruskal.test() is equivalent to wilcox.test(). The test column indicates which p-values were calculated with these non-default tests.
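A base-R sketch of the two replacement tests follows. The log-normal "income" data are simulated and purely illustrative. The Kruskal-Wallis p-value matches `wilcox.test()` only when the Wilcoxon test uses the same large-sample normal approximation without continuity correction (`exact = FALSE, correct = FALSE`). With its defaults, `wilcox.test()` may use exact p-values for small samples and a continuity correction, so its p-value can differ slightly.

```r
set.seed(7)
skewed <- data.frame(
  income = c(rlnorm(40, meanlog = 10.4, sdlog = 0.6),
             rlnorm(40, meanlog = 10.3, sdlog = 0.6)),
  g = rep(c("group1", "group2"), each = 40)
)

# Kruskal-Wallis test, used for variables listed in nonnormal
kw <- kruskal.test(income ~ g, data = skewed)
# Wilcoxon rank-sum test with the same approximation:
# normal approximation, no continuity correction
wx <- wilcox.test(income ~ g, data = skewed, exact = FALSE, correct = FALSE)
c(kruskal = kw$p.value, wilcoxon = wx$p.value)  # same p-value

# Fisher's exact test, used for variables listed in exact,
# on the Sex counts from the output table
sex_tab <- matrix(c(76, 24, 57, 43), nrow = 2,
                  dimnames = list(Sex = c("Male", "Female"),
                                  group = c("group1", "group2")))
fisher.test(sex_tab)$p.value
```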
```r
# generate data with the package "wakefield" ------------------------------
# refer to https://github.com/trinker/wakefield
library(wakefield)
# no seed is set, so the numbers change from run to run
dat1 <- r_data_frame(100,
  age(x = 20:80),
  sex(prob = c(0.8, 0.2)),
  smokes,
  income,
  animal,
  likert(x = c("group1"), prob = c(1), name = "group")
)
dat2 <- r_data_frame(100,
  age(x = 30:100),
  sex(prob = c(0.5, 0.5)),
  smokes,
  income,
  animal,
  likert(x = c("group2"), prob = c(1), name = "group")
)
dat <- rbind(dat1, dat2)  # stack the two groups into one data frame
```
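If wakefield is not available, a rough base-R stand-in can generate a similar two-group data set. This is a sketch under assumptions. `make_group()` is a hypothetical helper, the column names mimic the ones used above, and ages are drawn uniformly from the same ranges. The smoking probability (0.18) and the log-normal `Income` distribution are arbitrary choices, not how wakefield generates those columns. The `Animal` column is left out because the table does not use it.

```r
set.seed(123)

# Hypothetical helper: simulate one group roughly like the wakefield call
make_group <- function(n, ages, p_male, label) {
  data.frame(
    Age    = sample(ages, n, replace = TRUE),
    Sex    = factor(sample(c("Male", "Female"), n, replace = TRUE,
                           prob = c(p_male, 1 - p_male)),
                    levels = c("Male", "Female")),
    Smokes = sample(c(TRUE, FALSE), n, replace = TRUE,
                    prob = c(0.18, 0.82)),                      # assumed rate
    Income = round(rlnorm(n, meanlog = 10.4, sdlog = 0.7), 2),  # arbitrary
    group  = factor(label, levels = c("group1", "group2"))
  )
}

dat_base <- rbind(
  make_group(100, 20:80,  p_male = 0.8, label = "group1"),
  make_group(100, 30:100, p_male = 0.5, label = "group2")
)
str(dat_base)
```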
```r
# make the baseline table with the package "tableone" ---------------------
# https://cran.r-project.org/web/packages/tableone/vignettes/introduction.html
library(tableone)
summary(dat)
dput(names(dat))  # print the column names in a copy-pasteable form
a <- CreateTableOne(
  vars = c("Age", "Sex", "Smokes", "Income"),  # variables to summarize
  data = dat,
  strata = "group",                            # summarize by group
  factorVars = c("Sex", "Smokes")              # handle these as categorical
)

## Testing
?print.TableOne
summary(a)
print(a, showAllLevels = TRUE)  # show all levels of categorical variables
print(a, nonnormal = c("Income"),
      exact = c("Sex"),
      smd = TRUE)

# Default tests: chisq.test() for categorical variables (with continuity
# correction) and oneway.test() for continuous variables (equal variances,
# i.e., regular ANOVA; with two groups this equals a t-test).

## For nonnormal variables and small cell counts
# nonnormal = switches the listed continuous variables to kruskal.test()
# (equal to wilcox.test() with two groups) and shows them as median [IQR];
# exact = switches the listed categorical variables to fisher.test().
# The test column marks p-values from these non-default tests.
```
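To see what the two display modes report, here is a base-R sketch of the summaries behind `Age (mean (SD))` and `Income (median [IQR])`. The helper names are hypothetical. The quartiles come from `quantile()` with its default method, which is an assumption and may not match tableone's internal settings in every case.

```r
# Hypothetical helpers mimicking tableone's two continuous summaries
fmt_num <- function(v, digits) formatC(v, format = "f", digits = digits)

fmt_mean_sd <- function(x, digits = 2) {
  paste0(fmt_num(mean(x), digits), " (", fmt_num(sd(x), digits), ")")
}
fmt_median_iqr <- function(x, digits = 2) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  paste0(fmt_num(q[2], digits), " [", fmt_num(q[1], digits), ", ",
         fmt_num(q[3], digits), "]")
}

x <- 1:9
fmt_mean_sd(x)     # "5.00 (2.74)"
fmt_median_iqr(x)  # "5.00 [3.00, 7.00]"

# Per-group summaries, as in a stratified table
values <- c(1:9, 11:19)
grp <- rep(c("group1", "group2"), each = 9)
tapply(values, grp, fmt_median_iqr)
```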
```r
## Exporting
a_csv <- print(a, nonnormal = c("Income"),
               exact = c("Sex"),
               smd = TRUE,
               showAllLevels = TRUE,
               quote = FALSE,
               noSpaces = TRUE,      # remove the spaces added for alignment
               printToggle = FALSE)  # return the matrix instead of printing
library(knitr)
kable(a_csv,
      align = "c",
      caption = "Table 1: Comparison of unmatched samples")
write.csv(a_csv, file = "myTable.csv")
```
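With `printToggle = FALSE`, `print()` returns the formatted table as a character matrix instead of printing it. That is why the result can go straight to `kable()` or `write.csv()`. The sketch below uses a small hand-made matrix as a stand-in for that return value, an assumption that lets it run without tableone. It checks that `write.csv()` and `read.csv()` round-trip the matrix when every column is read back as character.

```r
# Stand-in for the character matrix returned by print(a, ..., printToggle = FALSE)
tab1 <- matrix(
  c("100",           "100",           "",
    "47.54 (18.05)", "64.28 (19.30)", "<0.001"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("n", "Age (mean (SD))"), c("group1", "group2", "p"))
)

out_file <- tempfile(fileext = ".csv")
write.csv(tab1, file = out_file)  # row names become the first column

# Read everything back as text so "100" and "<0.001" are not converted
back <- read.csv(out_file, row.names = 1, colClasses = "character",
                 check.names = FALSE)
all(as.matrix(back) == tab1)  # TRUE
```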
| | level | group1 | group2 | p | test | SMD |
|---|---|---|---|---|---|---|
| n | | 100 | 100 | | | |
| Age (mean (SD)) | | 47.54 (18.05) | 64.28 (19.30) | <0.001 | | 0.896 |
| Sex (%) | Male | 76 (76.0) | 57 (57.0) | 0.007 | exact | 0.411 |
| | Female | 24 (24.0) | 43 (43.0) | | | |
| Smokes (%) | FALSE | 86 (86.0) | 79 (79.0) | 0.264 | | 0.185 |
| | TRUE | 14 (14.0) | 21 (21.0) | | | |
| Income (median [IQR]) | | 33853.50 [23196.55, 53600.29] | 31957.53 [16484.16, 53772.67] | 0.320 | nonnorm | 0.046 |
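The SMD column can be reproduced from the summary numbers in the table. This is a minimal sketch assuming the usual definitions. For a continuous variable, the SMD is the difference in means divided by sqrt((s1^2 + s2^2) / 2). For a binary variable, it is the difference in proportions divided by sqrt((p1 (1 - p1) + p2 (1 - p2)) / 2). tableone works from the raw data, so values recomputed from rounded summaries can differ in the last digit.

```r
# Standardized mean difference for a continuous variable
smd_cont <- function(m1, s1, m2, s2) {
  abs(m1 - m2) / sqrt((s1^2 + s2^2) / 2)
}
# Standardized difference for a binary variable (p = proportion with a level)
smd_bin <- function(p1, p2) {
  abs(p1 - p2) / sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
}

round(smd_cont(47.54, 18.05, 64.28, 19.30), 3)  # Age:    0.896
round(smd_bin(24 / 100, 43 / 100), 3)           # Sex:    0.411
round(smd_bin(14 / 100, 21 / 100), 3)           # Smokes: 0.185
```

For a two-level variable it does not matter which level you use (Female or Male, TRUE or FALSE). Both the difference and the p(1 - p) terms are symmetric, so the result is the same.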