生物节律之cosinor分析——终极R包circacompare
前面我们已经介绍了cosinor差异分析最常用的两个包:cosinor
和cosinor2
包,这两个包其实已经做得非常好了,就是还有一点点缺陷,下面一起来看看它们的缺陷:
1.cosinor包
示例来自官网 vignettes
library(cosinor)
head(vitamind)
# X Y time
# 1 0 16.12091 11.439525
# 2 0 29.90624 5.807104
# 3 1 39.17572 1.045492
# 4 1 35.15403 4.082983
# 5 1 43.67065 10.606247
# 6 0 31.20360 5.126054
fit <- cosinor.lm(Y ~ time(time) + X + amp.acro(X), data = vitamind, period = 12)
summary(fit)
# Raw model coefficients:
# estimate standard.error lower.CI upper.CI p.value
# (Intercept) 29.6898 0.4654 28.7776 30.6020 0.0000
# X 1.9019 0.8041 0.3258 3.4779 0.0180
# rrr 0.9308 0.6357 -0.3151 2.1767 0.1431
# sss 6.2010 0.6805 4.8673 7.5347 0.0000
# X:rrr 5.5795 1.1386 3.3479 7.8111 0.0000
# X:sss -1.3825 1.1364 -3.6097 0.8447 0.2237
#
# ***********************
#
# Transformed coefficients:
# estimate standard.error lower.CI upper.CI p.value
# (Intercept) 29.6898 0.4654 28.7776 30.6020 0.000
# [X = 1] 1.9019 0.8041 0.3258 3.4779 0.018
# amp 6.2705 0.6799 4.9378 7.6031 0.000
# [X = 1]:amp 8.0995 0.9095 6.3170 9.8820 0.000
# acr 1.4218 0.1015 1.2229 1.6207 0.000
# [X = 1]:acr 0.6372 0.1167 0.4084 0.8659 0.000
test_cosinor(fit, "X", param = "amp")#param 命名要测试参数的字符串,“amp”表示振幅,“acr”表示顶相(峰值相位)
# Global test:
# Statistic:
# [1] 2.59
#
#
# P-value:
# [1] 0.1072
#
# Individual tests:
# Statistic:
# [1] 1.61
#
#
# P-value:
# [1] 0.1072
#
# Estimate and confidence interval[1] "1.83 (-0.4 to 4.05)"
发现没,其实这个包的核心是检测相位和振幅的差异,但是我们就无法推测基线(均值)差异了吗?其实不然,由summary(fit)
我们其实已经得到了均值差异的结果

1
处代表默认状态下的均值水平,2
处代表因子X=1时与默认状态下的均值差异,所以3
处代表的就是均值差异的p值因此,
cosinor
包中的比较是分散的,整理结果时需要花一番手脚
2.cosinor2
#构造测试数据集
set.seed(100)
mat1 <- data.frame(row.names = paste0("subject",1:10),
x1=1+rnorm(10,0,0.15),
x2=2+rnorm(10,0,0.2),
x3=3+rnorm(10,0,0.3),
x4=4+rnorm(10,0,0.4),
x5=3+rnorm(10,0,0.35),
x6=2+rnorm(10,0,0.25))
mat2 <- data.frame(row.names = paste0("subject",1:12),
x1=8+rnorm(12,0,0.85),
x2=6+rnorm(12,0,0.6),
x3=4+rnorm(12,0,0.45),
x4=2+rnorm(12,0,0.25),
x5=4+rnorm(12,0,0.4),
x6=6+rnorm(12,0,0.65))
time=seq(1,21,4)
library(cosinor2)
fit.pa_ext.cosinor <- population.cosinor.lm(data = mat1 , time = time, period = 24)
# MESOR Amplitude Acrophase
# 1 2.520422 1.364075 -3.399919
fit.pa_int.cosinor <- population.cosinor.lm(data = mat2, time = time, period = 24)
# MESOR Amplitude Acrophase
# 1 4.954428 2.617844 -0.2602902
cosinor.poptests(fit.pa_ext.cosinor, fit.pa_int.cosinor)
# F df1 df2 p 1st population 2nd population
# MESOR 921.87276 1 20 3.315962e-18 2.520422 4.9544275
# Amplitude 14.83446 1 20 9.952569e-04 1.364075 2.6178436
# Acrophase 124.81985 1 20 4.755292e-10 -3.399919 -0.2602902
# Warning message:
# In cosinor.poptests(fit.pa_ext.cosinor, fit.pa_int.cosinor) :
# Results of population amplitude difference test are not reliable due to different acrophases.
看到最后三行没,有一个大大的警告:如果两个种群的峰值相位显著不同,则振幅差异检验的结果不可靠,不可靠,不可靠,重要的结果说三遍
因此,这两个包都有一点点缺陷,不影响正常使用,但使用起来总觉得有点不舒服
3.终极R包circacompare
circacompare
is an R package that allows for the statistical analyses and comparison of two circadian rhythms. This work is published here and can be cited as:
Rex Parsons, Richard Parsons, Nicholas Garner, Henrik Oster, Oliver Rawashdeh, CircaCompare: A method to estimate and statistically support differences in mesor, amplitude, and phase, between circadian rhythms, Bioinformatics, https://doi.org/10.1093/bioinformatics/btz730
示例1:检测单个基因是否振荡
library(circacompare)
# create an example data frame with a phase shift between groups of 6 hours.
df <- make_data(hours_diff = 6)
df <- df[df$group == "g1",]
head(df)
# time measure group
# 1 1 0.35 g1
# 2 2 11.00 g1
# 3 3 7.60 g1
# 4 4 11.00 g1
# 5 5 6.90 g1
# 6 6 15.00 g1
out <- circa_single(x = df, col_time = "time", col_outcome = "measure")# call circacompare.
# view the graph.
out[[3]]
# view the results.
out[[2]]
# mesor amplitude amplitude_p phase_radians peak_time_hours
# 1 0.2229167 10.39622 1.266734e-09 1.514854 5.786316
summary(out[[1]])
# Formula: measure ~ k + alpha * cos(time_r - phi)
#
# Parameters:
# Estimate Std. Error t value Pr(>|t|)
# k 0.22292 0.71757 0.311 0.759
# alpha 10.39622 1.01480 10.245 1.27e-09 ***
# phi 1.51485 0.09761 15.519 5.57e-13 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 3.515 on 21 degrees of freedom
#
# Number of iterations to convergence: 5
# Achieved convergence tolerance: 2.926e-07
nlstools::confint2(out[[1]],level = 0.95)
# 2.5 % 97.5 %
# k -1.269350 1.715183
# alpha 8.285839 12.506606
# phi 1.311859 1.717849
解读:out[[3]]
是ggplot2对象,可以直接输出,也可以修改各种基于ggplot2的参数,图形左上角表明基因振荡是否显著

out[[2]]
是拟合主要统计值,有1行5列,分别为均值(基线),振幅,振幅p值(也就是振荡显著性的p值),峰值相位弧度值,峰值相位小时值
out[[1]]
为拟合结果,summary
查看总结输出更详细的统计资料,nlstools::confint2
获得置信区间
示例2 检测单个基因在两组间是否振荡:核心内容
df <- make_data(hours_diff = 6)
table(df$group)# 2 groups
# g1 g2
# 24 24
out <- circacompare(x = df, col_time = "time", col_group = "group", col_outcome = "measure")# call circacompare.
# view the graph.
out[[1]]
# view the results.
out[[2]]$value <- round(out[[2]]$value,4)
out[[2]]
# parameter value
# 1 Both groups were rhythmic 1.0000
# 2 Presence of rhythmicity (p-value) for g1 0.0000
# 3 Presence of rhythmicity (p-value) for g2 0.0000
# 4 g1 mesor estimate 0.2229
# 5 g2 mesor estimate 2.8946
# 6 Mesor difference estimate 2.6717
# 7 P-value for mesor difference 0.0027
# 8 g1 amplitude estimate 10.3962
# 9 g2 amplitude estimate 14.2856
# 10 Amplitude difference estimate 3.8894
# 11 P-value for amplitude difference 0.0021
# 12 g1 peak time 5.7863
# 13 g2 peak time 23.9125
# 14 Phase difference estimate -5.8738
# 15 P-value for difference in phase 0.0000
summary(out[[3]])
nlstools::confint2(out[[3]])
解读:out[[1]]
同样返回的为ggplot2对象,如图:

out[[2]]为拟合主要统计值,有15行2列:

1.两组都振荡的p值
2.g1组振荡显著性的p值
3.g2组振荡显著性的p值
4~7.g1组均值,g2组均值,两组均值差异,均值差异显著性
8-11.g1组振幅,g2组振幅,两组振幅差异,振幅差异显著性
12-15.g1组峰值相位,g2组峰值相位,两组峰值相位差异,峰值相位差异显著性
out[[3]]
为拟合结果,summary
查看总结输出更详细的统计资料,nlstools::confint2
获得置信区间注意:单个时
out[[3]]
为图像,out[[1]]
为拟合结果比较时
out[[1]]
为图像,out[[3]]
为拟合结果
ps:怎么样,是不是心动了,还有比这更好使的做cosinor工具吗?只恨没有早点发现啊,赶紧devtools::install_github("RWParsons/circacompare")
安装起来吧
不喜欢代码的,也可以试试Shiny版本哦,链接在此:https://rwparsons.shinyapps.io/circacompare/
注意:
您上传的数据应为csv格式,同时您上传的文件应包含以下内容的列:
- 1.时间点(应为数字,以小时为单位)
- 2.分组变量(可以是任何格式,但只能有两个可能的值)
- 3.测量值(应为数字)
上传您的csv文件,然后从下拉菜单中选择相应的列,单击“运行”进行比较。
