一个数据里两种分组信息怎么做差异分析

2024-07-26  本文已影响0人  小洁忘了怎么分身

1.背景知识

众所周知,limma可以做基因表达芯片和转录组数据的差异分析,可以方便的得处两组之间有表达量差别的基因。多分组差异分析其实也是两两差异分析的批量做法。

有的实验设计相对复杂一点,比如今天使用的例子:

image.png

实验设计包括了两种分组,一个是siC和sip53,一个是NT与TNF。

那么常规的两组之间的差异分析也就不能同时分析这两种实验设计啦!

这时候我们就需要用到双因素差异分析。

1.整理表达矩阵

rm(list = ls())
library(tinyarray)
library(tidyverse)
library(limma)
gse = "GSE53153"
a = geo_download(gse,destdir=tempdir(),colon_remove = T)
find_anno(a$gpl)
library(illuminaHumanv4.db);ids <- toTable(illuminaHumanv4SYMBOL)
eset = trans_array(log2(a$exp+1),ids)
eset[1:4,1:4]
##          GSM1283060 GSM1283061 GSM1283062 GSM1283063
## EEF1A1    16.125688  16.097172  16.035905  15.996306
## GAPDH     15.125034  15.114759  15.188604  15.102693
## SLC35E2A   7.980711   8.234099   7.651769   8.134426
## CLXN       7.634448   7.811856   7.692790   7.792465

2.整理实验设计

#双因素分析
targets = data.frame(mutate = rep(c("siC","sip53"),each = 6),
                     induce = rep(c("untreat", "treat"), each = 3, times = 2))
targets
##    mutate  induce
## 1     siC untreat
## 2     siC untreat
## 3     siC untreat
## 4     siC   treat
## 5     siC   treat
## 6     siC   treat
## 7   sip53 untreat
## 8   sip53 untreat
## 9   sip53 untreat
## 10  sip53   treat
## 11  sip53   treat
## 12  sip53   treat

把这两种分组合并到一起

TS <- paste(targets$mutate, targets$induce, sep=".")
TS <- factor(TS, levels=c("siC.untreat","siC.treat","sip53.untreat","sip53.treat"))
TS
##  [1] siC.untreat   siC.untreat   siC.untreat   siC.treat     siC.treat    
##  [6] siC.treat     sip53.untreat sip53.untreat sip53.untreat sip53.treat  
## [11] sip53.treat   sip53.treat  
## Levels: siC.untreat siC.treat sip53.untreat sip53.treat

3.做差异分析

这里同时做了三次差异分析:

哪些基因对siC组的treat有反应,即siC组的treat-untreat

哪些基因对sip53组的treat有反应,即sip53组的treat-untreat

与siC组相比,哪些基因在sip53组中的反应不同,即上面两组差异分析所得的logFC再比较!

前两次差异分析其实相当于取子集+二分组差异分析。第三次是双因素差异分析咯。

design <- model.matrix(~0+TS)
colnames(design) <- levels(TS)
fit <- lmFit(eset, design)
cont.matrix <- makeContrasts(
  w = siC.treat-siC.untreat,
  m = sip53.treat-sip53.untreat,
  diff = (sip53.treat-sip53.untreat)-(siC.treat-siC.untreat),
  levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)

deg1 = topTable(fit2,number = Inf,coef = 1)
deg2 = topTable(fit2,number = Inf,coef = 2)
deg3 = topTable(fit2,number = Inf,coef = 3)
head(deg3)
##            logFC   AveExpr          t      P.Value    adj.P.Val         B
## MMP9   -1.889600  8.014973 -17.520398 8.191150e-12 1.712278e-07 13.259648
## CXCL10 -1.598374  9.294062 -12.550379 1.166271e-09 1.218986e-05 10.453810
## PI3     1.458585  8.562082  10.029716 2.812942e-08 1.960058e-04  8.248613
## EBI3   -1.382219  8.596838  -9.797690 3.885369e-08 2.030494e-04  8.008927
## LTB    -1.412194  8.331969  -8.950493 1.327325e-07 5.549282e-04  7.072795
## TNIP1  -1.048539 12.778876  -8.828485 1.594793e-07 5.556259e-04  6.929717

4.差异基因

get_diff_gene = function(deg,logFC_t = 1,p_t = 0.05){
  k1 = (deg$P.Value < p_t)&(deg$logFC < -logFC_t)
  k2 = (deg$P.Value < p_t)&(deg$logFC > logFC_t)
  library(dplyr)
  deg = mutate(deg,change = ifelse(k1,"down",ifelse(k2,"up","stable")))
  table(deg$change)
  return(rownames(deg)[deg$change!="stable"])
}

cg1 = get_diff_gene(deg1);length(cg1)
## [1] 121
cg2 = get_diff_gene(deg2);length(cg2)
## [1] 109
cg3 = get_diff_gene(deg3);length(cg3)
## [1] 14
dat = list(siC = cg1,
           sip53 = cg2,
           diff = cg3)
draw_venn(dat,"")
ggsave("v.png")
image.png
dat = data.frame(t(eset[cg3,]),
                 targets,
                 TS,check.names = F)
library(ggplot2)
ps = lapply(cg3, function(g){
  ggplot(dat,aes_string(x = "TS",
                      y =  paste0("`",g,"`"),
                      fill = "induce"))+
  geom_boxplot()+
  theme_bw()+
  theme(legend.position = "top")
})
library(patchwork)
wrap_plots(ps[1:4],ncol = 2)
image.png

5.logFC 是如何计算的

TS
##  [1] siC.untreat   siC.untreat   siC.untreat   siC.treat     siC.treat    
##  [6] siC.treat     sip53.untreat sip53.untreat sip53.untreat sip53.treat  
## [11] sip53.treat   sip53.treat  
## Levels: siC.untreat siC.treat sip53.untreat sip53.treat
x = eset["MMP9",]
(mean(x[10:12])-mean(x[7:9]))-(mean(x[4:6])-mean(x[1:3]))
## [1] -1.8896
deg3$logFC[1]
## [1] -1.8896

所以也就是计算出了两组之间的logFC之差。

代码参考自limma帮助文档第九章。

上一篇 下一篇

猜你喜欢

热点阅读