Differential expression analysis for samples without biological replicates

2020-05-13 找兔子的小萝卜

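Because of sequencing costs and the scarcity of some precious samples, some experimental designs still have no biological replicates. Differential analysis on unreplicated data is troublesome, and the reliability of the results cannot be guaranteed.
Several tools can be used for differential analysis without replicates, such as edgeR, DEGseq, and GFOLD. Below I try edgeR on samples with no biological replicates.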

Differential analysis of unreplicated samples with edgeR

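The edgeR User's Guide gives four suggestions for data without replicates:
Suggestion 1: analyze only the MDS plot and fold changes, with no significance testing;
Suggestion 2: choose a reasonable dispersion value, then run exactTest or glmFit;
Suggestion 3: I could not follow this one (in the User's Guide it concerns removing one or more explanatory factors from the model to free up residual degrees of freedom);
Suggestion 4: estimate the dispersion from a large set of stable reference transcripts.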
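As a rough illustration of the first suggestion, fold changes can be computed descriptively without any test. The sketch below uses base R only; the toy counts, the sample names `F` and `M`, and the prior count of 0.5 are made-up assumptions, not values from this post's data.

```r
# Toy counts for two unreplicated samples (made-up numbers)
counts <- matrix(c(10, 200, 0, 50,
                   40, 210, 5, 50),
                 ncol = 2,
                 dimnames = list(paste0("gene", 1:4), c("F", "M")))

# Add a small prior count so that zero counts do not give log2(0)
prior <- 0.5  # hypothetical choice

# Scale each sample by its library size to get counts per million
lib_size <- colSums(counts)
cpm <- t(t(counts + prior) / lib_size) * 1e6

# Descriptive log2 fold change of M relative to F
logFC <- log2(cpm[, "M"] / cpm[, "F"])

# Rank genes by absolute fold change; no p-values are attached
ranked <- logFC[order(abs(logFC), decreasing = TRUE)]
print(ranked)
```

A ranking like this says nothing about significance; it only orders genes by effect size.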

1 Install and load the edgeR package

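#biocLite has been retired; current Bioconductor releases install through BiocManager
#if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
#BiocManager::install("edgeR")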
library("edgeR")
library('ggplot2')

2 Read the data

setwd("G:/My_exercise/edgeR")
rawdata <- read.table("hisat_matrix.out",header=TRUE,row.names=1,check.names = FALSE)
head(rawdata)

3 Rename the columns

names(rawdata) <- c("F.1yr.OC.count","M.1yr.OC.count")

4 Define the groups

group <- factor(c("F.1yr.OC.count","M.1yr.OC.count"))

5 Build the DGEList object (no filtering is applied here)

y <- DGEList(counts=rawdata,genes=rownames(rawdata),group = group)
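The step above only builds the DGEList object and keeps every gene. If low-count genes should be dropped first, one simple option is a count threshold, sketched here in base R. The toy matrix and the cutoff `min_count` are hypothetical; edgeR also provides a `filterByExpr` helper that may be a better fit for real data.

```r
# Toy count matrix standing in for `rawdata` (made-up values)
counts <- matrix(c(0, 1, 150, 900,
                   2, 0, 180, 700),
                 ncol = 2,
                 dimnames = list(paste0("gene", 1:4), c("F", "M")))

# Hypothetical cutoff: keep a gene if at least one sample has this many reads
min_count <- 10

keep <- rowSums(counts >= min_count) >= 1
filtered <- counts[keep, , drop = FALSE]

print(filtered)
```

The filtered matrix would then be passed to `DGEList` in place of the full table.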

6 TMM normalization

y<-calcNormFactors(y)
y$samples
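To read the `y$samples` table: edgeR does not rescale the counts themselves; the normalization factor adjusts the library size used downstream. A minimal sketch of that arithmetic follows, with made-up library sizes and factors (the real values come from `calcNormFactors`).

```r
# Made-up stand-in for y$samples; edgeR scales the factors so their product is 1
samples <- data.frame(lib.size     = c(2.0e7, 2.5e7),
                      norm.factors = c(0.95, 1 / 0.95),
                      row.names    = c("F", "M"))

# Effective library size = raw library size * normalization factor
samples$eff.lib.size <- samples$lib.size * samples$norm.factors

# CPM of a gene with 100 reads in each sample, using the effective library size
cpm_100_reads <- 100 / samples$eff.lib.size * 1e6

print(samples)
print(cpm_100_reads)
```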

7 Choose the dispersion

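The value is set from experience: bcv = 0.4 for human samples and bcv = 0.1 for model organisms. (I have no experience here, so I try several values.) Note that exactTest takes the dispersion, which is the square of the BCV.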

#bcv <- 0.1
bcv <- 0.2
#bcv <- 0.4
et <- exactTest(y, dispersion=bcv^2)
topTags(et)
summary(de <- decideTestsDGE(et))
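To get a feel for what each candidate BCV implies, recall that under the negative binomial model used by edgeR the variance of a count with mean mu is mu + dispersion * mu^2, with dispersion = BCV^2. The sketch below tabulates this for the three values tried above; the mean of 100 reads is an arbitrary illustrative choice.

```r
# Candidate BCV values from the step above
bcv_values <- c(0.1, 0.2, 0.4)

# exactTest() takes the dispersion, i.e. the squared BCV
dispersion <- bcv_values^2

# Negative binomial variance for a gene with mean count mu (arbitrary example)
mu <- 100
nb_var <- mu + dispersion * mu^2

implied <- data.frame(bcv = bcv_values,
                      dispersion = dispersion,
                      nb_sd = sqrt(nb_var))
print(implied)
```

A larger BCV means more assumed biological noise, so fewer genes will pass the test; comparing the gene lists across the three settings shows how sensitive the result is to this guess.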

8 Plot the test results

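# Open a PNG device, draw the smear plot with DE genes highlighted, then close the device
png('F_1yr.OC-vs-M_1yr.OC.png')
detags <- rownames(y)[as.logical(de)]
plotSmear(et, de.tags=detags)
abline(h=c(-4, 4), col="blue")  # reference lines at logFC = -4 and 4
dev.off()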

9 Export the results

DE <- et$table
DE$significant <- as.factor(DE$PValue<0.05 & abs(DE$logFC) >1)
write.table(DE,file="edgeR_all2",sep="\t",na="NA",quote=FALSE)
write.csv(DE, "edgeR.F-vs-M.OC2.csv")

#DE2 <- topTags(et,20000)$table
#DE2$significant <- as.factor(DE2$PValue<0.05 & DE2$FDR<0.05 & abs(DE2$logFC) >1)
#write.csv(DE2, "F_1yr.OC-vs-M_1yr.OC3.csv")
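The `significant` flag exported above is based on raw p-values, while the commented-out variant also uses the FDR column from `topTags`, which by default is a Benjamini-Hochberg adjustment. The sketch below shows that adjustment in base R on made-up p-values, to illustrate how the two criteria can differ; the gene names and numbers are hypothetical.

```r
# Made-up p-values for five genes
pvals <- c(geneA = 0.0001, geneB = 0.004, geneC = 0.02,
           geneD = 0.045, geneE = 0.6)

# Benjamini-Hochberg adjusted p-values (false discovery rate)
fdr <- p.adjust(pvals, method = "BH")

# Compare the two ways of calling a gene significant
calls <- data.frame(PValue = pvals,
                    FDR = fdr,
                    sig_raw = pvals < 0.05,
                    sig_fdr = fdr < 0.05)
print(calls)
```

With thousands of genes tested, the gap between the two columns usually grows, which matters all the more when the dispersion itself is only a guess.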

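Second approach: once the count matrix is prepared, run the code below directly. Note that this approach estimates the dispersion from the data, so it needs replicates in at least one group; with a single sample per group the estimate is not available.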

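# data1: count matrix (genes x samples); group_list: factor with levels "control" and "VD"
design <- model.matrix(~group_list)  # not used by exactTest below
y <- DGEList(counts=data1,group=group_list)
y <- calcNormFactors(y)
# Estimate common and tagwise dispersions from the data
y <- estimateCommonDisp(y)
y <- estimateTagwiseDisp(y)
et <- exactTest(y,pair = c("control","VD"))
topTags(et)
ordered_tags <- topTags(et, n=100000)  # a large n returns every gene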

allDiff=ordered_tags$table
allDiff=allDiff[is.na(allDiff$FDR)==FALSE,]
diff=allDiff
newData=y$pseudo.counts

write.table(diff,file="DEG-GSE78760.xls",sep="\t",quote=F)