WGCNA Analysis

Skill Tree live course study notes - WGCNA-1 - Data cleaning

2020-05-07  大吉岭猹

1. Hierarchical clustering

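library(WGCNA)
# datExpr0: expression matrix prepared beforehand, with samples in rows and genes in columns
# Next, cluster the samples (as opposed to clustering the genes later) to check for obvious outliers
sampleTree = hclust(dist(datExpr0), method = "average")
sizeGrWindow(12, 9)
par(cex = 0.6)
par(mar = c(0, 4, 2, 0))
plot(sampleTree, main = "Sample clustering to detect outliers", sub = "", xlab = "",
     cex.lab = 1.5, cex.axis = 1.5, cex.main = 2)
# Mark the cut height used in step 2 to see which samples fall above it
abline(h = 80, col = "red")
# Close the window once the tree has been inspected
dev.off()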
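
To see why average-linkage clustering of samples exposes outliers, here is a small base-R sketch on simulated data. The matrix size, the mean shift of 5 and the sample names are arbitrary assumptions, not the article's data. A sample drawn from a shifted distribution joins the rest of the tree only at the very top, which is what an outlier looks like in the plot above.

```r
# Simulated data (hypothetical): 20 ordinary samples x 50 genes, plus one shifted sample
set.seed(1)
simExpr <- rbind(matrix(rnorm(20 * 50), nrow = 20),
                 matrix(rnorm(50, mean = 5), nrow = 1))
rownames(simExpr) <- c(paste0("S", 1:20), "Outlier")

# Same call pattern as above: Euclidean distances between samples, average linkage
simTree <- hclust(dist(simExpr), method = "average")

# The final merge happens at the largest height.
# Negative entries in $merge refer to single samples (original observations).
lastMerge <- simTree$merge[nrow(simTree$merge), ]
singleIdx <- -lastMerge[lastMerge < 0]
lateSample <- rownames(simExpr)[singleIdx]
print(lateSample)
print(max(simTree$height))
```

In a real dataset there is no label telling you which sample is the outlier, so the cut height is read off the plotted tree by eye.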

2. Remove outlier samples

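# Determine the cluster under the line
clust = cutreeStatic(sampleTree, cutHeight = 80, minSize = 10)
table(clust)
# clust == 1 (the largest cluster) contains the samples we want to keep; 0 marks branches smaller than minSize
keepSamples = (clust == 1)
datExpr = datExpr0[keepSamples, ]
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
# datExpr now contains the expression data that can be used for network analysis.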
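
cutreeStatic() cuts the tree at a fixed height. According to the WGCNA documentation, objects on branches with fewer than minSize members are labeled 0 and the largest cluster gets label 1, which is why clust == 1 selects the main group. The base-R sketch below imitates that behavior with cutree(h = ...) on simulated data. approxCutreeStatic is a hypothetical helper, not WGCNA's implementation. The cut height here is derived from the simulated tree rather than the article's 80.

```r
# Hypothetical helper approximating cutreeStatic(): cut at a fixed height,
# label clusters with >= minSize members 1, 2, ... by decreasing size, others 0
approxCutreeStatic <- function(tree, cutHeight, minSize) {
  raw <- cutree(tree, h = cutHeight)
  sizes <- table(raw)
  big <- names(sizes)[sizes >= minSize]
  big <- big[order(-as.integer(sizes[big]))]
  labels <- integer(length(raw))
  names(labels) <- names(raw)
  for (i in seq_along(big)) labels[raw == as.integer(big[i])] <- i
  labels
}

# Simulated data (hypothetical): 20 ordinary samples plus one shifted outlier
set.seed(1)
simExpr <- rbind(matrix(rnorm(20 * 50), nrow = 20),
                 matrix(rnorm(50, mean = 5), nrow = 1))
rownames(simExpr) <- c(paste0("S", 1:20), "Outlier")
simTree <- hclust(dist(simExpr), method = "average")

# Cut halfway between the two highest merges, i.e. just below where the outlier joins
topHeights <- sort(simTree$height, decreasing = TRUE)[1:2]
clust <- approxCutreeStatic(simTree, cutHeight = mean(topHeights), minSize = 10)
print(table(clust))

# Keep only the largest cluster, as in the step above
keepSamples <- clust == 1
simKept <- simExpr[keepSamples, ]
print(dim(simKept))
```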

3. Merge the clinical information and plot

# Read the clinical trait file (the matching below relies on sample names being the row names)
datTraits = read.table("datTraits.txt", sep = "\t", header = TRUE, check.names = FALSE)
datTraits[1:4, 1:4]

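# Reorder the trait rows to follow the sample order in datExpr, so that phenotypes and sample names line up (samples removed as outliers are dropped as well)
datTraits <- datTraits[match(rownames(datExpr), rownames(datTraits)), ]
identical(rownames(datTraits), rownames(datExpr))  # should be TRUE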

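# Re-cluster samples
sampleTree2 = hclust(dist(datExpr), method = "average")
# Represent the traits as colors: white means low, red means high, grey means a missing entry
# Continuous variables give a color gradient; 0/1 variables show up as red and white blocks
traitColors = numbers2colors(datTraits, signed = FALSE)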
# Plot the sample dendrogram and the colors underneath.
# Open the png device first so that the par() settings apply to it
png("Step1-Sample dendrogram and trait heatmap.png", width = 800, height = 600)
par(cex = 0.6)
par(mar = c(0, 4, 2, 0))
plotDendroAndColors(sampleTree2, traitColors,
                    groupLabels = names(datTraits),
                    main = "Sample dendrogram and trait heatmap")
dev.off()

# Finally, convert the expression matrix to a data.frame for the next steps
datExpr = as.data.frame(datExpr)
save(datExpr, datTraits, file = "WGCNA-01-dataInput.RData")
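
The two ideas behind step 3 can be sketched in base R on toy data. The sample names and trait values are hypothetical, not the article's data. match() reorders the trait table to the expression matrix's sample order and drops samples that were removed as outliers. An unsigned white-to-red mapping shows roughly what numbers2colors(signed = FALSE) produces. toColorsSketch is a hypothetical helper that scales each column to its own range. WGCNA's numbers2colors() has its own palette and scaling options, so its exact colors will differ.

```r
# --- Part 1: aligning traits to expression samples with match() ---
# Toy data (hypothetical): datExpr kept S3, S1, S2; S4 was removed as an outlier
exprSamples <- c("S3", "S1", "S2")
traits <- data.frame(weight = c(10, 20, NA, 40),
                     treated = c(0, 1, 1, 0),
                     row.names = c("S1", "S2", "S3", "S4"))
aligned <- traits[match(exprSamples, rownames(traits)), , drop = FALSE]
print(aligned)
print(identical(rownames(aligned), exprSamples))

# --- Part 2: unsigned numbers-to-colors mapping (approximation) ---
# Hypothetical helper: white = column minimum, red = column maximum, grey = NA
toColorsSketch <- function(x, nColors = 100) {
  ramp <- colorRampPalette(c("white", "red"))(nColors)
  x <- as.matrix(x)
  out <- matrix("grey", nrow(x), ncol(x), dimnames = dimnames(x))
  for (j in seq_len(ncol(x))) {
    v <- x[, j]
    rng <- range(v, na.rm = TRUE)
    # Scale to 0..1 within the column (a constant column maps to white)
    scaled <- if (diff(rng) == 0) rep(0, length(v)) else (v - rng[1]) / diff(rng)
    idx <- 1 + round(scaled * (nColors - 1))
    ok <- !is.na(v)
    out[ok, j] <- ramp[idx[ok]]
  }
  out
}
traitCols <- toColorsSketch(aligned)
print(traitCols)
```

With a single-column trait table, `drop = FALSE` matters: without it, R returns a plain vector and the row names used in the identical() check are lost.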
