WGCNA文档使用教程

2021-09-28 本文已影响0人 Hello育种

image.png

本文是对WGCNA英文说明文档的翻译，英文的请查看这里

0 下载WGCNA

install.packages("BiocManager")
BiocManager::install("WGCNA")

给出总的分析路线图：

image.png

1 数据输入和清洗

输入数据

# 加载 WGCNA package
library(WGCNA);
# The following setting is important, do not omit.
options(stringsAsFactors = FALSE);
#Read in the female liver data set
femData = read.csv("LiverFemale3600.csv");
# Take a quick look at what is in the data set:
dim(femData);
names(femData);
datExpr0 = as.data.frame(t(femData[, -c(1:8)]));
names(datExpr0) = femData$substanceBXH;
rownames(datExpr0) = names(femData)[-c(1:8)];

数据缺失值查看

#We first check for genes and samples with too many missing values:
gsg = goodSamplesGenes(datExpr0, verbose = 3);
gsg$allOK # if this reture TRUE, 所有基因已经通过，否则需要去除有问题的基因和样本

# 去除有问题的基因和样本
if (!gsg$allOK)
{
# Optionally, print the gene and sample names that were removed:
      if (sum(!gsg$goodGenes)>0)
       printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes], collapse = ", ")));
        if (sum(!gsg$goodSamples)>0)
      printFlush(paste("Removing samples:", paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", ")));
# Remove the offending genes and samples from the data:
         datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]
}

#对样本进行聚类
sampleTree = hclust(dist(datExpr0), method = "average");
# Plot the sample tree: Open a graphic output window of size 12 by 9 inches
# The user should change the dimensions if the window is too large or too small.
sizeGrWindow(12,9)
#pdf(file = "Plots/sampleClustering.pdf", width = 12, height = 9);
par(cex = 0.6);
par(mar = c(0,4,2,0))
plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5,
cex.axis = 1.5, cex.main = 2)

# 去除可能的异常值
# Plot a line to show the cut
abline(h = 15, col = "red");
# Determine cluster under the line
clust = cutreeStatic(sampleTree, cutHeight = 15, minSize = 10)
table(clust)
# clust 1 contains the samples we want to keep.
keepSamples = (clust==1)
datExpr = datExpr0[keepSamples, ]  # 清洗以后的数据
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)

image.png

读入clinical性状数据

traitData = read.csv("ClinicalTraits.csv");
dim(traitData)
names(traitData)
# remove columns that hold information we do not need.
allTraits = traitData[, -c(31, 16)];
allTraits = allTraits[, c(2, 11:36) ];
dim(allTraits)
names(allTraits)
# Form a data frame analogous to expression data that will hold the clinical traits.
femaleSamples = rownames(datExpr);
traitRows = match(femaleSamples, allTraits$Mice);
datTraits = allTraits[traitRows, -1]; # 可以使用的表型数据
rownames(datTraits) = allTraits[traitRows, 1];
collectGarbage();

将样本的聚类与表型数据一起展示

# Re-cluster samples
sampleTree2 = hclust(dist(datExpr), method = "average")
# Convert traits to a color representation: white means low, red means high, grey means missing entry
traitColors = numbers2colors(datTraits, signed = FALSE);
# Plot the sample dendrogram and the colors underneath.
plotDendroAndColors(sampleTree2, traitColors,
groupLabels = names(datTraits),
main = "Sample dendrogram and trait heatmap")

save(datExpr, datTraits, file = "FemaleLiver-01-dataInput.RData")# 保存数据

image.png

2 基因网络构建

其包给出三种方式：
a 一步自动网络构建和模块检测（简单完成）
b step by step完成（适用自己设置）
c 一种自动分块网络构建和模块检测方法，适用于希望分析太大而无法集中分析的数据集的用户。

a 一步自动网络构建和模型检测

a.1 选择soft-threadsholding power（β）：分析网络拓扑

构建加权基因网络需要选择软阈值（β），将共表达相似性提高到该能力以计算邻接。该包采用：the criterion of approximate scale-free topology法估计，使用函数pickSoftThreshold()进行网络拓扑分析和β选择。

# Choose a set of soft-thresholding powers
powers = c(c(1:10), seq(from = 12, to=20, by=2))
# Call the network topology analysis function
sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
sft$powerEstimate# 可以选出最佳β
# Plot the results:
sizeGrWindow(9, 5)
par(mfrow = c(1,2));
cex1 = 0.9;
# Scale-free topology fit index as a function of the soft-thresholding power
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
main = paste("Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
labels=powers,cex=cex1,col="red");
# this line corresponds to using an R^2 cut-off of h
abline(h=0.90,col="red")
# Mean connectivity as a function of the soft-thresholding power
plot(sft$fitIndices[,1], sft$fitIndices[,5],
xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")

image.png

a.2 一步建网和模块检测

使用blockwiseModules()，其中有非常多的参数，可以自己设定。同时对于较大的数据（大于5000probes数），对于maxBlockSize需要设定。同时，运行内容：1个4GB可以做大约9000个probes的。如果数据过大，可以使用分模块进行分析，具体见2.c。

# 分析
net = blockwiseModules(datExpr, power = 6,# 使用上述计算的
                TOMType = "unsigned", minModuleSize = 30,#最小module数量
                 reassignThreshold = 0, mergeCutHeight = 0.25,
                 numericLabels = TRUE, pamRespectsDendro = FALSE,
                 saveTOMs = TRUE,
                saveTOMFileBase= "femaleMouseTOM",#保存的名字，需自己修改
                verbose = 3)

返回结果

table(net$colors)# 查看几个模块，和每个对应的基因数，注意0表示不属于任何模块

以树形图加颜色展示

# open a graphics window
sizeGrWindow(12, 9)
# Convert labels to colors for plotting
mergedColors = labels2colors(net$colors)
# Plot the dendrogram and the module colors underneath
plotDendroAndColors(net$dendrograms[[1]], mergedColors[net$blockGenes[[1]]],
"Module colors",
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05)

image.png

如果想要更改某些树切割、模块成员资格和模块合并标准，该包提供了功能 recutBlockwiseTrees，可以应用修改后的标准，而无需重新计算网络和聚类树状图。

moduleLabels = net$colors
moduleColors = labels2colors(net$colors)
MEs = net$MEs;
geneTree = net$dendrograms[[1]];
save(MEs, moduleLabels, moduleColors, geneTree,
file = "FemaleLiver-02-networkConstruction-auto.RData") # 保存数据，可以不使用

2.b 一步一步构建

# 加载数据
lnames = load(file = "FemaleLiver-01-dataInput.RData");

2.b.1: 同2.a一样，首先选择β，代码也一样。

2.b.2 共表达相似性和邻接

softPower = 6;
adjacency = adjacency(datExpr, power = softPower);

2.b.3 TOM 计算

为减小noise和spurious相关，将adjacency转为TOM

# Turn adjacency into topological overlap
TOM = TOMsimilarity(adjacency);
dissTOM = 1-TOM

2.b.4 使用TOM聚类

对基因聚类，使用hclust（）

# Call the hierarchical clustering function
geneTree = hclust(as.dist(dissTOM), method = "average");
# Plot the resulting clustering tree (dendrogram)
sizeGrWindow(12,9)
plot(geneTree, xlab="", sub="", main = "Gene clustering on TOM-based dissimilarity",
labels = FALSE, hang = 0.04);

image.png

在聚类树（树状图）中，每片叶子，即一条短垂直线，对应一个基因。树状图组的分支紧密相连，高度共表达的基因。模块识别相当于识别单个分支（“切割树状图的分支”）。有几种切枝方法；我们的标准方法是来自包 dynamicTreeCut。下一段代码说明了它的用法。

# We like large modules, so we set the minimum module size relatively high:
minModuleSize = 30;
# Module identification using dynamic tree cut:
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM,
            deepSplit = 2, pamRespectsDendro = FALSE,
              minClusterSize = minModuleSize);
table(dynamicMods)

图示：

# Convert numeric lables into colors
dynamicColors = labels2colors(dynamicMods)
table(dynamicColors)
# Plot the dendrogram and colors underneath
sizeGrWindow(8,6)
plotDendroAndColors(geneTree, dynamicColors, "Dynamic Tree Cut",
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05,
main = "Gene dendrogram and module colors")

image.png

2.b.5 根据基因共表达将modules合并

计算eigengenes

Calculate eigengenes
MEList = moduleEigengenes(datExpr, colors = dynamicColors)
MEs = MEList$eigengenes
# Calculate dissimilarity of module eigengenes
MEDiss = 1-cor(MEs);
# Cluster module eigengenes
METree = hclust(as.dist(MEDiss), method = "average");
# Plot the result
sizeGrWindow(7, 6)
plot(METree, main = "Clustering of module eigengenes",
xlab = "", sub = "")

image.png

选择height cut，这里选了cut=0.25，对应的相关为0.75

MEDissThres = 0.25
# Plot the cut line into the dendrogram
abline(h=MEDissThres, col = "red")
# Call an automatic merging function
merge = mergeCloseModules(datExpr, dynamicColors, cutHeight = MEDissThres, verbose = 3)
# The merged module colors
mergedColors = merge$colors;
# Eigengenes of the new merged modules:
mergedMEs = merge$newMEs;

再次图示原来和合并后的module图

sizeGrWindow(12, 9)
#pdf(file = "Plots/geneDendro-3.pdf", wi = 9, he = 6)
plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors),
c("Dynamic Tree Cut", "Merged dynamic"),
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05)
#dev.off()

image.png

保存数据，用于后续分析

# Rename to moduleColors
moduleColors = mergedColors
# Construct numerical labels corresponding to the colors
colorOrder = c("grey", standardColors(50));
moduleLabels = match(moduleColors, colorOrder)-1;
MEs = mergedMEs;
# Save module colors and labels for use in subsequent parts
save(MEs, moduleLabels, moduleColors, geneTree, file = "FemaleLiver-02-networkConstruction-stepByStep.RData")

2.c 对于大数据，分块进行网络构建和模块分析

2.c.1 同2.a.1相同，需要估计β值，代码同上

2.c.2

将大数据分为几块进行，每次只使用最多2000genes分析.
基本思想是使用两级聚类。首先，我们使用快速、计算成本低且相对粗糙的聚类方法将基因预聚类成大小接近但不超过 2000 个基因的最大值的块。然后我们分别在每个块中执行完整的网络分析。最后，合并特征基因高度相关的模块。逐块方法的优点是内存占用小得多（这是标准台式计算机上大型数据集的主要问题），并且显着加快了计算速度。缺点是，由于使用更简单的聚类来获得块，块可能不是最佳的，导致一些外围基因被分配到不同的模块，而不是在完整的网络分析中。
使用blockwiseModules完成：

bwnet = blockwiseModules(datExpr, maxBlockSize = 2000,
          power = 6, TOMType = "unsigned", minModuleSize = 30,
       reassignThreshold = 0, mergeCutHeight = 0.25,
          numericLabels = TRUE,
         saveTOMs = TRUE,
           saveTOMFileBase = "femaleMouseTOM-blockwise",
verbose = 3)

注意：
（1）The parameter mergeCutHeight is the threshold for merging of modules.
（2）bwnet $colors contains the module assignment, and bwnet$ MEs contains the module eigengenes of the modules.

与2.a结果比较,将结果的labe重命名

# Load the results of single-block analysis
load(file = "FemaleLiver-02-networkConstruction-auto.RData");
# Relabel blockwise modules
bwLabels = matchLabels(bwnet$colors, moduleLabels);
# Convert labels to colors for plotting
bwModuleColors = labels2colors(bwLabels)

查看结果：

table(bwLabels)

分块图示：

# open a graphics window
sizeGrWindow(6,6)
# Plot the dendrogram and the module colors underneath for block 1
plotDendroAndColors(bwnet$dendrograms[[1]], bwModuleColors[bwnet$blockGenes[[1]]],
"Module colors", main = "Gene dendrogram and module colors in block 1",
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05)
# Plot the dendrogram and the module colors underneath for block 2
plotDendroAndColors(bwnet$dendrograms[[2]], bwModuleColors[bwnet$blockGenes[[2]]],
"Module colors", main = "Gene dendrogram and module colors in block 2",
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05)

2.c.3 分块与单个模块分析比较

图示，颜色比较

sizeGrWindow(12,9)
plotDendroAndColors(geneTree,
             cbind(moduleColors, bwModuleColors),
             c("Single block", "2 blocks"),
            main = "Single block gene dendrogram and module colors",
            dendroLabels = FALSE, hang = 0.03,
            addGuide = TRUE, guideHang = 0.05)

对于分块计算和单块计算的每个module的eigengenes进行计算和比较

singleBlockMEs = moduleEigengenes(datExpr, moduleColors)$eigengenes;
blockwiseMEs = moduleEigengenes(datExpr, bwModuleColors)$eigengenes;

比对结果比率

single2blockwise = match(names(singleBlockMEs), names(blockwiseMEs))
signif(diag(cor(blockwiseMEs[, single2blockwise], singleBlockMEs)), 3)

image.png

3 将module与外部信息相关联并识别重要基因

加载由2产生的结果：

# Load network data saved in the second part.
lnames = load(file = "FemaleLiver-02-networkConstruction-auto.RData");
lnames

3.a 关联module与额外的临床表型

在此分析中，我们希望确定与测量的临床特征显着相关的模块。由于我们已经有了每个模块的概要文件（特征基因eigengene），我们只需将特征基因与外部特征相关联并寻找最重要的关联：

# Define numbers of genes and samples
nGenes = ncol(datExpr);
nSamples = nrow(datExpr);
# Recalculate MEs with color labels
MEs0 = moduleEigengenes(datExpr, moduleColors)$eigengenes
MEs = orderMEs(MEs0)
moduleTraitCor = cor(MEs, datTraits, use = "p");
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples);

因为模块与表型都很多，采用图示更加清晰

sizeGrWindow(10,6)
# Will display correlations and their p-values
textMatrix = paste(signif(moduleTraitCor, 2), "\n(",
signif(moduleTraitPvalue, 1), ")", sep = "");
dim(textMatrix) = dim(moduleTraitCor)
par(mar = c(6, 8.5, 3, 3));
# Display the correlation values within a heatmap plot
labeledHeatmap(Matrix = moduleTraitCor,
               xLabels = names(datTraits),
               yLabels = names(MEs),
              ySymbols = names(MEs),
               colorLabels = FALSE,
               colors = greenWhiteRed(50),
               textMatrix = textMatrix,
               setStdMargins = FALSE,
               cex.text = 0.5,
               zlim = c(-1,1),
               main = paste("Module-trait relationships"))

image.png

分析确定了几个重要的模块{特质关联。我们将专注于体重作为感兴趣的特征。

3.b 基因与性状和重要模块的关系：显著基因和模块成员

我们通过将基因显着性（Gene significance ,GS）定义为基因与性状之间的相关性（绝对值）来量化单个基因与我们感兴趣的性状（体重）的关联。
对于每个模块，我们还定义了模块成员 MM 的定量度量，作为模块特征基因（eigengene）和基因表达谱的相关性。这使我们能够量化阵列上所有基因与每个模块的相似性。

# Define variable weight containing the weight column of datTrait
weight = as.data.frame(datTraits$weight_g);
names(weight) = "weight"
## 计算MM
# names (colors) of the modules
modNames = substring(names(MEs), 3)
geneModuleMembership = as.data.frame(cor(datExpr, MEs, use = "p"));
MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples));
names(geneModuleMembership) = paste("MM", modNames, sep="");
names(MMPvalue) = paste("p.MM", modNames, sep="");
## 计算GS
geneTraitSignificance = as.data.frame(cor(datExpr, weight, use = "p"));
GSPvalue = as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance), nSamples));
names(geneTraitSignificance) = paste("GS.", names(weight), sep="");
names(GSPvalue) = paste("p.GS.", names(weight), sep="");

3.c 模块内分析：识别具有高 GS 和 MM 的基因

这里我们将查看brown module的基因，画出GS VS MM的图：

module = "brown"
column = match(module, modNames);
moduleGenes = moduleColors==module;
sizeGrWindow(7, 7);
par(mfrow = c(1,1));
verboseScatterplot(abs(geneModuleMembership[moduleGenes, column]),
                               abs(geneTraitSignificance[moduleGenes, 1]),
                              xlab = paste("Module Membership in", module, "module"),
                              ylab = "Gene significance for body weight",
                              main = paste("Module membership vs. gene significance\n"),
                              cex.main = 1.2, cex.lab = 1.2, cex.axis = 1.2, col = module)

image.png

该图如上图所示。显然，GS 和 MM 高度相关，说明与性状高度显着相关的基因通常也是与性状相关的模块中最重要（中心）的元素。

3.d 汇总网络分析结果，并输出

names(datExpr) #给出所有的probe ID
names(datExpr)[moduleColors=="brown"] # 给出属于brown模块的probe ID

## 为了便于解释结果，我们使用表达阵列制造商提供的探针注释文件将探针 ID（probe ID ） 连接到基因名称和## 普遍认可的识别号（Entrez 代码）。
annot = read.csv(file = "GeneAnnotation.csv");
dim(annot)
names(annot)
probes = names(datExpr)
probes2annot = match(probes, annot$substanceBXH)
# The following is the number or probes without annotation:
sum(is.na(probes2annot))
# Should return 0.

我们现在创建一个数据框，其中包含所有探针的以下信息：探针 (probes) ID、基因符号、基因座链接 ID（Entrez 代码）、模块颜色、权重的基因显着性以及所有模块中的模块成员资格和 p 值。模块将按重量的显着性排序，最显着的在左侧。

# Create the starting data frame
geneInfo0 = data.frame(substanceBXH = probes,
                                      geneSymbol = annot$gene_symbol[probes2annot],
                                     LocusLinkID = annot$LocusLinkID[probes2annot],
                                     moduleColor = moduleColors,
                                     geneTraitSignificance,
                                     GSPvalue)
# Order modules by their significance for weight
modOrder = order(-abs(cor(MEs, weight, use = "p")));
# Add module membership information in the chosen order
for (mod in 1:ncol(geneModuleMembership))
{
       oldNames = names(geneInfo0)
       geneInfo0 = data.frame(geneInfo0, geneModuleMembership[, modOrder[mod]],
                                     MMPvalue[, modOrder[mod]]);
       names(geneInfo0) = c(oldNames, paste("MM.", modNames[modOrder[mod]], sep=""),
                                     paste("p.MM.", modNames[modOrder[mod]], sep=""))
}
# Order the genes in the geneInfo variable first by module color, then by geneTraitSignificance
geneOrder = order(geneInfo0$moduleColor, -abs(geneInfo0$GS.weight));
geneInfo = geneInfo0[geneOrder, ]

输出结果

write.csv(geneInfo, file = "geneInfo.csv")

4 将网络分析与其他数据（如功能注释和基因本体）连接起来

是将感兴模块中的基因取出，进行GO注释，但是目前有较多的在线注释网站，如果不是选择了太多模块，可以直接在在线注释网站上进行。

加载前面的数据

# Load the expression and trait data saved in the first part
lnames = load(file = "FemaleLiver-01-dataInput.RData");
#The variable lnames contains the names of loaded variables.
lnames
# Load network data saved in the second part.
lnames = l

读入注释信息并且关联

# Read in the probe annotation
annot = read.csv(file = "GeneAnnotation.csv");
# Match probes in the data set to the probe IDs in the annotation file 
probes = names(datExpr)
probes2annot = match(probes, annot$substanceBXH)
# Get the corresponding Locuis Link IDs
allLLIDs = annot$LocusLinkID[probes2annot];
# $ Choose interesting modules
intModules = c("brown", "red", "salmon")
for (module in intModules)
{
  # Select module probes
  modGenes = (moduleColors==module)
  # Get their entrez ID codes
  modLLIDs = allLLIDs[modGenes];
  # Write them into a file
  fileName = paste("LocusLinkIDs-", module, ".txt", sep="");
  write.table(as.data.frame(modLLIDs), file = fileName,
              row.names = FALSE, col.names = FALSE)
}
# As background in the enrichment analysis, we will use all probes in the analysis.
fileName = paste("LocusLinkIDs-all.txt", sep="");
write.table(as.data.frame(allLLIDs), file = fileName,
            row.names = FALSE, col.names = FALSE)

https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/FemaleLiver-04-Interfacing.pdf

5 使用WGCNA功能对于network分析可视化

英文文档
首先同样加载数据

5.a 可视化基因网络

# Calculate topological overlap anew: this could be done more efficiently by saving the TOM
# calculated during module detection, but let us do it again here.
dissTOM = 1-TOMsimilarityFromExpr(datExpr, power = 6);
# Transform dissTOM with a power to make moderately strong connections more visible in the heatmap
plotTOM = dissTOM^7;
# Set diagonal to NA for a nicer plot
diag(plotTOM) = NA;
# Call the plot function
sizeGrWindow(9,9)
TOMplot(plotTOM, geneTree, moduleColors, main = "Network heatmap plot, all genes")

image.png

但是有时基因太多，需要限制数量，这里选400个

nSelect = 400
# For reproducibility, we set the random seed
set.seed(10);
select = sample(nGenes, size = nSelect);
selectTOM = dissTOM[select, select];
# There’s no simple way of restricting a clustering tree to a subset of genes, so we must re-cluster.
selectTree = hclust(as.dist(selectTOM), method = "average")
selectColors = moduleColors[select];
# Open a graphical window
sizeGrWindow(9,9)
# Taking the dissimilarity to a power, say 10, makes the plot more informative by effectively changing
# the color palette; setting the diagonal to NA also improves the clarity of the plot
plotDiss = selectTOM^7;
diag(plotDiss) = NA;
TOMplot(plotDiss, selectTree, selectColors, main = "Network heatmap plot, selected genes")

图类似上面的。

5.b 可视化eigengenes的网络图

研究发现的模块之间的关系通常很有趣。可以使用特征基因作为代表性配置文件，并通过特征基因相关性量化模块相似性。该包包含一个方便的函数 plotEigengeneNetworks，可生成特征基因网络的汇总图。将临床特征（或多个特征）添加到特征基因以查看特征如何适应特征基因网络通常是有用的：

# Recalculate module eigengenes
MEs = moduleEigengenes(datExpr, moduleColors)$eigengenes
# Isolate weight from the clinical traits
weight = as.data.frame(datTraits$weight_g);
names(weight) = "weight"
# Add the weight to existing module eigengenes
MET = orderMEs(cbind(MEs, weight))
# Plot the relationships among the eigengenes and the trait
sizeGrWindow(5,7.5);
par(cex = 0.9)
plotEigengeneNetworks(MET, "", marDendro = c(0,4,1,2), marHeatmap = c(3,4,1,2), cex.lab = 0.8, xLabelsAngle
= 90)

该函数生成特征基因和特征的树状图，以及它们之间关系的热图。要拆分树状图和热图图，我们可以使用以下代码

# Plot the dendrogram
sizeGrWindow(6,6);
par(cex = 1.0)
plotEigengeneNetworks(MET, "Eigengene dendrogram", marDendro = c(0,4,2,0),
plotHeatmaps = FALSE)
# Plot the heatmap matrix (note: this plot will overwrite the dendrogram plot)
par(cex = 1.0)
plotEigengeneNetworks(MET, "Eigengene adjacency heatmap", marHeatmap = c(3,4,2,2),
plotDendrograms = FALSE, xLabelsAngle = 90)

image.png

图 2 显示了上述代码的输出。特征基因树状图和热图识别称为meta-modules的相关特征基因组。例如，树状图表明 red、 brown 和 bluw 模块高度相关；它们的相互相关性强于与体重的相关性。另一方面，与重量显着相关的meta-模块与红色、棕色和蓝色模块不是同一元模块的一部分，至少如果元模块被定义为模块的紧密客户（对于例如，特征基因相关性至少为 0.5 的模块）。

6 将网络导出到外部软件

英文文档
首先导入数据，和前面的代码一样

# Load the expression and trait data saved in the first part
lnames = load(file = "FemaleLiver-01-dataInput.RData");
#The variable lnames contains the names of loaded variables.
lnames
# Load network data saved in the second part.
lnames = load(file = "FemaleLiver-02-networkConstruction-auto.RData");
lnames

6.a VisANT软件可视化

需要在R中导出其需要的数据

# Recalculate topological overlap
TOM = TOMsimilarityFromExpr(datExpr, power = 6);
# Read in the annotation file
annot = read.csv(file = "GeneAnnotation.csv");
# Select module
module = "brown";
# Select module probes
probes = names(datExpr)
inModule = (moduleColors==module);
modProbes = probes[inModule];
# Select the corresponding Topological Overlap
modTOM = TOM[inModule, inModule];
dimnames(modTOM) = list(modProbes, modProbes)
# Export the network into an edge list file VisANT can read
vis = exportNetworkToVisANT(modTOM,
file = paste("VisANTInput-", module, ".txt", sep=""),
weighted = TRUE,
threshold = 0,
probeToGene = data.frame(annot$substanceBXH, annot$gene_symbol) )

因为 brown 模块比较大，我们可以将输出中的基因限制为模块中的 30 个 top hub 基因：

nTop = 30;
IMConn = softConnectivity(datExpr[, modProbes]);
top = (rank(-IMConn) <= nTop)
vis = exportNetworkToVisANT(modTOM[top, top],
file = paste("VisANTInput-", module, "-top30.txt", sep=""),
weighted = TRUE,
threshold = 0,
probeToGene = data.frame(annot$substanceBXH, annot$gene_symbol) )

最后将导出的数据读入VisANT软件，结果如下：

image.png

6.b Cytoscape软件可视化

同样需要导出合适的数据，这里导出red和brown两个模块

# Recalculate topological overlap if needed
TOM = TOMsimilarityFromExpr(datExpr, power = 6);
# Read in the annotation file
annot = read.csv(file = "GeneAnnotation.csv");
# Select modules
modules = c("brown", "red");
# Select module probes
probes = names(datExpr)
inModule = is.finite(match(moduleColors, modules));
modProbes = probes[inModule];
modGenes = annot$gene_symbol[match(modProbes, annot$substanceBXH)];
# Select the corresponding Topological Overlap
modTOM = TOM[inModule, inModule];

dimnames(modTOM) = list(modProbes, modProbes)
# Export the network into edge and node list files Cytoscape can read
cyt = exportNetworkToCytoscape(modTOM,
edgeFile = paste("CytoscapeInput-edges-", paste(modules, collapse="-"), ".txt", sep=""),
nodeFile = paste("CytoscapeInput-nodes-", paste(modules, collapse="-"), ".txt", sep=""),
weighted = TRUE,
threshold = 0.02,
nodeNames = modProbes,
altNodeNames = modGenes,
nodeAttr = moduleColors[inModule]);

请注意，Cytoscape 的网络输入涉及更多，用户应注意选择要正确解释的边缘和节点文件的所有必要选项。我们向读者推荐 Cytoscape 文档以了解所有必要的细节。

WGCNA文档使用教程

0 下载WGCNA

1 数据输入和清洗

2 基因网络构建

a 一步自动网络构建和模型检测

a.1 选择soft-threadsholding power（β）：分析网络拓扑

a.2 一步建网和模块检测

2.b 一步一步构建

2.b.1: 同2.a一样，首先选择β，代码也一样。

2.b.2 共表达相似性和邻接

2.b.3 TOM 计算

2.b.4 使用TOM聚类

2.b.5 根据基因共表达将modules合并

2.c 对于大数据，分块进行网络构建和模块分析

2.c.1 同2.a.1相同，需要估计β值，代码同上

2.c.2

2.c.3 分块与单个模块分析比较

3 将module与外部信息相关联并识别重要基因

3.a 关联module与额外的临床表型

3.b 基因与性状和重要模块的关系：显著基因和模块成员

3.c 模块内分析：识别具有高 GS 和 MM 的基因

3.d 汇总网络分析结果，并输出

4 将网络分析与其他数据（如功能注释和基因本体）连接起来

加载前面的数据

读入注释信息并且关联

5 使用WGCNA功能对于network分析可视化

5.a 可视化基因网络

5.b 可视化eigengenes的网络图

6 将网络导出到外部软件

6.a VisANT软件可视化

6.b Cytoscape软件可视化

猜你喜欢

热点阅读