igraph包可视化基因调控网络

2021-05-16 本文已影响0人小贝学生信

参考教程：https://www.biostars.org/p/285296/

整体思路可分为两大步：
（1）根据测序数据，挖掘出与某一特定生物学因素高度相关的一组基因，以及相应的表达信息；
（2）然后计算这些基因间的表达相关性，利用igraph包进行可视化。

Step 1 ：Get the gene info

示例数据：estrogen包提供的测序数据。根据是否添加雌激素，以及作用时间两个因素，共分为四组，每组两次重复。
感兴趣基因：基因的表达水平与是否添加雌激素，以及作用时间两个因素显著相关的Top基因。

1.1 获取表达矩阵

library(estrogen)
library(affy)
datadir <- system.file("extdata", package="estrogen")
dir(datadir)
setwd(datadir)

# Read in phenotype data and the raw CEL files
pd <- read.AnnotatedDataFrame("estrogen.txt", header=TRUE, sep="", row.names=1)
a <- ReadAffy(filenames=rownames(pData(pd)), phenoData=pd, verbose=TRUE)
# Normalise the data
x <- expresso(
  a,
  bgcorrect.method="rma",
  normalize.method="constant",
  pmcorrect.method="pmonly",
  summary.method="avgdiff")
# Remove control probes
controlProbeIdx <- grep("^AFFX", rownames(x))
x <- x[-controlProbeIdx,]
x
head(exprs(x))
pData(x)

1.2 分析与添加雌激素，以及作用时间高度相关的基因

# Identify genes of significant effect
lm.coef <- function(y) lm(y ~ estrogen * time.h)$coefficients
eff <- esApply(x, 1, lm.coef)
effectUp <- names(sort(eff[2,], decreasing=TRUE)[1:25])
effectDown <- names(sort(eff[2,], decreasing=FALSE)[1:25])
main.effects <- c(effectUp, effectDown)
str(main.effects)
#chr [1:50] "36785_at" "39755_at" "151_s_at" "32174_at" "39781_at" "1985_s_at" "1840_g_at" ...

# Filter the expression set object to include only genes of significant effect
estrogenMainEffects <- exprs(x)[main.effects,]
dim(estrogenMainEffects)
#[1] 50  8

Step 2 ：visualize these gene regulatory network

关于igraph包，在之前有简单介绍，可参看笔记利用igraph包可视化基于KNN的单细胞聚类关系 - 简书 (jianshu.com)

2.1 计算两两基因间的关系/联系

#基于相关性
gene_cor=as.matrix(as.dist(cor(t(estrogenMainEffects), method="pearson")))
dim(gene_cor)
#50 50

#基于欧几里得距离
#gene_dist=as.matrix(dist(estrogenMainEffects, method="euclidean"))

2.2 创建igraph对象

library(igraph)
# Create a graph adjacency based on correlation distances between genes in  pairwise fashion.
g <- graph.adjacency(
  gene_cor,
  mode="undirected",
  weighted=TRUE,
  diag=FALSE
)

# Simplfy the adjacency object
g <- simplify(g, remove.multiple=TRUE, remove.loops=TRUE)

2.3 修饰igraph对象

根据基因间相关性的正负性，设置边的颜色

# Colour negative correlation edges as blue
E(g)[which(E(g)$weight<0)]$color <- "darkblue"

# Colour positive correlation edges as red
E(g)[which(E(g)$weight>0)]$color <- "darkred"

剔除相关性小于0.8的边

# Convert edge weights to absolute values
E(g)$weight <- abs(E(g)$weight)

# Remove edges below absolute Pearson correlation 0.8
g <- delete_edges(g, E(g)[which(E(g)$weight<0.8)])

# Remove any vertices remaining that have no edges
g <- delete.vertices(g, degree(g)==0)

# Assign names to the graph vertices (optional)
V(g)$name <- V(g)$name

设置节点的形状、颜色

# Change shape of graph vertices
V(g)$shape <- "sphere"

# Change colour of graph vertices
V(g)$color <- "skyblue"

# Change colour of vertex frames
V(g)$vertex.frame.color <- "white"

分别设置节点、边的大小。前者与基因表达量相关、后者则取决于相关性

# Scale the size of the vertices to be proportional to the level of expression of each gene represented by each vertex
# Multiply scaled vales by a factor of 10
scale01 <- function(x){(x-min(x))/(max(x)-min(x))}
vSizes <- (scale01(apply(estrogenMainEffects, 1, mean)) + 1.0) * 5

# Amplify or decrease the width of the edges
edgeweights <- E(g)$weight * 2.0

计算得到最小生成树（MST，Minimum spanning tree）；
MST相关概念可参考：https://blog.csdn.net/luoshixian099/article/details/51908175。最直接的特征就是：边edge数量比vertex节点数量少1

# Convert the graph adjacency object into a minimum spanning tree based on Prim's algorithm
mst <- mst(g, algorithm="prim")

2.3 可视化

直接可视化

set.seed(1)
plot(
  mst,
  layout=layout.graphopt,
  edge.curved=TRUE,
  vertex.size=vSizes,
  vertex.label.dist=-0.5,
  vertex.label.color="black",
  asp=FALSE,
  vertex.label.cex=0.6,
  edge.width=edgeweights,
  edge.arrow.mode=0,
  main="My first graph")

其它可选的layout排版方式有：
layout.fruchterman.reingold
layout.kamada.kawai
layout.lgl
layout.circle
layout.graphopt
设置module标注

mst.communities <- edge.betweenness.community(mst, weights=NULL, directed=FALSE)
mst.clustering <- make_clusters(mst, membership=mst.communities$membership)
V(mst)$color <- mst.communities$membership + 1

set.seed(1)
plot(
  mst,
  layout=layout.fruchterman.reingold,
  edge.curved=TRUE,
  vertex.size=vSizes,
  vertex.label.dist=-0.5,
  vertex.label.color="black",
  asp=FALSE,
  vertex.label.cex=0.6,
  edge.width=edgeweights,
  edge.arrow.mode=0,
  main="My second graph")

set.seed(1)
plot(
  mst.clustering, mst,
  layout=layout.fruchterman.reingold,
  edge.curved=TRUE,
  vertex.size=vSizes,
  vertex.label.dist=-0.5,
  vertex.label.color="black",
  asp=FALSE,
  vertex.label.cex=0.6,
  edge.width=edgeweights,
  edge.arrow.mode=0,
  main="My second graph")

如上进行module的划分或者说是聚类时使用igraph包的edge.betweenness.community。igraph包还提供了其它多种划分算法可供使用，详见https://igraph.org/c/doc/igraph-Community.html
此外对于 case-control study, a useful approach is to generate separate networks for cases and controls and then compare the genes in these based on these metrics.即比较实验组与对照组的调控网络的差别，相关代码在教程最后有简单提到，就不赘述了。

igraph包可视化基因调控网络

Step 1 ：Get the gene info

1.1 获取表达矩阵

1.2 分析与添加雌激素，以及作用时间高度相关的基因

Step 2 ：visualize these gene regulatory network

2.1 计算两两基因间的关系/联系

2.2 创建igraph对象

2.3 修饰igraph对象

2.3 可视化

猜你喜欢

热点阅读