R 数据可视化 —— circlize 圆形热图
前言
在前一节,我们介绍了如何使用 circos.rect()
函数来绘制圆形热图,在 0.4.10
版本之后,circlize
包提供了 circos.heatmap()
函数,来简化圆形热图的创建和绘制。
下面我们来介绍如何使用该函数来绘制圆形热图,首先,创建数据
mat <- rbind(
cbind(matrix(
rnorm(50*5, mean = 1), nr = 50),
matrix(rnorm(50*5, mean = -1), nr = 50)
),
cbind(matrix(
rnorm(50*5, mean = -1), nr = 50),
matrix(rnorm(50*5, mean = 1), nr = 50)
)
)
rownames(mat) <- paste0("R", 1:100)
colnames(mat) <- paste0("C", 1:10)
mat <- mat[sample(100, 100), ]
split <- sample(letters[1:5], 100, replace = TRUE)
split <- factor(split, levels = letters[1:5])
构造 100*10
的矩阵,并随机分为 5
类。如果是正常聚类热图的话
Heatmap(mat, row_split = split)
圆形热图
1. 输入数据
circos.heatmap()
的输入数据必须为矩阵,如果矩阵存在不同的分组,必须在 split
参数中传递字符向量或因子,如果传递了数值向量,会自动转换为字符
如果输入数据的矩阵是连续型数值,颜色参数 col
必须为 colorRamp2()
生成的颜色映射,如果是字符型,则需要传递定义颜色映射的命名向量
col_fun <- colorRamp2(c(-2, 0, 2), c("#fc8d59", "#ffffbf", "#91bfdb"))
circos.heatmap(mat, split = split, col = col_fun)
circos.clear()
行数据的绘制是沿着圆的方向,列是沿着半径的方向
如果不添加 split
,则默认绘制一个大的扇形区域
circos.heatmap(mat, col = col_fun)
circos.clear()
记住要用 circos.clear()
来删除布局
2. 布局控制
可以在绘制圆形热图之前,使用 circos.par()
函数来控制布局,例如起始的角度、扇形之间的间隔等
circos.par(start.degree = 90, gap.degree = 10)
circos.heatmap(
mat, split = split, col = col_fun,
track.height = 0.4, bg.border = "red",
bg.lwd = 2, bg.lty = 2, show.sector.labels = TRUE
)
circos.clear()
扇形的顺序可以使用 factor
的 levels
参数来设置
circos.heatmap(
mat, split = factor(split, levels = c("e", "d", "c", "b", "a")),
col = col_fun, show.sector.labels = TRUE
)
circos.clear()
我们使用 circos.clear()
清除布局之后,扇形从 0
度开始
3. 聚类
默认情况下,会对数值矩阵按行进行聚类,将 cluster
设置为 FALSE
可以关闭聚类。如果没有执行聚类,尽管设置了 dend.side
参数的值也不会绘制树状图
par(mfcol = c(1, 2))
circos.heatmap(mat, split = split,
cluster = TRUE, col = col_fun)
circos.clear()
text(0, 0, "cluster = TRUE")
circos.heatmap(mat, split = split,
cluster = FALSE, col = col_fun)
circos.clear()
text(0, 0, "cluster = FALSE")
clustering.method
和 distance.method
两个参数用于设置聚类方法(hclust
)和距离度量(dist
)
circos.heatmap()
不能直接对列进行聚类,可以在绘制之前,手动对列进行聚类,然后按聚类的顺序对矩阵的列进行重排
column_od <- hclust(dist(t(mat)))$order
circos.heatmap(mat[, column_od], split = split,
col = col_fun)
circos.clear()
4. 树状图和行名
dend.side
参数可以控制树状图相对于热图的位置,树状图是一个独立的轨迹
par(mfcol = c(1, 2))
circos.heatmap(
mat, split = split, col = col_fun,
dend.side = "inside"
)
circos.clear()
circos.heatmap(
mat, split = split, col = col_fun,
dend.side = "outside"
)
circos.clear()
dend.track.height
参数可以控制树状图的高度
rownames.side
参数用于控制矩阵行名的显示,类似于树状图,也是一个独立于热图的轨迹
par(mfcol = c(1, 2))
circos.heatmap(
mat, split = split, col = col_fun,
rownames.side = "inside"
)
circos.clear()
text(0, 0, 'rownames.side = "inside"')
circos.heatmap(
mat, split = split, col = col_fun,
rownames.side = "outside"
)
circos.clear()
text(0, 0, 'rownames.side = "outside"')
同时显示树状图和行名,当然你不会傻到两个图绘制在同一个方向
par(mfcol = c(1, 2))
circos.heatmap(
mat, split = split, col = col_fun,
rownames.side = "inside", dend.side = "outside"
)
circos.clear()
text(0, 0, 'A')
circos.heatmap(
mat, split = split, col = col_fun,
rownames.side = "outside", dend.side = "inside"
)
circos.clear()
text(0, 0, 'B')
行名的图形属性可以是标量值或长度与行数一致向量值
circos.heatmap(
mat, split = split, col = col_fun,
rownames.side = "outside",
rownames.col = 1:nrow(mat) %% 10 + 1,
rownames.cex = runif(nrow(mat), min = 0.3, max = 1.5),
rownames.font = 1:nrow(mat) %% 4 + 1
)
树状图的渲染是通过回调函数来实现的,回调函数会在相应的扇形中生成了树状图之后调用,可以用于设置树状图的顺序和颜色
dend.callback
参数用于接收自定义的回调函数,该函数需要传递三个参数
-
dend
: 当前扇形的树状图 -
m
: 当前扇形所对应的数据子集 -
si
: 当前扇形的索引或名称
默认的回调函数是
function(dend, m, si) reorder(dend, rowMeans(m))
会根据行均值对树状图进行重排
我们也可以使用其他排序方式,例如 dendsort::dendsort()
library(dendsort)
par(mfcol = c(1, 2))
circos.heatmap(
mat, split = split, col = col_fun,
dend.side = "inside"
)
circos.clear()
text(0, 0, "reorder by row means")
circos.heatmap(
mat, split = split, col = col_fun,
dend.side = "inside",
dend.callback = function(dend, m, si) {
dendsort(dend)
}
)
circos.clear()
text(0, 0, "reorder by dendsort")
使用 dendextend
包的 color_branches()
函数,可以对树状图进行渲染
library(dendextend)
dend_col <- structure(rainbow(5), names = letters[1:5])
circos.heatmap(
mat, split = split, col = col_fun,
dend.side = "inside", dend.track.height = 0.2,
dend.callback = function(dend, m, si) {
color_branches(dend, k = 1, col = dend_col[si])
}
)
circos.clear()
如果矩阵未分组,可以设置子树的颜色,在这里,我们设置子树的数量为 4
circos.heatmap(
mat, col = col_fun,
dend.side = "inside", dend.track.height = 0.2,
dend.callback = function(dend, m, si) {
color_branches(dend, k = 4, col = rainbow(4))
}
)
circos.clear()
5. 多个热图轨迹
绘制多个热图轨迹,只要多次调用 circos.heatmap()
就行,但是还有一些细节需要注意
首先,第一次调用 circos.heatmap()
函数时,会初始化圆形布局,后面调用的 circos.heatmap()
函数,将会按照该布局进行绘制,即具有相同的行顺序以及分组。
例如
mat2 <- mat[sample(100, 100), ]
col_fun2 <- colorRamp2(c(-2, 0, 2), c("#af8dc3", "#f7f7f7", "#7fbf7b"))
par(mfcol = c(1, 2))
circos.heatmap(mat, split = split, col = col_fun, dend.side = "outside")
circos.heatmap(mat2, col = col_fun2)
circos.clear()
text(0, 0, "first")
circos.heatmap(mat2, split = split, col = col_fun2, dend.side = "outside")
circos.heatmap(mat, col = col_fun)
circos.clear()
text(0, 0, "second")
上面两个图形都是按照第一条轨迹的布局进行绘制的,如果不想以第一条轨迹作为布局,可以使用 circos.heatmap.initialize()
函数来指定初始化布局的数据,例如
circos.heatmap.initialize(mat, split = split)
circos.heatmap(mat2, col = col_fun2, dend.side = "outside")
circos.heatmap(mat, col = col_fun)
circos.clear()
我们以 mat
数据来初始化布局,然后分别绘制前、后五列的数据
circos.heatmap.initialize(mat, split = split)
circos.heatmap(mat[, 1:5], col = col_fun)
circos.heatmap(mat[, 6:10], col = col_fun)
circos.clear()
可以看到,分开绘制与同事绘制,其整体效果是一样的。
6. 与其他轨迹混用
circos.heatmap()
还可以与其他类型的轨迹混用,在初始化圆形热图布局之后,也有一个特殊的变量 CELL_META
用于保存扇形/轨迹/单元格的信息
-
CELL_META$row_dend(CELL_META$dend)
: 当前扇形的树状图,如果未聚类,则为NULL
-
CELL_META$row_order(simply CELL_META$order)
: 当前扇形所对应的数据子集聚类后的顺序,如果未聚类,则为c(1,2,...)
-
CELL_META$subset
: 当前扇形所对应的数据在矩阵中的索引,索引按升序排列
例如,在热图的内侧绘制行均值的点图
circos.heatmap(mat, split = split, col = col_fun)
row_mean <- rowMeans(mat[, 1:5])
circos.track(
ylim = range(row_mean), panel.fun = function(x, y) {
# 获取当前扇形对应的数据的行均值
y = row_mean[CELL_META$subset]
# 将行均值按聚类顺序排列
y = y[CELL_META$row_order]
# 添加线条和点
circos.lines(CELL_META$cell.xlim, c(0, 0), lty = 2, col = "grey50")
circos.points(seq_along(y) - 0.5, y, col = ifelse(y > 0, "red", "blue"))
},
cell.padding = c(0.02, 0, 0.02, 0)
)
circos.clear()
如果想要在外侧绘制点图,内侧绘制热图,则需要先使用 circos.heatmap.initialize()
函数进行热图布局
circos.heatmap.initialize(mat, split = split)
row_mean <- rowMeans(mat[, 1:5])
circos.track(
ylim = range(row_mean), panel.fun = function(x, y) {
y = row_mean[CELL_META$subset]
y = y[CELL_META$row_order]
circos.lines(CELL_META$cell.xlim, c(0, 0), lty = 2, col = "grey50")
circos.points(seq_along(y) - 0.5, y, col = ifelse(y > 0, "red", "blue"))
},
cell.padding = c(0.02, 0, 0.02, 0)
)
circos.heatmap(mat, split = split, col = col_fun)
circos.clear()
或者,使用箱线图
circos.heatmap(mat, split = split, col = col_fun)
circos.track(
ylim = range(mat), panel.fun = function(x, y) {
m = mat[CELL_META$subset, , drop = FALSE]
m = m[CELL_META$row_order, , drop = FALSE]
n = nrow(m)
# circos.boxplot 应用于矩阵的列,所以需要转置
circos.boxplot(
t(m), pos = 1:n - 0.5, pch = 16, cex = 0.3,
col = CELL_META$sector.numeric.index)
circos.lines(CELL_META$cell.xlim, c(0, 0), lty = 2, col = "grey50")
},
cell.padding = c(0.02, 0, 0.02, 0)
)
circos.clear()
7. 添加注释
显示扇形的标签,可以设置 show.sector.labels = TRUE
。
如果需要显示自定义的标签,需要使用 panel.fun
函数来绘制,例如
circos.heatmap(mat, split = split, col = col_fun)
circos.track(
track.index = get.current.track.index(),
panel.fun = function(x, y) {
circos.text(
CELL_META$xcenter,
CELL_META$cell.ylim[2] + convert_y(2, "mm"),
paste0("this is group ", CELL_META$sector.index),
facing = "bending.inside", cex = 0.8,
adj = c(0.5, 0), niceFacing = TRUE
)
}, bg.border = NA)
circos.clear()
circos.heatmap()
函数并没有提供显示列名的方法,所以只能在 panel.fun
函数中添加
我们需要使用 circos.par()
函数为最后一个扇形设置较大间隔,好放置列标签,然后在 panel.fun
函数中,在最后一个扇形的末尾处添加列标签,例如
circos.par(gap.after = c(2, 2, 2, 2, 10))
circos.heatmap(
mat, split = split, col = col_fun,
track.height = 0.4
)
circos.track(
track.index = get.current.track.index(),
panel.fun = function(x, y) {
if (CELL_META$sector.numeric.index == 5) {
# 在最后一个扇形中
cn = colnames(mat)
n = length(cn)
circos.text(
rep(CELL_META$cell.xlim[2], n) + convert_x(1, "mm"),
1:n - 0.5, cn, cex = 0.5, adj = c(0, 0.5),
facing = "inside"
)
}
},
bg.border = NA
)
circos.clear()
类似的,我们可以将列名换成分组标签,并为标签添加矩形框
circos.par(gap.after = c(2, 2, 2, 2, 10))
circos.heatmap(mat, split = split, col = col_fun, track.height = 0.4)
circos.track(
track.index = get.current.track.index(),
panel.fun = function(x, y) {
if(CELL_META$sector.numeric.index == 5) {
# 第一个分组
circos.rect(
CELL_META$cell.xlim[2] + convert_x(1, "mm"), 0,
CELL_META$cell.xlim[2] + convert_x(5, "mm"), 5,
col = "green", border = NA
)
circos.text(
CELL_META$cell.xlim[2] + convert_x(3, "mm"), 2.5,
"group 1", cex = 0.5, facing = "clockwise"
)
# 第二个分组
circos.rect(
CELL_META$cell.xlim[2] + convert_x(1, "mm"), 5,
CELL_META$cell.xlim[2] + convert_x(5, "mm"), 10,
col = "pink", border = NA
)
circos.text(
CELL_META$cell.xlim[2] + convert_x(3, "mm"), 7.5,
"group 2", cex = 0.5, facing = "clockwise"
)
}
}, bg.border = NA
)
circos.clear()
添加图例
circos.heatmap(mat, split = split, col = col_fun)
circos.clear()
library(ComplexHeatmap)
lgd <- Legend(title = "mat", col_fun = col_fun)
grid.draw(lgd)
8. 复杂热图示例
我们将通过一个示例来说明如何绘制复杂热图,以热图的方式来展示 DNA
甲基化、基因表达以及其他基因组层面的信息。
我们有一份模拟的数据:https://github.com/dxsbiocc/learn/blob/main/data/meth.rds
res_list <- readRDS("Downloads/meth.rds")
type <- res_list$type
mat_meth <- res_list$mat_meth
mat_expr <- res_list$mat_expr
direction <- res_list$direction
cor_pvalue <- res_list$cor_pvalue
gene_type <- res_list$gene_type
anno_gene <- res_list$anno_gene
dist <- res_list$dist
anno_enhancer <- res_list$anno_enhancer
该数据包含 9
部分:
-
mat_meth
: 甲基化谱,行表示差异甲基化区域(DMR
),列表示样本 -
mat_expr
: 与 DMR 相关的基因的表达谱 -
direction
: 甲基化方向(hyper
:超高甲基化,hypo
:低甲基化) -
cor_pvalue
: 基因表达与甲基化的相关性,经过-log10
转化 -
gene_type
: 基因的类型(如,protein coding genes
或lincRNA
) -
anno_gene
: 基因模型的注释(如,intergenic
,intragenic
或transcription start site
(TSS
)) -
dist
:DMR
与相关基因的TSS
之间的距离 -
anno_enhancer
:DMR
与增强子之间的重叠部分
mat_meth
、mat_expr
、cor_pvalue
、dist
和 anno_enhancer
是数值型,可以设置为连续型颜色,其他的数据可以设置为离散的颜色映射
例如
# k-means 聚类
km <- kmeans(mat_meth, centers = 5)$cluster
# 绘制甲基化谱热图
col_meth <- colorRamp2(c(0, 0.5, 1), c("#a6611a", "#f5f5f5", "#018571"))
circos.heatmap(mat_meth, split = km, col = col_meth, track.height = 0.12)
# 绘制甲基化方向热图,离散型
col_direction <- c("hyper" = "red", "hypo" = "blue")
circos.heatmap(direction, col = col_direction, track.height = 0.01)
# 绘制基因表达谱热图
col_expr <- colorRamp2(c(-2, 0, 2), c("#d01c8b", "#f7f7f7", "#4dac26"))
circos.heatmap(mat_expr, col = col_expr, track.height = 0.12)
# 绘制相关性热图
col_pvalue <- colorRamp2(c(0, 2, 4), c("#f1a340", "#f7f7f7", "#998ec3"))
circos.heatmap(cor_pvalue, col = col_pvalue, track.height = 0.01)
# 绘制基因类型热图,离散型
library(RColorBrewer)
col_gene_type <- structure(brewer.pal(length(unique(gene_type)), "Set3"), names = unique(gene_type))
circos.heatmap(gene_type, col = col_gene_type, track.height = 0.01)
# 基因注释信息,离散型
col_anno_gene <- structure(brewer.pal(length(unique(anno_gene)), "Set1"), names = unique(anno_gene))
circos.heatmap(anno_gene, col = col_anno_gene, track.height = 0.01)
# 距离热图
col_dist <- colorRamp2(c(0, 10000), c("#ef8a62", "#67a9cf"))
circos.heatmap(dist, col = col_dist, track.height = 0.01)
# 重叠比例热图
col_enhancer <- colorRamp2(c(0, 1), c("#fc8d59", "#99d594"))
circos.heatmap(anno_enhancer, col = col_enhancer, track.height = 0.03)
因为矩阵的行对应的是差异甲基化区域,我们可以绘制不同区域之间的连接线,表示不同区域之间存在某种关系。
我们构造随机的连接
df_link <- data.frame(
from_index = sample(nrow(mat_meth), 20),
to_index = sample(nrow(mat_meth), 20)
)
绘制连接线
for(i in seq_len(nrow(df_link))) {
# 假设,我们的连接时 DMR1 ——> DMR2.
# 获取 DMR1 所属的扇形
group1 = km[ df_link$from_index[i] ]
# 获取 DMR2 所属的扇形
group2 = km[ df_link$to_index[i] ]
# 获取 DMR1 所属扇形的数据索引
subset1 = get.cell.meta.data("subset", sector.index = group1)
# 该扇形的行顺序
row_order1 = get.cell.meta.data("row_order", sector.index = group1)
# 获取 DMR1 在扇形中的位置
x1 = which(subset1[row_order1] == df_link$from_index[i])
# 获取 DMR2 所属扇形的数据索引
subset2 = get.cell.meta.data("subset", sector.index = group2)
row_order2 = get.cell.meta.data("row_order", sector.index = group2)
x2 = which(subset2[row_order2] == df_link$to_index[i])
# 连接 DMR1 和 DMR2 的中点位置
circos.link(group1, x1 - 0.5, group2, x2 - 0.5, col = rand_color(1))
}
其实,上面的代码也可以直接使用连接行索引的方式
for(i in seq_len(nrow(df_link))) {
circos.heatmap.link(
df_link$from_index[i],
df_link$to_index[i],
col = rand_color(1)
)
}
现在,就剩最后一步了,添加图例,还是使用 ComplexHeatmap::Legend()
函数来进行添加
lgd_meth <- Legend(title = "Methylation", col_fun = col_meth)
lgd_direction <- Legend(
title = "Direction", at = names(col_direction),
legend_gp = gpar(fill = col_direction)
)
lgd_expr <- Legend(title = "Expression", col_fun = col_expr)
lgd_pvalue <- Legend(
title = "P-value", col_fun = col_pvalue, at = c(0, 2, 4),
labels = c(1, 0.01, 0.0001)
)
lgd_gene_type <- Legend(
title = "Gene type", at = names(col_gene_type),
legend_gp = gpar(fill = col_gene_type)
)
lgd_anno_gene <- Legend(
title = "Gene anno", at = names(col_anno_gene),
legend_gp = gpar(fill = col_anno_gene)
)
lgd_dist <- Legend(
title = "Dist to TSS", col_fun = col_dist,
at = c(0, 5000, 10000), labels = c("0kb", "5kb", "10kb")
)
lgd_enhancer <- Legend(
title = "Enhancer overlap", col_fun = col_enhancer,
at = c(0, 0.25, 0.5, 0.75, 1),
labels = c("0%", "25%", "50%", "75%", "100%")
)
同时,我们将热图的绘制放置一个函数里面
circlize_plot = function() {
circos.heatmap(mat_meth, split = km, col = col_meth, track.height = 0.12)
circos.heatmap(direction, col = col_direction, track.height = 0.01)
circos.heatmap(mat_expr, col = col_expr, track.height = 0.12)
circos.heatmap(cor_pvalue, col = col_pvalue, track.height = 0.01)
circos.heatmap(gene_type, col = col_gene_type, track.height = 0.01)
circos.heatmap(anno_gene, col = col_anno_gene, track.height = 0.01)
circos.heatmap(dist, col = col_dist, track.height = 0.01)
circos.heatmap(anno_enhancer, col = col_enhancer, track.height = 0.03)
for(i in seq_len(nrow(df_link))) {
circos.heatmap.link(
df_link$from_index[i], df_link$to_index[i], col = rand_color(1))
}
circos.clear()
}
我们使用 gridBase
包来混合 base
图形和 grid
系统的图例。在这里,我们将图形输出为文件,如果在 RStudio
中绘制,显示的图形会出现不完整或重叠的情况,需要注意
library(gridBase)
# 创建 png 图形设备,并设置足够的大小
# 注意:如果图形设备的大小太小,会提示 "figure margins too large"
# 并且,gridOMI() 会返回负值
png(filename = "~/Downloads/a.png", width = 1000, height = 800)
plot.new()
circle_size = unit(1, "snpc") # snpc unit gives you a square region
pushViewport(viewport(
x = 0, y = 0.5, width = circle_size,
height = circle_size, just = c("left", "center"))
)
# 设置 new = TRUE,避免重新创建图形
par(omi = gridOMI(), new = TRUE)
circlize_plot()
upViewport()
# 获取图形设备的高度
h <- dev.size()[2]
lgd_list = packLegend(
lgd_meth, lgd_direction, lgd_expr,
lgd_pvalue, lgd_gene_type, lgd_anno_gene,
lgd_dist, lgd_enhancer,
max_height = unit(0.9*h, "inch")
)
draw(lgd_list, x = circle_size, just = "left")
dev.off()
circos.clear()
完整代码:https://github.com/dxsbiocc/learn/blob/main/R/plot/circos_heatmap.R