基因组数据绘图转录组实战R

关于heatmap

2020-06-02  本文已影响0人  果蝇饲养员的生信笔记

一、热图的介绍

(1)什么是热图

热图是heatmap的直译,用暖色表示数值大,冷色表示数值小。行为基因,列为样本。

(2)热图的作用

二、热图中数据的标准化

(1)为什么要标准化

热图的图例是以0为中心,变化范围在±3左右。RNA-seq得到的基因表达量其实差异是非常大的,从个位数到上千,所以首先要对基因的表达量进行归一化,不然我们得需要多大尺度的色阶才能表示这么大范围的变化啊。标准化可以使数据范围缩小,便于用合适的颜色色度进行表示,能避免由于单个数据过大(过小),导致冷暖色分布不明显的现象。

(2)怎样标准化

可以按基因、按样本或全局标准化。一般选择按行(基因)进行标准化。如果对同一行进行标准化,就不能比较同一样品中不同基因的表达(不能比较不在同一行的z-score数值,原来不同基因绝对表达量高的也有可能在同一样品中得到低的z-score分数)。如果没有异常值,也可以进行全局的标准化。

(3)标准化方法

三、热图中数据的聚类

(1)什么是聚类

聚类,就是把相似的东西分到一组。一个聚类算法通常只需要知道如何计算相似度就可以开始工作了,是无监督学习。聚类本质上是集合划分问题。因为没有人工定义的类别标准,因此算法要解决的核心问题是如何定义簇,唯一的要求是簇内的样本尽可能相似。通常的做法是根据簇内样本之间的距离,或是样本点在数据空间中的密度来确定。对簇的不同定义可以得到各种不同的聚类算法。常见的聚类算法有:

(2)层次聚类

找出哪两个基因最相似(similar),将它们合并作为一个整体(merge them to a cluster);再找出哪两个基因最相似,合并成一个整体;重复这个步骤,直到所有的基因都聚成一类。
层次聚类通常会伴随着产生一个系统树图(dendrogram),表示了相似性和聚类的顺序。

(3)相似性如何定义

可以有多种计算距离的方式,如Euclidean distance(欧氏距离)、Manhattan distance等,现实中绝大多数时候会使用欧氏距离。使用包含欧氏距离的聚类算法时,要注意数据的量纲影响,有必要时先进行数据标准化处理。
You need to make sure the scales for each variable are roughly equivalent, otherwise you will be biased towards one of them. The standard practice is to divide each variable by its standard deviation.

(4)聚类如何作为一个整体

四、pheatmap(Pretty Heatmaps)

(1)R Script

> library(pheatmap) 
> library(RColorBrewer) 

> #测试数据
> testdata1[1:5,1:5]
              CF1      CF2      CF3      CF4       CM1
rna13468 66.97984 80.07318 54.87525 91.65463  1.401584
rna885   26.53467 44.51450 33.65076 40.72113 60.633389
rna32332  0.00000 37.71825 11.89813  0.00000  1.403419
rna8744  42.44415 52.31791 60.35968 54.00533 39.524090
rna16488  0.00000  0.00000  0.00000  0.00000  0.000000
> #样本信息
> anno_col <- data.frame(Group=c(rep("CF",4),rep("CM",4),
+                                rep("PF",4),rep("PM",4)), 
+                        row.names = colnames(testdata1)) 
> anno_col$Group <- factor(anno_col$Group,levels=unique(anno_col$Group))

> #基因注释
> head(anno_row)
         gene_name chromosome
rna10784   CG13244         2L
rna12885   CG14589         2R
rna13468    CG1882         2R
rna16488    CG8311         2R
rna18953    CG5539         2R
rna20313    CG1317         3L
> anno_row$chromosome <- droplevels(anno_row$chromosome)
> #注释颜色
> Group_col <- brewer.pal(8,"Set2")[2:5]
> names(Group_col) <- levels(anno_col$Group)
> Chr_col <- brewer.pal(8,"Set3")[1:5]
> names(Chr_col) <- levels(anno_row$chromosome)
> ann_colors <- list(Group=Group_col, 
+                    Chromosome=Chr_col)
> #默认参数绘图
> pheatmap(testdata1)
pheatmap-default.JPG
> #添加注释等
pheatmap( testdata1, 
+           scale="row", 
+           clustering_method="average", 
+           color=colorRampPalette(c("green", "black","red"))(100), 
+           annotation_col=anno_col, 
+           annotation_row=anno_row[2], 
+           annotation_names_row=F, annotation_names_col=T, 
+           annotation_colors=ann_colors, 
+           border_color = NA)
pheatmap-annotation.JPG

(2)用法

pheatmap(mat, color = colorRampPalette(rev(brewer.pal(n = 7, name = "RdYlBu")))(100), kmeans_k = NA, breaks = NA, border_color = "grey60", cellwidth = NA, cellheight = NA, scale = "none", cluster_rows = TRUE, cluster_cols = TRUE, clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean", clustering_method = "complete", clustering_callback = identity2, cutree_rows = NA, cutree_cols = NA, treeheight_row = ifelse((class(cluster_rows) == "hclust") || cluster_rows, 50, 0), treeheight_col = ifelse((class(cluster_cols) == "hclust") || cluster_cols, 50, 0), legend = TRUE, legend_breaks = NA, legend_labels = NA, annotation_row = NA, annotation_col = NA, annotation = NA, annotation_colors = NA, annotation_legend = TRUE, annotation_names_row = TRUE, annotation_names_col = TRUE, drop_levels = TRUE, show_rownames = T, show_colnames = T, main = NA, fontsize = 10, fontsize_row = fontsize, fontsize_col = fontsize, angle_col = c("270", "0", "45", "90", "315"), display_numbers = F, number_format = "%.2f", number_color = "grey30", fontsize_number = 0.8 * fontsize, gaps_row = NULL, gaps_col = NULL, labels_row = NULL, labels_col = NULL, filename = NA, width = NA, height = NA, silent = FALSE, na_col = "#DDDDDD", ...)

(3)参数

五、ComplexHeatmap

(1)R Script

> library(ComplexHeatmap) 
> library(circlize) 
> library(RColorBrewer) 

> #提前标准化数据
> heatmapdata <- testdata1 
> heatmapdata <- t(scale(t(heatmapdata))) 
> #heatmapdata[heatmapdata>2]=2 
> #heatmapdata[heatmapdata<-2]=-2 

默认参数绘图

> Heatmap(testdata1)
> Heatmap(heatmapdata)
ComplexHeatmap-default.JPG
热图颜色
> col_fun = colorRamp2(c(-2, 0, 2), c("green", "white", "red"))
> Heatmap(heatmapdata, name=" ", col=col_fun)
> Heatmap(heatmapdata, name=" ", col=rev(rainbow(10)))
ComplexHeatmap-col.JPG
边界和网格
> Heatmap(heatmapdata, name=" ", col=col_fun,
+         border=TRUE,rect_gp=gpar(col="white",lwd=2))
ComplexHeatmap-border.JPG
渲染聚类树
> #install.packages("dendextend")
> library(dendextend)
> col_dend = hclust(dist(t(heatmapdata)))
> row_dend = hclust(dist(heatmapdata))
> Heatmap(heatmapdata, name=" ", 
+         cluster_columns=color_branches(col_dend,k=4),
+         cluster_rows = color_branches(row_dend, k=4))
ComplexHeatmap-dendextend.JPG
组内聚类
> Heatmap(heatmapdata, name=" ", 
+         cluster_columns=cluster_within_group(heatmapdata,anno_col$Group))
ComplexHeatmap-cluster_within_group.JPG
分割热图
> Heatmap(heatmapdata, name=" ", 
+         split = anno_row$chromosome,
+         column_split = anno_col$Group)
> Heatmap(heatmapdata, name=" ", 
+         split = anno_row$chromosome,
+         column_split = anno_col$Group,
+         column_title = NULL,
+         row_title_gp = gpar(col="white",fontsize=11,
+                             fill=brewer.pal(8,"Set1")[1:5]))
ComplexHeatmap-split.JPG
图例
> Heatmap(heatmapdata, name=" ",
+         heatmap_legend_param = list(legend_height=unit(6,"cm"),
+                                     grid_width=unit(0.8,"cm")))
ComplexHeatmap-legend.JPG
简单注释
> ha_col <- HeatmapAnnotation(Group=anno_col$Group,
+                         col=list(Group=Group_col))
> ha_row <- HeatmapAnnotation(Chr=anno_row$chromosome,
+                             col=list(Chr=Chr_col),
+                             which="row")
> Heatmap(heatmapdata, name=" ",
+         top_annotation = ha_col,
+         left_annotation = ha_row)
ComplexHeatmap-annotation.JPG
复杂注释
> ha_box <- HeatmapAnnotation(boxplot=anno_boxplot(heatmapdata,
+                                                  height=unit(2,"cm"),
+                                                  box_width = 0.9,
+                                                  gp=gpar(fill=Chr_col[anno_row$chromosome]),
+                                                  outline = F),
+                             which="row")
> Heatmap(heatmapdata, name=" ",
+         top_annotation = ha_col,
+         left_annotation = ha_row,
+         right_annotation = ha_box)
ComplexHeatmap-annotation-box.JPG
标签注释
> ha_lab <- rowAnnotation(gene=anno_mark(at=which(row.names(heatmapdata)=="rna34188"),
+                                        labels = "CycG"))
> Heatmap(heatmapdata, name=" ",
+         show_row_names = F,
+         right_annotation = ha_lab)
ComplexHeatmap-annotation-lab.JPG

(2)reference

可以去看ComplexHeatmap Complete Reference,非常详细。

single-heatmap.JPG
heatmap-list.JPG
上一篇下一篇

猜你喜欢

热点阅读