Heatmaps with NA values in pheatmap, and triangular statistics plots with image

2019-07-11 Soliva

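Preface: bioinformatics work produces heatmaps of every kind. This post introduces two special ways of drawing them.
1. Heatmap with NA values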

[Figure: input data format]

The data are laid out as in the figure above, with missing values written as '-'.
Preprocess them in Python with pandas:

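import pandas as pd
import numpy as np
a = pd.read_excel('./GI50.xlsx', index_col=0)   # first column holds the row names; '-' marks missing values
# Option 1: replace '-' with NA; pheatmap then draws NA cells, but the heatmap loses its clustering
a.replace('-', np.nan).to_csv('GI50.csv')
# Option 2: replace missing (and infinite) values with a far-off value (-8), so clustering still works and the NA cells cluster together
a.replace('-', np.nan).astype(float).replace([np.inf, -np.inf], np.nan).fillna(-8).to_csv('result.csv')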
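To try the two replacement strategies without the original GI50.xlsx, here is a small self-contained sketch on a made-up DataFrame (row names, column names and values are invented for illustration). The sentinel -8 is the value the R code below labels "NA".

```python
import numpy as np
import pandas as pd

# Made-up stand-in for GI50.xlsx: '-' marks a missing measurement.
raw = pd.DataFrame(
    {"drug1": [5.1, "-", 6.3], "drug2": ["-", 4.8, 7.0]},
    index=["cellA", "cellB", "cellC"],
)

# Option 1: '-' -> NaN (pheatmap draws these as NA cells).
with_na = raw.replace("-", np.nan).astype(float)

# Option 2: NaN and +/-inf -> a sentinel far below every real value,
# so clustering still runs and the missing cells end up grouped together.
SENTINEL = -8.0
with_sentinel = with_na.replace([np.inf, -np.inf], np.nan).fillna(SENTINEL)

print(with_na.isna().sum().sum())               # 2 missing cells
print((with_sentinel == SENTINEL).sum().sum())  # 2: the same cells, now at the sentinel
```

The sentinel only works if it is clearly outside the range of the real values; otherwise the "missing" cells would mix with genuine measurements in both the colour scale and the clustering.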

Then draw the heatmap in R with pheatmap:

library(pheatmap)
a <- read.csv('result.csv', row.names = 1, header = TRUE)
# -8 stands in for NA, so it is labelled "NA" in the legend; legend_breaks and
# legend_labels must have the same length (the ticks assume values lie roughly in 0-8)
labs <- seq(from = 0, to = 8, by = 2)
pheatmap(a, cluster_cols = FALSE, legend = TRUE, fontsize = 8, legend_breaks = c(-8, labs), legend_labels = c("NA", labs), filename = 'pheatmaps.pdf')
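The legend trick relies on two conditions: the sentinel must lie below every real value, and legend_breaks and legend_labels must have matching lengths. A sketch of building such matched lists from the data, written in Python to sit next to the preprocessing step; legend_spec is a hypothetical helper, and its output would be copied into the pheatmap call by hand.

```python
import numpy as np

def legend_spec(values, sentinel=-8.0, step=2.0):
    """Hypothetical helper: matched legend_breaks / legend_labels for pheatmap.

    The sentinel gets the label "NA"; the remaining ticks are multiples of
    `step` covering the range of the real (non-sentinel) values.
    """
    values = np.asarray(values, dtype=float)
    real = values[values != sentinel]
    if real.min() <= sentinel:
        raise ValueError("sentinel must lie below every real value")
    lo = np.floor(real.min() / step) * step
    hi = np.ceil(real.max() / step) * step
    ticks = np.arange(lo, hi + step / 2, step)
    ticks = ticks[ticks > sentinel]          # never duplicate the sentinel tick
    breaks = [sentinel] + ticks.tolist()
    labels = ["NA"] + [f"{t:g}" for t in ticks]
    return breaks, labels

breaks, labels = legend_spec([-8, 0.5, 3.2, 7.9])
print(breaks)  # [-8.0, 0.0, 2.0, 4.0, 6.0, 8.0]
print(labels)  # ['NA', '0', '2', '4', '6', '8']
```

Deriving both lists from one tick sequence is what keeps their lengths equal, whatever the data range turns out to be.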

[Figure: resulting heatmap]

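2. Triangular statistics plot with image
The mutual-exclusivity plot in maftools looks very nice, and it also shows part of the Fisher's exact test results inside the figure.
Below, a similar figure is drawn with R's image(), once with a Student's t test and once with a Pearson correlation test.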
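For one pair of columns x and y, the two tests come down to the statistics below. That the tests are run between pairs of columns is an assumption here. In practice the p-values (and the -log10(p) the plot shows) would come from scipy.stats.ttest_ind(x, y) (Student's t test, equal variances by default) and scipy.stats.pearsonr(x, y); this NumPy-only sketch computes just the test statistics.

```python
import numpy as np

def student_t(x, y):
    """Two-sample Student's t statistic (pooled variance); df = len(x) + len(y) - 2."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var * (1 / nx + 1 / ny))

def pearson_t(x, y):
    """Pearson r and its test statistic t = r * sqrt((n - 2) / (1 - r**2)); df = n - 2."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    return r, r * np.sqrt((len(x) - 2) / (1 - r**2))

print(student_t([1, 2, 3], [4, 5, 6]))        # about -3.674
print(pearson_t([1, 2, 3, 4], [1, 3, 2, 4]))  # r about 0.8, t about 1.886
```

Both statistics are compared against a t distribution with the stated degrees of freedom to obtain the p-value.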
The raw data look like this:

[Figure: raw data]

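Process the data with pandas: compute the statistic for every pair, which gives a symmetric matrix, then fill the redundant (duplicated) half with np.nan.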
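A minimal sketch of this step, assuming the statistic is computed between every pair of columns. pairwise_matrix and triangle are hypothetical helper names, and Pearson r stands in for whichever statistic is used. Whether the diagonal is kept is a choice; here it is blanked together with the upper triangle.

```python
import numpy as np
import pandas as pd

def pairwise_matrix(df, stat):
    """Apply stat(x, y) to every pair of columns; the result is symmetric."""
    cols = df.columns
    out = pd.DataFrame(np.nan, index=cols, columns=cols)
    for i, a in enumerate(cols):
        for j, b in enumerate(cols):
            if j < i:
                out.loc[a, b] = out.loc[b, a]   # already computed in an earlier row
            else:
                out.loc[a, b] = stat(df[a], df[b])
    return out

def triangle(mat, keep_diagonal=False):
    """Keep the lower triangle and fill the redundant half with NaN."""
    k = 0 if keep_diagonal else -1
    mask = np.tril(np.ones(mat.shape, dtype=bool), k=k)
    return mat.where(mask)

data = pd.DataFrame({"g1": [1, 2, 3, 4], "g2": [1, 3, 2, 4], "g3": [4, 3, 2, 1]})
full = pairwise_matrix(data, lambda x, y: np.corrcoef(x, y)[0, 1])
tri = triangle(full)
print(tri.round(2))
```

Writing tri out with to_csv and reading it back in R with read.csv(..., row.names = 1) gives a square matrix with NA in the masked half, which image() leaves blank.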

[Figure: symmetric matrix with the redundant half set to NaN]

Then draw it with R's image():

# Commented-out lines are parts of the maftools-style figure above that are not used in this version.
# tdata: square matrix of -log10(p) values with the redundant half set to NA (e.g. loaded with read.csv(..., row.names = 1))
# pdata: matching matrix of raw p-values, needed only for the optional significance markers
###  Main plot
    m <- nrow(tdata)
    n <- ncol(tdata)   # tdata is square, so n == m
    pal <- RColorBrewer::brewer.pal(9, 'Blues')   # same palette for the cells and the legend
    par(bty="n", mgp = c(2,.5,0), mar = c(2, 4, 3, 5)+.1, las=2, tcl=-.33)
    image(1:n, 1:m, as.matrix(tdata), col=pal,
          xaxt="n", yaxt="n",
          xlab="", ylab="", xlim=c(0, n+4), ylim=c(0, n+4))
    abline(h=0:n+.5, col="white", lwd=.5)
    abline(v=0:n+.5, col="white", lwd=.5)
    mtext(side = 2, at = 1:m, text = colnames(tdata), cex = 0.8, font = 3)
    mtext(side = 3, at = 1:n, text = colnames(tdata), las = 2, line = -2, cex = 0.8, font = 3)

####  Significance markers (p-values)
#     w = arrayInd(which(pdata < 0.01), rep(m,2))
#     points(w, pch="*", col="black")
#     w = arrayInd(which(pdata >= 0.01 & pdata < 0.05), rep(m,2))
#     points(w, pch=".", col="black")
    #image(y = 1:8 +6, x=rep(n,2)+c(2,2.5)+1, z=matrix(c(1:8), nrow=1), col=brewer.pal(8,"PiYG"), add=TRUE)
### Legend
    legY <- seq(0.5*m, 0.9*m, length.out = length(pal))
    image(y = legY, x=rep(n,2)+c(2,2.5)+1, z=matrix(seq_along(pal), nrow=1), col = pal, add=TRUE)
    #axis(side = 4, at = seq(1,7) + 6.5,  tcl=-.15, label=seq(-3, 3), las=1, lwd=.5)
    zr <- range(as.matrix(tdata)[is.finite(as.matrix(tdata))])   # range of the plotted values
    atLims = seq(0.5*m, 0.9*m, length.out = 5)
    axis(side = 4, at = atLims, tcl=-.15, labels = signif(seq(zr[1], zr[2], length.out=5), 2), las=1, lwd=.5)
    mtext(side=4, at = median(atLims), "-log10(p), Student's t test", las=3, cex = 0.9, line = 3, font = 2)
    par(xpd=NA)
### Text annotations
#     text(x=n+2.2, y= max(atLims)+1.2, "Co-occurrence", pos=4, cex = 0.9, font = 2)
#     text(x=n+2.2, y = min(atLims)-1.2, "Exclusive", pos=4, cex = 0.9, font = 2)

#     points(x = n+1, y = 0.2*n, pch = "*", cex = 2)
#     text(x = n+1, y = 0.2*n, paste0(" p < ",0.01), pos=4, cex = 0.9, font = 2)
#     points(x = n+1, y = 0.1*n, pch = ".", cex = 2)
#     text(x = n+1, y = 0.1*n, paste0("p < ", 0.05), pos=4, cex = 0.9)
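The commented-out significance markers rely on R storing matrices column by column: which() returns linear indices in that order, and arrayInd() turns them back into (row, column) pairs, which points() then uses as (x, y). Since image(1:n, 1:m, z) draws z[i, j] at x = i, y = j, each marker lands on its own cell. The same index arithmetic, sketched in NumPy (r_which_arrayind is a hypothetical name):

```python
import numpy as np

def r_which_arrayind(mask):
    """Mimic R's arrayInd(which(mask), dim(mask)): 1-based (row, col) in column-major order."""
    mask = np.asarray(mask, dtype=bool)
    linear = np.flatnonzero(mask.ravel(order="F"))          # like which(), but 0-based
    rows, cols = np.unravel_index(linear, mask.shape, order="F")
    return np.column_stack([rows + 1, cols + 1])

# p-values with the redundant half already NaN; comparisons with NaN are False,
# just as which() drops NA in R.
p = np.array([[np.nan, np.nan, np.nan],
              [0.004,  np.nan, np.nan],
              [0.03,   0.2,    np.nan]])
print(r_which_arrayind(p < 0.01).tolist())  # [[2, 1]]
print(r_which_arrayind(p < 0.05).tolist())  # [[2, 1], [3, 1]]
```

Because a cell with p < 0.01 also satisfies p < 0.05, plotting both thresholds independently puts two markers on the most significant cells.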

[Figure: resulting triangular statistics plot]
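Without a breaks argument, image() assigns colours by splitting the range of the finite z values into length(col) equal-width intervals, and NA cells are left unpainted, which is why the NaN half of the matrix stays blank. It also means the legend only matches the cells if it uses the same palette and is labelled with the same value range. A NumPy sketch of that kind of equal-width binning (color_index is a hypothetical name; it approximates, rather than reproduces, R's internal arithmetic, and treats +/-inf as not drawn):

```python
import numpy as np

def color_index(z, n_colors):
    """Equal-width binning of the finite values of z into 0..n_colors-1.

    Non-finite cells (NaN, +/-inf) get -1, meaning "not drawn".
    """
    z = np.asarray(z, dtype=float)
    finite = np.isfinite(z)
    lo, hi = z[finite].min(), z[finite].max()
    if hi == lo:
        raise ValueError("need at least two distinct finite values")
    idx = np.full(z.shape, -1, dtype=int)
    scaled = (z[finite] - lo) / (hi - lo)                  # rescale to 0..1
    idx[finite] = np.minimum(np.floor(scaled * n_colors), n_colors - 1).astype(int)
    return idx

z = np.array([[np.nan, np.nan], [0.0, np.nan], [150.0, 300.0]])
print(color_index(z, 9).tolist())  # [[-1, -1], [0, -1], [4, 8]]
```

With 9 colours the maximum value falls in the last bin (index 8), so a legend drawn with a different number of colours, or labelled with a range other than the data's, would not line up with the cells.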
