Drawing heatmaps with NA values using pheatmap, and a triangular statistics plot using image
2019-07-11
Soliva
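Preface: all kinds of heatmaps turn up in bioinformatics work. This post introduces two special ways of drawing them.
1. Heatmaps for data containing NA values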
image.png
The data format is shown in the figure above; missing values are recorded as '-'.
Preprocess the data in Python with pandas:
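import pandas as pd
import numpy as np
a = pd.read_excel('./GI50.xlsx', index_col=0)  # read_excel takes no encoding argument
# Option 1: replace '-' with NaN; the heatmap then loses its clustering
a.replace('-', np.nan).to_csv('GI50.csv')
# Option 2: replace '-' (and any inf) with a value far outside the data range (-8), so the missing cells cluster together
a.replace('-', -8).astype('float').replace(np.inf, -8).to_csv('result.csv')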
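To see why a far-off value makes rows with missing entries cluster together, here is a minimal sketch on a made-up table. The drug and cell-line names and all values are invented for illustration and are not taken from GI50.xlsx. It applies both replacements and compares Euclidean row distances, which is the default distance pheatmap uses for clustering.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table; '-' marks a missing measurement.
raw = pd.DataFrame(
    {"cellA": [5.1, "-", 5.3, "-"],
     "cellB": [6.0, 6.2, "-", 6.1],
     "cellC": [4.8, 5.0, 4.9, 5.2]},
    index=["drug1", "drug2", "drug3", "drug4"],
)

SENTINEL = -8  # assumed to lie far below every real value

# Option 1: NaN leaves the missing cells empty.
with_nan = raw.replace("-", np.nan).astype(float)

# Option 2: the sentinel keeps the matrix fully numeric.
with_sentinel = raw.replace("-", SENTINEL).astype(float)


def euclidean(frame, row_a, row_b):
    """Euclidean distance between two rows of a numeric DataFrame."""
    diff = frame.loc[row_a].to_numpy() - frame.loc[row_b].to_numpy()
    return float(np.sqrt(np.sum(diff ** 2)))


# drug2 and drug4 are both missing cellA; drug1 has no missing values.
d_shared_gap = euclidean(with_sentinel, "drug2", "drug4")
d_no_shared_gap = euclidean(with_sentinel, "drug2", "drug1")
print(d_shared_gap, d_no_shared_gap)
```

Rows that share a missing column differ by zero in that column, while a row with a real value there differs by roughly the gap between the data and the sentinel, so the sentinel dominates the distance.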
Draw the heatmap in R with pheatmap:
library(pheatmap)
a <- read.csv('result.csv', row.names = 1, header = TRUE)
pheatmap(a, cluster_cols = FALSE, legend = TRUE, fontsize = 8, filename = 'pheatmaps.pdf',
         legend_breaks = c(-8, seq(from = -0.2, to = max(a), by = 2)),
         legend_labels = c("NA", seq(from = 0, to = 8, by = 2)))
# Optional extra arguments: breaks = c(-15, seq(from = -8, to = 0, by = 2)), color = colorRampPalette(colors = c("blue", "white"))(100)
image.png
image.png
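2. Triangular statistics plot with image
maftools has a very attractive mutual-exclusivity plot, which also encodes the results of pairwise Fisher's exact tests in the figure.
Below, image is used to draw a similar figure, once for t-test results and once for Pearson correlation.
The raw data look like this: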
image.png
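Process the data with pandas:
compute the statistic for every pair of columns to form a symmetric matrix,
then fill the duplicated half with np.nan.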
image.png
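The steps above (pairwise statistic, symmetric matrix, NaN over the redundant half) can be sketched as follows. The column names and values are invented, and Pearson correlation via `numpy.corrcoef` stands in for the statistic. For the t-test version, a function returning `-log10(p)` built on something like `scipy.stats.ttest_ind` could be passed instead. The post does not say which triangle is masked, so masking the upper one here is an assumption.

```python
import numpy as np
import pandas as pd


def pairwise_matrix(data, stat):
    """Apply stat(x, y) to every pair of columns and return a symmetric DataFrame."""
    cols = list(data.columns)
    out = pd.DataFrame(np.nan, index=cols, columns=cols, dtype=float)
    for i, a in enumerate(cols):
        for b in cols[i:]:
            value = stat(data[a].to_numpy(), data[b].to_numpy())
            out.loc[a, b] = value
            out.loc[b, a] = value  # mirror so the matrix is symmetric
    return out


def mask_upper(matrix):
    """Set everything strictly above the diagonal to NaN (the duplicated half)."""
    keep = np.tril(np.ones(matrix.shape, dtype=bool))  # True on and below the diagonal
    return matrix.where(keep)


def pearson_r(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical table: rows are samples, columns are variables.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(20, 4)), columns=["g1", "g2", "g3", "g4"])

full = pairwise_matrix(toy, pearson_r)
tdata = mask_upper(full)
print(tdata.round(2))
```

Writing `tdata` out with `to_csv` and reading it back in R with `read.csv(..., row.names = 1)` would give a matrix in which the NaN cells are left blank by `image`.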
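Draw the figure in R with image:
# The commented-out lines are parts of the figure above
### Main panel (tdata is the masked symmetric matrix)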
m <- nrow(tdata)
n <- ncol(tdata)
par(bty="n", mgp = c(2,.5,0), mar = c(2, 4, 3, 5)+.1, las=2, tcl=-.33)
image(1:n, 1:m, as.matrix(tdata),col=RColorBrewer::brewer.pal(9,'Blues'),
xaxt="n", yaxt="n",
xlab="",ylab="", xlim=c(0, n+4), ylim=c(0, n+4))
abline(h=0:n+.5, col="white", lwd=.5)
abline(v=0:n+.5, col="white", lwd=.5)
mtext(side = 2, at = 1:m, text = colnames(tdata), cex = 0.8, font = 3)
mtext(side = 3, at = 1:n, text = colnames(tdata), las = 2, line = -2, cex = 0.8, font = 3)
#### Significance markers taken from the p-value matrix (pdata)
# w = arrayInd(which(pdata< 0.01), rep(m,2))
# points(w, pch="*", col="black")
# w = arrayInd(which(pdata< 0.05), rep(m,2))
# points(w, pch=".", col="black")
#image(y = 1:8 +6, x=rep(n,2)+c(2,2.5)+1, z=matrix(c(1:8), nrow=1), col=brewer.pal(8,"PiYG"), add=TRUE)
### Legend (same 9-colour palette as the main panel)
image(y = seq(0.5*nrow(tdata), 0.9*nrow(tdata), length.out = 9), x = rep(n,2)+c(2,2.5)+1, z = matrix(1:9, nrow = 1), col = RColorBrewer::brewer.pal(9,'Blues'), add = TRUE)
#axis(side = 4, at = seq(1,7) + 6.5, tcl=-.15, label=seq(-3, 3), las=1, lwd=.5)
atLims = seq(0.5*nrow(tdata), 0.9*nrow(tdata), length.out = 5)
axis(side = 4, at = atLims, tcl=-.15, labels =seq(from=0, to=300, length.out=5), las=1, lwd=.5)
mtext(side=4, at = median(atLims), "Student's t test -log10", las=3, cex = 0.9, line = 3, font = 2)
par(xpd=NA)
### Text annotations
# text(x=n+2.2, y= max(atLims)+1.2, "Co-occurrence", pos=4, cex = 0.9, font = 2)
# text(x=n+2.2, y = min(atLims)-1.2, "Exclusive", pos=4, cex = 0.9, font = 2)
# points(x = n+1, y = 0.2*n, pch = "*", cex = 2)
# text(x = n+1, y = 0.2*n, paste0(" p < ",0.01), pos=4, cex = 0.9, font = 2)
# points(x = n+1, y = 0.1*n, pch = ".", cex = 2)
# text(x = n+1, y = 0.1*n, paste0("p < ", 0.05), pos=4, cex = 0.9)
image.png
image.png
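The commented-out marker lines in the R code place `*` where p < 0.01 and `.` where p < 0.05. Below is a minimal Python sketch of that index lookup, using a made-up p-value matrix. It returns 1-based (row, column) pairs like R's `arrayInd`. One deliberate difference from the R lines as written: there, a cell with p < 0.01 also matches p < 0.05 and would receive both symbols, whereas here the second set is limited to 0.01 <= p < 0.05.

```python
import numpy as np


def marker_positions(pvals, low, high):
    """Return 1-based (row, col) indices where low <= p < high; NaN cells never match."""
    p = np.asarray(pvals, dtype=float)
    with np.errstate(invalid="ignore"):  # silence NaN-comparison warnings on old NumPy
        hit = (p >= low) & (p < high)
    return np.argwhere(hit) + 1


# Made-up lower-triangular p-value matrix; NaN marks the masked half.
pdata = np.array([
    [np.nan, np.nan, np.nan],
    [0.003, np.nan, np.nan],
    [0.20, 0.04, np.nan],
])

stars = marker_positions(pdata, 0.0, 0.01)   # would be drawn with pch="*"
dots = marker_positions(pdata, 0.01, 0.05)   # would be drawn with pch="."
print(stars.tolist(), dots.tolist())
```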