在火山图上标记基因
要玩图,离不开哈德雷大神的《R数据科学》,第1章和21章是专门讲图的,我写过对应的笔记:
https://www.jianshu.com/p/4a154f6f0de7
https://www.jianshu.com/p/bf0f12246865
关于火山图加标签的需求,这里有几种方法来实现。
示例数据
方法一的示例数据是data.Rdata,方法二三的示例数据是test.Rdata。我将数据打包放在了“生信星球”公众号后台,回复“火山图”即可获得。你解压后双击文件夹里的volcano.Rproj,复制粘贴运行本文代码即可。
方法一:利用空字符串“”
关于空字符串我曾写过一篇文章来讲他:https://www.jianshu.com/p/aef98f3fc7d8
这种方法的参照是帮助文档里的一段代码:
(先准备好包)
if(!require(ggplot2)) install.packages("ggplot2")
if(!require(ggrepel)) install.packages("ggrepel")
library(ggplot2)
library(ggrepel)
下面代码来源于geom_text_repel的帮助文档
p <- ggplot(mtcars,
aes(wt, mpg, label = rownames(mtcars), colour = factor(cyl))) +
geom_point()
# Hide some of the labels, but repel from all data points
mtcars$label <- rownames(mtcars)
mtcars$label[1:15] <- ""
p + geom_text_repel(data = mtcars, aes(wt, mpg, label = label))
做出的图是这样:
可以看到,一部分点有标签, 一部分没有,思路就是把不要标签的部分变成空字符串“”。
那么参考这个思路为火山图加标签:
(美图预警)
先把图画出来:
load("data.Rdata")
head(data)
# symbol p.value FC change
#1 PCMTD2 1.53544e-11 1.3548360 Stable
#2 KIAA0087 6.71382e-13 0.7314603 Stable
#3 AFAP1L1 4.24611e-12 0.6284560 Stable
#4 CHMP1A 3.76821e-09 1.6035994 Stable
#5 TRERF1 1.80652e-08 0.6875469 Stable
#6 C8B 7.88047e-04 1.2374303 Stable
data$change = ifelse(data$p.value < 0.000001 & abs(log2(data$FC)) >= 1,
ifelse(log2(data$FC)> 1 ,'Up','Down'),
'Stable')
p <- ggplot(data = data,
aes(x = log2(data$FC),
y = -log10(data$p.value),
colour=change,
label = data$symbol)) +
geom_point(alpha=0.4, size=3.5) +
scale_color_manual(values=c("blue", "grey","red"))+
xlim(c(-4.5, 4.5)) +
geom_vline(xintercept=c(-1,1),lty=4,col="black",lwd=0.8) +
geom_hline(yintercept = -log10(0.000001),lty=4,col="black",lwd=0.8) +
labs(x="log2(fold change)",
y="-log10 (p-value)",
title="Differential metabolites") +
theme_bw()+
theme(plot.title = element_text(hjust = 0.5),
legend.position="right",
legend.title = element_blank())
p
然后是加标签,重点就在这里:
data$label=ifelse(data$p.value < 0.000001 & abs(log2(data$FC)) >= 1,data$symbol,"")
p+geom_text_repel(data = data, aes(x = log2(data$FC),
y = -log10(data$p.value),
label = label),
size = 3,box.padding = unit(0.5, "lines"),
point.padding = unit(0.8, "lines"),
segment.color = "black",
show.legend = FALSE)
但是我发现,这个只是适用于数据量比较小的时候,一般来说火山图数以万计的行,用这个方法容易失败。下午尝试了几次大的数据,结果Rstudio无一例外的嘎嘣了。
方法二:看R数据科学
以下代码出自R数据科学笔记第21章:
best_in_class <- mpg %>%
group_by(class) %>%
filter(row_number(desc(hwy)) == 1)
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_point(size = 3, shape = 1, data = best_in_class) +
ggrepel::geom_label_repel(
aes(label = model),
data = best_in_class
)
这个方法适用于较大的数据。
先把图画出:
load("test.Rdata")
p <- ggplot(data = test,
aes(x = logFC,
y = `-log10(P.value)`)) +
geom_point(alpha=0.4, size=3.5,
aes(color=change)) +
scale_color_manual(values=c("blue", "grey","red"))+
geom_vline(xintercept=c(-1,1),lty=4,col="black",lwd=0.8) +
geom_hline(yintercept = -log10(0.01),lty=4,col="black",lwd=0.8) +
theme_bw()
p
然后加标签,可以自定义数据框的阈值来调整标签的数量:
for_label <- test %>%
filter(abs(logFC) >4& `-log10(P.value)`> -log10(0.000001))
p +
geom_point(size = 3, shape = 1, data = for_label) +
ggrepel::geom_label_repel(
aes(label = symbol),
data = for_label,
color="black"
)
其实原理就是叠加了一个新数据框的两个图层,一个空心点图,一个geom_label_repel。
3.方法三:ggpubr的函数有现成的参数
这个函数叫ggscatter,还是用刚才的test数据来做
由于ggpubr写纵坐标时直接写-log10(P.value)不识别,我采取了迂回策略,改列名,完事再在图上改纵轴标签。
load("test.Rdata")
if(!require(ggpubr))install.packages("ggplubr")
library(ggpubr)
colnames(test)[4] <- "v"
ggscatter(test,
x = "logFC",
y ="v",
ylab="-log10(P.value)",
size=0.5,
color = "change",
palette = c("#00AFBB", "#999999", "#FC4E07")
)
然后加标签,是县城的参数“label.select”。接受的参数数据结构应该是向量。
可以手动选一二三四个感兴趣的基因:
ggscatter(test,
x = "logFC",
y = "v",
ylab="-log10(P.value)",
color = "change",
size = 0.5,
label = "symbol",
repel = T,
palette = c("#00AFBB", "#999999", "#FC4E07") ,
#label.select = dat$symbol[1:30] ,
label.select = c("CD36", "DUSP6", "DCT", "SPRY2", "MOXD1", "ETV4" )
)
也可以用向量取子集的方法来取很多个,比如差异基因前30个:
ggscatter(test,
x = "logFC",
y = "v",
ylab="-log10(P.value)",
color = "change",
size = 0.5,
label = "symbol",
repel = T,
palette = c("#00AFBB", "#999999", "#FC4E07") ,
label.select = test$symbol[1:30]
)