高分文章复现系列1-瀑布图(有实操代码)
2024-03-05 本文已影响0人
沉迷工作的我
今天看到一篇高分文章(IF=11.4,PMID :33546774)里的一副图很漂亮,忍不住想要复现一下,下面给小伙伴们分享下完整的复现过程。先看下原图和复现后的图对比吧!
文章原图
复现后的图
-
原文数据下载及整理
把文章附件中的突变数据表下载下来,这里我已经下载好了,你们拿来用就行。
image.png
然后需要对数据格式进行整理,这一步我是通过R语言代码来实现的,具体代码如下:
# 程序包安装---
if(!"openxlsx" %in% installed.packages()){install.packages('openxlsx')}
if(!"dplyr" %in% installed.packages()){install.packages('dplyr')}
if(!"tidyverse" %in% installed.packages()){install.packages('tidyverse')}
if(!"aplot" %in% installed.packages()){install.packages('aplot')}
安装好程序包后,加载程序包(加载的时候注意每个程序包是否都安装成功,可以正常加载)。注意程序包版本!!!
# 加载程序包,注意程序包版本!!!---
library(openxlsx)
library(dplyr)
library(tidyverse)
library(aplot) # 版本 0.2.2
library(ggplot2) # 版本 3.2.0
读取数据提取画图用到的基因,并计算基因突变比例。
# 数据读取
df <- read.xlsx("./40164_2021_200_MOESM2_ESM.xlsx",check.names = F)
读取的数据格式如下:
image.png
# 整理画热图部分的数据
gene_list <- c("CHD3","APC","TP53","PALB2",
"FANCA","TET2","DNMT3A","IDH2",
"ARID1A","ARID1B","MLL3","TYK2",
"STAT3","LRRK2","MAP2K1","PCLO",
"PIEZO1","FAT3","CSMD1","NSD1",
"MKI67","WDR90","MGA","CPS1",
"SPEN","ATP10B","ANKRD11","RELN",
"PLCG1","ALK","FLT4","RHOA",
"NOTCH1","NOTCH4")
# 基因突变比例
mut_per <- df %>%
dplyr::filter(Gene.Symbol %in% gene_list) %>%
group_by(PatientID,Gene.Symbol) %>%
summarise(n=n()) %>%
group_by(Gene.Symbol,n) %>%
summarise(nsub=n(),per = nsub/53)
mut_per <- aggregate(mut_per$per, by=list(type = mut_per$Gene.Symbol),sum) %>%
dplyr::rename(per = 'x') %>%
dplyr::rename(Gene.Symbol = 'type')
heat_df <- df %>%
# 挑选画图基因
dplyr::filter(Gene.Symbol %in% gene_list) %>%
dplyr::select(Gene.Symbol,PatientID,Function) %>%
dplyr::rename(Alterations = Function) %>%
merge(mut_per, ., by = 'Gene.Symbol') %>%
dplyr::mutate(mut_pct = round(per * 100,0))
整理后的数据格式heat_df如下:
image.png
对基因进行因子化,固定排列顺序,同时修改突变类型注释信息。
# 因子化,固定基因排列顺序
heat_df$Gene.Symbol <- factor(heat_df$Gene.Symbol,levels = gene_list)
# 突变类型修改
heat_df[heat_df$Alterations == 'cds-del', 'Alterations'] <- 'CDs-Indel'
heat_df[heat_df$Alterations == 'missense', 'Alterations'] <- 'Missense'
heat_df[heat_df$Alterations == 'nonsense', 'Alterations'] <- 'Nonsense'
heat_df[heat_df$Alterations == 'frameshift', 'Alterations'] <- 'FrameShift'
heat_df[heat_df$Alterations == 'splice-5' | heat_df$Alterations == "splice-3", 'Alterations'] <- 'Splicing'
2. 画热图
先画主体的热图,利用ggplot2来画图。根据基因通路分类和样本分组设置分割线,代码如下:
p1 <- ggplot(heat_df, aes(factor(PatientID), fct_rev(Gene.Symbol)))+
geom_tile(aes(fill = Alterations), color = 'white') +
labs(x=NULL,y=NULL) +
theme(axis.text.x = element_blank(),
axis.ticks = element_blank(),
axis.text.y = element_text(size = 20, face = 'italic'),
legend.title = element_text(size = 25, face = 'bold'),
legend.text = element_text(size = 20, face = 'italic'),
panel.border = element_blank()) +
scale_fill_manual(values = c('Missense' = '#103b75',
'CDs-Indel' = '#fc9408',
'Nonsense' = '#f2030d',
'FrameShift' = '#1ffffd',
'Splicing' = '#8e02e7')) +
geom_hline(yintercept=c(3.5, 5.5, 7.5, 19.5, 21.5, 23.5, 29.5, 32.5),
size = 3,
color = 'white') +
geom_vline(xintercept=c(30.5, 36.5, 40.5),
size = 3,
color = 'white')
image.png
3. 画右侧的通路注释信息
这一步需要注意的是通过annotate来指定通路注释的位置!!!
# 画右侧通路注释
ann_df = ggplot() +
annotate(geom = 'text', x = 0, y = 1.7, label = 'T-cell activation', hjust = 0, size = 6, fontface='italic') +
annotate(geom = 'text', x = 0, y = 4.5, label = 'RTK signaling pathway', hjust = 0, size = 6, fontface='italic') +
annotate(geom = 'text', x = 0, y = 6.5, label = 'PI3K/AKT pathway', hjust = 0, size = 6, fontface='italic') +
annotate(geom = 'text', x = 0, y = 13.5, label = 'Other signaling pathway', hjust = 0, size = 6, fontface='italic') +
annotate(geom = 'text', x = 0, y = 20.5, label = 'MAPK pathway', hjust = 0, size = 6, fontface='italic') +
annotate(geom = 'text', x = 0, y = 22.5, label = 'JAK-STAT pathway', hjust = 0, size = 6, fontface='italic') +
annotate(geom = 'text', x = 0, y = 26.5, label = 'Epigenetic Remoldling', hjust = 0, size = 6, fontface='italic') +
annotate(geom = 'text', x = 0, y = 31, label = 'DNA repair/TP53 pathway', hjust = 0, size = 6, fontface='italic') +
annotate(geom = 'text', x = 0, y = 33.5, label = 'APC(Wnt) pathway', hjust = 0, size = 6, fontface='italic') +
xlim(0,0.02) +
theme_bw() +
theme(axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
panel.background = element_blank(),
panel.border = element_blank(),
panel.grid = element_blank())
4. 画上面的样本突变柱状图
# 上部条形图部分
bar_df <- heat_df %>% count(PatientID, Alterations)
p2 <- ggplot(bar_df, aes(factor(PatientID),n))+
geom_bar(stat = "identity", aes(fill = Alterations))+
scale_y_continuous(name = NULL,expand = c(0,0))+
scale_x_discrete(name = NULL)+
theme_classic()+
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title = element_blank(),
axis.line.x = element_blank(),
axis.text.y = element_text(size = 20, face = 'italic'),
axis.line.y = element_line(color = "black",size = 1.1),
axis.ticks.y = element_line(color = "black",size = 1.1),
legend.position = "none") +
scale_y_continuous(expand = c(0.03,0.03)) +
scale_fill_manual(values = c('Missense' = '#103b75',
'CDs-Indel' = '#fc9408',
'Nonsense' = '#f2030d',
'FrameShift' = '#1ffffd',
'Splicing' = '#8e02e7'))
image.png
5.画左侧基因突变频率柱状图
# 突变频率注释
bar_mut <- heat_df[!duplicated(heat_df$Gene.Symbol),]
bar_mut$Gene.Symbol <- factor(bar_mut$Gene.Symbol, levels = gene_list)
p3 <- ggplot(bar_mut, aes(fct_rev(Gene.Symbol),mut_pct))+
geom_bar(stat = "identity")+
coord_flip() +
# scale_x_continuous(name = NULL,expand = c(0,0))+
# scale_x_discrete(name = NULL)+
theme_bw()+
theme(axis.text.x = element_text(size = 20, face = 'italic'),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid = element_blank(),
axis.title = element_blank(),
plot.title = element_text(size = 20, hjust = .5),
panel.border = element_rect(size = 1.5),
legend.position = "none") +
labs(title = 'Mutations Percent', y = '', x = '')
p3
image.png
6. 画底部的注释信息
p4 <- ggplot() +
annotate(geom = 'text', y = 0.01, x = 15, label = 'AITL',
size = 5, fontface ='bold', color = 'black') +
annotate(geom = 'rect', xmin = 0, xmax = 30.2, ymin = 0,
ymax = 0.02, alpha = .3, fill = '#23ff07', color = 'black') +
annotate(geom = 'text', y = 0.01, x = 33, label = 'ALK- \nALCL',
size = 5, fontface ='bold', color = 'black') +
annotate(geom = 'rect', xmin = 30.6, xmax = 36.2, ymin = 0,
ymax = 0.02, alpha = .3, fill = '#22fffe', color = 'black') +
annotate(geom = 'text', y = 0.01, x = 38.5, label = 'ALK+ \nALCL',
size = 5, fontface = 'bold', color = 'black') +
annotate(geom = 'rect', xmin = 36.6, xmax = 40.2, ymin = 0,
ymax = 0.02, alpha = .3, fill = '#23ff07', color = 'black') +
annotate(geom = 'text', y = 0.01, x = 45, label = 'PTCL-NOS',
size = 5, fontface = 'bold', color = 'black') +
annotate(geom = 'rect', xmin = 40.7, xmax = 50.5, ymin = 0,
ymax = 0.02, alpha = .3, fill = '#22fffe', color = 'black') +
theme_bw() +
theme(axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
panel.background = element_blank(),
panel.border = element_blank(),
panel.grid = element_blank())
image.png
7. 图片拼图
最后就是把我们上述画的几张图拼接起来,我这里用的是aplot程序包进行拼图。然后保存图片即可。
pic <- p1 %>%
insert_right(ann_df, width = 0.3) %>%
insert_bottom(p4, height = 0.1) %>%
insert_top(p2, height = 0.2) %>%
insert_left(p3, width = 0.3)
ggsave(plot = pic, device = 'png', dpi = 300,
width = 19, height = 12, './heatmap2.png')
完整的全部代码,文章数据我都整理放百度网盘了,私我-->获取百度网盘链接(永久有效)!