ggplot2绘图基因组数据绘图

高分文章复现系列1-瀑布图(有实操代码)

2024-03-05  本文已影响0人  沉迷工作的我

今天看到一篇高分文章(IF=11.4,PMID :33546774)里的一副图很漂亮,忍不住想要复现一下,下面给小伙伴们分享下完整的复现过程。先看下原图和复现后的图对比吧!

文章原图

image.png

复现后的图

image.png
  1. 原文数据下载及整理

    把文章附件中的突变数据表下载下来,这里我已经下载好了,你们拿来用就行。


    image.png

    然后需要对数据格式进行整理,这一步我是通过R语言代码来实现的,具体代码如下:

# 程序包安装---
if(!"openxlsx" %in% installed.packages()){install.packages('openxlsx')}
if(!"dplyr" %in% installed.packages()){install.packages('dplyr')}
if(!"tidyverse" %in% installed.packages()){install.packages('tidyverse')}
if(!"aplot" %in% installed.packages()){install.packages('aplot')}

安装好程序包后,加载程序包(加载的时候注意每个程序包是否都安装成功,可以正常加载)。注意程序包版本!!!

# 加载程序包,注意程序包版本!!!---
library(openxlsx)
library(dplyr)
library(tidyverse)
library(aplot) # 版本 0.2.2
library(ggplot2) # 版本 3.2.0

读取数据提取画图用到的基因,并计算基因突变比例。

# 数据读取
df <- read.xlsx("./40164_2021_200_MOESM2_ESM.xlsx",check.names = F)

读取的数据格式如下:


image.png
# 整理画热图部分的数据
gene_list <- c("CHD3","APC","TP53","PALB2",
               "FANCA","TET2","DNMT3A","IDH2",
               "ARID1A","ARID1B","MLL3","TYK2",
               "STAT3","LRRK2","MAP2K1","PCLO",
               "PIEZO1","FAT3","CSMD1","NSD1",
               "MKI67","WDR90","MGA","CPS1",
               "SPEN","ATP10B","ANKRD11","RELN",
               "PLCG1","ALK","FLT4","RHOA",
               "NOTCH1","NOTCH4")

# 基因突变比例
mut_per <- df %>% 
  dplyr::filter(Gene.Symbol %in% gene_list) %>% 
  group_by(PatientID,Gene.Symbol) %>% 
  summarise(n=n()) %>%
  group_by(Gene.Symbol,n) %>% 
  summarise(nsub=n(),per = nsub/53)

mut_per <- aggregate(mut_per$per, by=list(type = mut_per$Gene.Symbol),sum) %>%
  dplyr::rename(per = 'x') %>%
  dplyr::rename(Gene.Symbol = 'type')

heat_df <- df %>%
  # 挑选画图基因
  dplyr::filter(Gene.Symbol %in% gene_list) %>%
  dplyr::select(Gene.Symbol,PatientID,Function) %>%
  dplyr::rename(Alterations = Function) %>%
  merge(mut_per, ., by = 'Gene.Symbol') %>%
  dplyr::mutate(mut_pct = round(per * 100,0))

整理后的数据格式heat_df如下:


image.png

对基因进行因子化,固定排列顺序,同时修改突变类型注释信息。

# 因子化,固定基因排列顺序
heat_df$Gene.Symbol <- factor(heat_df$Gene.Symbol,levels = gene_list)

# 突变类型修改
heat_df[heat_df$Alterations == 'cds-del', 'Alterations'] <- 'CDs-Indel'
heat_df[heat_df$Alterations == 'missense', 'Alterations'] <- 'Missense'
heat_df[heat_df$Alterations == 'nonsense', 'Alterations'] <- 'Nonsense'
heat_df[heat_df$Alterations == 'frameshift', 'Alterations'] <- 'FrameShift'
heat_df[heat_df$Alterations == 'splice-5' | heat_df$Alterations == "splice-3", 'Alterations'] <- 'Splicing'

2. 画热图

先画主体的热图,利用ggplot2来画图。根据基因通路分类和样本分组设置分割线,代码如下:

p1 <- ggplot(heat_df, aes(factor(PatientID), fct_rev(Gene.Symbol)))+
  geom_tile(aes(fill = Alterations), color = 'white') +
  labs(x=NULL,y=NULL) +
  theme(axis.text.x = element_blank(),
        axis.ticks = element_blank(),
        axis.text.y = element_text(size = 20, face = 'italic'),
        legend.title = element_text(size = 25, face = 'bold'),
        legend.text = element_text(size = 20, face = 'italic'),
        panel.border = element_blank()) +
  scale_fill_manual(values = c('Missense' = '#103b75',
                               'CDs-Indel' = '#fc9408',
                               'Nonsense' = '#f2030d',
                               'FrameShift' = '#1ffffd',
                               'Splicing' = '#8e02e7')) +
  geom_hline(yintercept=c(3.5, 5.5, 7.5, 19.5, 21.5, 23.5, 29.5, 32.5),
             size = 3,
             color = 'white') +
  geom_vline(xintercept=c(30.5, 36.5, 40.5),
             size = 3,
             color = 'white')
image.png

3. 画右侧的通路注释信息

这一步需要注意的是通过annotate来指定通路注释的位置!!!

# 画右侧通路注释
ann_df = ggplot() +
  annotate(geom = 'text', x = 0, y = 1.7, label = 'T-cell activation', hjust = 0, size = 6, fontface='italic') +
  annotate(geom = 'text', x = 0, y = 4.5, label = 'RTK signaling pathway', hjust = 0, size = 6, fontface='italic') +
  annotate(geom = 'text', x = 0, y = 6.5, label = 'PI3K/AKT pathway', hjust = 0, size = 6, fontface='italic') +
  annotate(geom = 'text', x = 0, y = 13.5, label = 'Other signaling pathway', hjust = 0, size = 6, fontface='italic') +
  annotate(geom = 'text', x = 0, y = 20.5, label = 'MAPK pathway', hjust = 0, size = 6, fontface='italic') +
  annotate(geom = 'text', x = 0, y = 22.5, label = 'JAK-STAT pathway', hjust = 0, size = 6, fontface='italic') +
  annotate(geom = 'text', x = 0, y = 26.5, label = 'Epigenetic Remoldling', hjust = 0, size = 6, fontface='italic') +
  annotate(geom = 'text', x = 0, y = 31, label = 'DNA repair/TP53 pathway', hjust = 0, size = 6, fontface='italic') +
  annotate(geom = 'text', x = 0, y = 33.5, label = 'APC(Wnt) pathway', hjust = 0, size = 6, fontface='italic') +
  xlim(0,0.02) +
  theme_bw() +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        panel.grid = element_blank())

4. 画上面的样本突变柱状图

# 上部条形图部分
bar_df <- heat_df %>% count(PatientID, Alterations)
p2 <- ggplot(bar_df, aes(factor(PatientID),n))+
  geom_bar(stat = "identity", aes(fill = Alterations))+
  scale_y_continuous(name = NULL,expand = c(0,0))+
  scale_x_discrete(name = NULL)+
  theme_classic()+
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title = element_blank(),
        axis.line.x = element_blank(),
        axis.text.y = element_text(size = 20, face = 'italic'),
        axis.line.y = element_line(color = "black",size = 1.1),
        axis.ticks.y = element_line(color = "black",size = 1.1),
        legend.position = "none") +
  scale_y_continuous(expand = c(0.03,0.03)) +
  scale_fill_manual(values = c('Missense' = '#103b75',
                               'CDs-Indel' = '#fc9408',
                               'Nonsense' = '#f2030d',
                               'FrameShift' = '#1ffffd',
                               'Splicing' = '#8e02e7'))
image.png

5.画左侧基因突变频率柱状图

# 突变频率注释
bar_mut <- heat_df[!duplicated(heat_df$Gene.Symbol),]
bar_mut$Gene.Symbol <- factor(bar_mut$Gene.Symbol, levels = gene_list)

p3 <- ggplot(bar_mut, aes(fct_rev(Gene.Symbol),mut_pct))+
  geom_bar(stat = "identity")+
  coord_flip() +
  # scale_x_continuous(name = NULL,expand = c(0,0))+
  # scale_x_discrete(name = NULL)+
  theme_bw()+
  theme(axis.text.x = element_text(size = 20, face = 'italic'),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        panel.grid = element_blank(),
        axis.title = element_blank(),
        plot.title = element_text(size = 20, hjust = .5),
        panel.border = element_rect(size = 1.5),
        legend.position = "none") +
  labs(title = 'Mutations Percent', y = '', x = '')
p3
image.png

6. 画底部的注释信息

p4 <- ggplot() +
  annotate(geom = 'text', y = 0.01, x = 15, label = 'AITL', 
            size = 5, fontface ='bold', color = 'black') +
  annotate(geom = 'rect', xmin = 0, xmax = 30.2, ymin = 0,
           ymax = 0.02, alpha = .3, fill = '#23ff07', color = 'black') +

  annotate(geom = 'text', y = 0.01, x = 33, label = 'ALK- \nALCL', 
           size = 5, fontface ='bold', color = 'black') +
  annotate(geom = 'rect', xmin = 30.6, xmax = 36.2, ymin = 0, 
           ymax = 0.02, alpha = .3, fill = '#22fffe', color = 'black') +

  annotate(geom = 'text', y = 0.01, x = 38.5, label = 'ALK+ \nALCL', 
           size = 5, fontface = 'bold', color = 'black') +
  annotate(geom = 'rect', xmin = 36.6, xmax = 40.2, ymin = 0, 
           ymax = 0.02, alpha = .3, fill = '#23ff07', color = 'black') +

  annotate(geom = 'text', y = 0.01, x = 45, label = 'PTCL-NOS', 
           size = 5, fontface = 'bold', color = 'black') +
  annotate(geom = 'rect', xmin = 40.7, xmax = 50.5, ymin = 0, 
           ymax = 0.02, alpha = .3, fill = '#22fffe', color = 'black') +
  theme_bw() +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        panel.grid = element_blank())
image.png

7. 图片拼图

最后就是把我们上述画的几张图拼接起来,我这里用的是aplot程序包进行拼图。然后保存图片即可。

pic <- p1 %>% 
  insert_right(ann_df, width = 0.3) %>%
  insert_bottom(p4, height = 0.1) %>%
  insert_top(p2, height = 0.2) %>%
  insert_left(p3, width = 0.3)

  ggsave(plot = pic, device = 'png', dpi = 300,
       width = 19, height = 12, './heatmap2.png')

完整的全部代码,文章数据我都整理放百度网盘了,私我-->获取百度网盘链接(永久有效)!

上一篇下一篇

猜你喜欢

热点阅读