Gene family analysis (2): drawing a motif analysis plot with ggplot2
2021-05-12
R语言数据分析指南 (R Data Analysis Guide)
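It has been a long while, but here is finally the second post in the gene family series: drawing a motif analysis plot. This kind of analysis is usually done with TBtools; today we will use ggplot2 to draw a more attractive motif figure.
You can upload your data to the MEME website (https://meme-suite.org/meme/tools/meme) and run the analysis there. I am used to the local version of MEME; if you prefer it too, you can run the following command (amino acid sequences are used here):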
meme pep.fa -protein -oc gene-motif -maxsize 60000 -mod zoops -nmotifs 10 -minw 6 -maxw 50 -objfun classic -markov_order 0
![](https://img.haomeiwen.com/i16360488/219ac999f11470e7.png)
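The figure produced by the website is shown above. It is not ugly, but it is not especially attractive either, and it is awkward to combine with other panels. Below we use ggplot2 to draw a more polished motif figure.
Loading the R packages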
pacman::p_load(ggtree,treeio,tidyverse,hablar,
gggenes,ggseqlogo,patchwork)
Defining the colors
colors <-c("#E41A1C","#1E90FF","#FF8C00","#4DAF4A",
"#984EA3","#40E0D0","#FFC0CB","#00BFFF",
"#FFDEAD","#EE82EE","#00FFFF","#F0A3FF",
"#0075DC", "#993F00","#4C005C","#2BCE48",
"#FFCC99","#808080","#94FFB5","#8F7C00",
"#9DCC00","#426600","#FF0010","#5EF1F2",
"#00998F","#740AFF","#990000","#FFFF00")
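Drawing the conserved motif frequency plot
MEME returns the sequences of the conserved motifs. Download them, run a multiple sequence alignment, and then draw the sequence logo with ggseqlogo.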
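# Read the aligned motif sites into a list for geom_logo()
# (assumes one sequence per line, all the same length, no FASTA header lines)
DNA <- read.table("meme.fa") %>% as.list()
# Draw the sequence logo and hide the legend and axis text to save space
log <- ggplot()+ geom_logo(DNA)+theme_logo()+
labs(x=NULL,y=NULL)+
theme(legend.position='none',
axis.text.x = element_blank(),
axis.text.y = element_blank())
log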
![](https://img.haomeiwen.com/i16360488/99c6dc83c83d4a54.png)
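The logo step above depends on the shape of its input, which the post does not show. Here is a minimal sketch, assuming the aligned sites are held as a named list of equal-length strings. The sequences and the names `motif_sites`, `site_widths` and `logo_demo` are hypothetical and are not the author's data. Passing a named list to `geom_logo()` lets you draw one logo per motif with `facet_wrap(~seq_group)`.

```r
# Hypothetical toy data: aligned amino-acid sites for two motifs.
motif_sites <- list(
  motif_1 = c("MKVLA", "MKILA", "MRVLA", "MKVLG"),
  motif_2 = c("GDSL", "GDSL", "GNSL", "GDTL")
)

# Within each motif, every site must have the same width,
# otherwise a position frequency matrix cannot be built.
site_widths <- lapply(motif_sites, function(s) unique(nchar(s)))
stopifnot(all(lengths(site_widths) == 1))

# Draw one logo per motif (skipped if ggseqlogo is not installed).
if (requireNamespace("ggseqlogo", quietly = TRUE)) {
  library(ggplot2)
  logo_demo <- ggplot() +
    ggseqlogo::geom_logo(motif_sites, seq_type = "aa") +
    ggseqlogo::theme_logo() +
    facet_wrap(~seq_group, ncol = 1, scales = "free_x")
}
```

If the downloaded file still has FASTA header lines, they would need to be removed before the sequences are passed to `geom_logo()`.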
Drawing the phylogenetic tree with ggtree
The tree file can be built with the following commands:
mafft --maxiterate 1000 --globalpair pep.fa > tree.fasta
fasttree tree.fasta > tree.nwk
Then read the tree into R and draw it with ggtree:
tree <- read.newick("tree.nwk",node.label = "support") %>%
ggtree(branch.length = "none")+
geom_tiplab(size=3,family="Times",fontface="italic")+
geom_point2(aes(subset=!isTip,fill=support),shape=21,size=2)+
scale_fill_continuous(low='green',high='red')+
xlim(NA,8)+
theme(legend.position = "none")
tree
![](https://img.haomeiwen.com/i16360488/3e794bde71bfb0e8.png)
The tree could be polished further using gene grouping information, but that is not shown here.
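Drawing the motif distribution plot
The data in motif.xls can be compiled from the figure on the MEME website; I usually compile it from the output files of the local version. The Y-axis order here was adjusted to match the gene order in the phylogenetic tree.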
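The post does not show motif.xls itself. Judging from the plotting code, which reads it with `read.delim()` and maps `gene`, `start`, `end` and `motif`, it is a tab-separated text file with at least those four columns. A minimal sketch of such a table follows; the coordinates and the names `motif_demo`, `demo_path` and `motif_back` are hypothetical.

```r
# Hypothetical motif coordinates (amino-acid positions); not the author's data.
motif_demo <- data.frame(
  gene  = c("TA1", "TA1", "TA2", "TA2", "TA2"),
  start = c(12, 80, 5, 60, 130),
  end   = c(40, 109, 33, 89, 170),
  motif = c("motif_1", "motif_2", "motif_1", "motif_2", "motif_3")
)

# Sanity check: every motif must start before it ends.
stopifnot(all(motif_demo$start < motif_demo$end))

# Write it as tab-separated text, the format read.delim() expects,
# then read it back to confirm the round trip.
demo_path <- file.path(tempdir(), "motif_demo.xls")
write.table(motif_demo, demo_path, sep = "\t", quote = FALSE, row.names = FALSE)
motif_back <- read.delim(demo_path, check.names = FALSE)
```

Each row describes one motif occurrence on one gene, so a gene with three motifs contributes three rows.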
motif <- read.delim("motif.xls",check.names = FALSE) %>%
as_tibble() %>%
convert(fct(gene,motif)) %>%
ggplot(aes(xmin = start,xmax = end,y=gene,fill = motif)) +
scale_fill_manual(values = colors)+
geom_gene_arrow(arrowhead_height = unit(3, "mm"),
arrowhead_width = unit(0.1,"mm"))+
theme_genes()+
scale_y_discrete(limits=c("TA4","TA5","TA12","TA1",
"TA9","TA3","TA11","TA6","TA13","TA7","TA14",
"TA8","TA2","TA10"))+ylab(NULL)+
theme(legend.title = element_blank(),
axis.text.y=element_blank())
motif
![](https://img.haomeiwen.com/i16360488/36508de184438e9d.png)
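The gene order on the Y axis is typed by hand in the code above. One alternative, shown here only as a sketch, is to read the order from the tree plot. `get_taxa_name()` from ggtree returns tip labels in plotting order from top to bottom, while a discrete Y scale places its first limit at the bottom, so the vector is reversed. The toy Newick string and the names `toy_tree`, `tree_plot`, `tip_order` and `y_limits` are hypothetical.

```r
# Hypothetical four-gene tree; in practice use the plot built from tree.nwk.
if (requireNamespace("ggtree", quietly = TRUE)) {
  toy_tree  <- ape::read.tree(text = "((TA1,TA2),(TA3,TA4));")
  tree_plot <- ggtree::ggtree(toy_tree, branch.length = "none")

  # Tip labels in the order they are drawn, top to bottom.
  tip_order <- ggtree::get_taxa_name(tree_plot)

  # Discrete y scales put the first limit at the bottom, so reverse the order.
  y_limits <- rev(tip_order)
}
```

The result could then be passed as `scale_y_discrete(limits = y_limits)`, so the motif rows stay aligned with the tips if the tree changes.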
Combining the figures with patchwork
log + {tree+motif+plot_layout(ncol = 2,
widths = c(1,3.2))}+
plot_layout(ncol = 1,heights = c(1.5,3))
![](https://img.haomeiwen.com/i16360488/0e8d430becd2e949.png)
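The nested layout can be easier to follow with placeholder panels. The sketch below reproduces the same arrangement, one panel on top and two panels side by side underneath, using toy plots. The data and the names `toy`, `p_top`, `p_left`, `p_right`, `bottom_row` and `combined` are hypothetical; parentheses are used for grouping here and, as far as I know, nest the same way as the braces above.

```r
library(ggplot2)
library(patchwork)

# Hypothetical placeholder panels standing in for the logo, tree and motif plots.
toy <- data.frame(x = 1:3, y = c(2, 1, 3))
p_top   <- ggplot(toy, aes(x, y)) + geom_col()   + ggtitle("top")
p_left  <- ggplot(toy, aes(x, y)) + geom_line()  + ggtitle("left")
p_right <- ggplot(toy, aes(x, y)) + geom_point() + ggtitle("right")

# Bottom row: two panels side by side, the right one 3.2 times as wide.
bottom_row <- p_left + p_right + plot_layout(ncol = 2, widths = c(1, 3.2))

# Stack the top panel over the nested bottom row with a 1.5 : 3 height ratio.
combined <- p_top + (bottom_row) + plot_layout(ncol = 1, heights = c(1.5, 3))
```

Printing `combined` draws the figure. The inner `plot_layout()` only affects the bottom row, and the outer one controls how the two rows are stacked.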
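As you can see, after a series of steps that look tedious and are in fact even more tedious, we get the motif analysis figure shown above. More plot types will be covered in later posts, so stay tuned. If you are interested, contact me to get the data used in this post.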