
[R] Sequence logo (seqlogo) plots

2020-08-09  研究僧小蓝哥

Sequence logos (seqlogo plots) are commonly used to show the sequence composition of a specific region, such as the logo shown in [1].

I had long wondered how these plots are made. Later I came across an R package, ggseqlogo [2], which provides a range of visualization options.

The author also provides a complete tutorial: https://omarwagih.github.io/ggseqlogo/

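The key to this kind of plot is understanding the structure of the input data. Once that is clear, you can apply it to your own data. The example data used in this post can be obtained by replying "seqlogo" to the WeChat official account PLANTOMIX.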
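The snippet below gives a rough picture of the expected input structure. It builds a hypothetical stand-in for `data/test.DNA.seq.txt`, whose real contents are not shown in this post. The stand-in is a one-column table headed `test.seq`, with one aligned sequence per row. The sequences are made up. The one assumption that matters is that every sequence has the same length, because each column of the logo corresponds to one alignment position.

```r
# Hypothetical stand-in for data/test.DNA.seq.txt: a header line, then one
# aligned sequence per row (made-up sequences, all of the same length)
txt = "test.seq
ACGTAC
ACGTTC
ACGAAC
TCGTAC"
seq_demo = read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# ggseqlogo takes the sequences as a plain character vector
seqs = as.character(seq_demo$test.seq)

# Sanity check before plotting: all sequences must have the same length
stopifnot(length(unique(nchar(seqs))) == 1)
seqs
```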

DNA sequences

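There are two ways to scale the letters. The first is by information content (`method = 'bits'`). The second is by per-position frequency (`method = 'prob'`, shown as percentages below). The aligned sequences can simply be stored in a data frame, one sequence per row: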

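```r
library(ggplot2)
library(ggseqlogo)
library(tidyverse)  # also provides stringr and the %>% pipe

dir.create('figures', showWarnings = FALSE)  # output folder for ggsave()

# DNA sequences: one aligned sequence per row in the column test.seq
seq_dna = read.table('data/test.DNA.seq.txt', header = TRUE)

# Letter height = frequency at each position (y-axis shown as percent)
p1 = ggseqlogo(as.character(seq_dna$test.seq), method = 'prob') +
  theme_bw() +
  scale_y_continuous(labels = scales::percent)
p1
ggsave(p1, filename = 'figures/1.png', width = 5, height = 3)

# Letter height = information content in bits (0 to 2 for DNA),
# so the y-axis must not be formatted as a percentage
p1.1 = ggseqlogo(as.character(seq_dna$test.seq), method = 'bits') +
  theme_bw()
p1.1
ggsave(p1.1, filename = 'figures/1.1.png', width = 5, height = 3)
```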
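The base-R sketch below makes the difference between the two methods concrete by computing the quantities on the y-axis.

- With `method = 'prob'`, each letter's height is its frequency p(b, i) at position i.
- With `method = 'bits'`, the stack height at position i is the information content log2(N) - H(i). Here N is the alphabet size (4 for DNA) and H(i) = -sum over b of p(b, i) log2 p(b, i). Each letter's height is then p(b, i) times that value.

This follows the standard definition without any small-sample correction, so the numbers may not match ggseqlogo's internal output exactly. The function name `letter_heights` is hypothetical.

```r
# Simplified sketch of 'prob' and 'bits' letter heights (no small-sample
# correction; may differ slightly from ggseqlogo's own numbers).
# Assumes the input sequences are aligned (equal length).
letter_heights = function(seqs, alphabet = c('A', 'C', 'G', 'T')) {
  # One row per sequence, one column per position
  chars = do.call(rbind, strsplit(seqs, ''))
  # Count each letter at each position: rows = letters, columns = positions
  counts = apply(chars, 2, function(col) table(factor(col, levels = alphabet)))
  rownames(counts) = alphabet  # make the letter labels explicit
  prob = sweep(counts, 2, colSums(counts), '/')
  # Shannon entropy per position, treating 0 * log2(0) as 0
  entropy = apply(prob, 2, function(p) -sum(ifelse(p > 0, p * log2(p), 0)))
  ic = log2(length(alphabet)) - entropy  # information content per position
  bits = sweep(prob, 2, ic, '*')         # letter height = frequency * IC
  list(prob = prob, bits = bits, ic = ic)
}

h = letter_heights(c('ACGT', 'ACGA', 'ACTT', 'ACTA'))
round(h$ic, 3)
```

Positions 1 and 2 are fully conserved, and positions 3 and 4 are split 50/50 between two letters. The information content therefore comes out as 2, 2, 1 and 1 bits, which is why conserved positions tower over variable ones in a bits logo.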

Amino acid sequences

```r
# Amino acid sequences, one per row ('.' is the column header in the sample file)
seq_aa = read.table('data/test.AA.seq.txt', header = TRUE)

p2 = ggseqlogo(as.character(seq_aa$.), method = 'prob') +
  theme_bw() +
  scale_y_continuous(labels = scales::percent)
p2
ggsave(p2, filename = 'figures/2.png', width = 5, height = 3)
```
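If you switch the amino acid logo to `method = 'bits'`, the y-axis range changes with the alphabet size N. This assumes the usual information-content definition IC = log2(N) - H, without small-sample correction. A fully conserved position has H = 0 and therefore reaches the maximum:

```latex
\mathrm{IC}_{\max} = \log_2 N,\qquad
\log_2 4 = 2\ \text{bits (DNA)},\qquad
\log_2 20 \approx 4.32\ \text{bits (20 standard amino acids)}
```

A bits-scaled protein logo can therefore rise to about 4.3, whereas a DNA logo never exceeds 2.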

Custom data

ggseqlogo also supports custom alphabets, such as digits. Declare the symbols with the `namespace` argument.

```r
# Custom sequences: 10 random sequences of length 10 over the digits 1-4
set.seed(1)  # make the random example reproducible
seq_diy = data.frame(
  test.seq = replicate(10, paste(sample(1:4, 10, replace = TRUE), collapse = '')),
  stringsAsFactors = FALSE
)

p4 = ggseqlogo(seq_diy$test.seq,
               method = 'prob',
               namespace = 1:4) +  # the custom alphabet
  theme_bw() +
  scale_y_continuous(labels = scales::percent)
p4
ggsave(p4, filename = 'figures/4.png', width = 5, height = 3)
```
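With a custom alphabet, it can help to confirm before plotting that every symbol in the sequences is covered by `namespace`. How ggseqlogo treats symbols outside the namespace is not covered here, so checking up front is a cheap safeguard. `check_namespace` below is a hypothetical base-R helper, not part of ggseqlogo.

```r
# Hypothetical helper (not part of ggseqlogo): return the symbols that appear
# in the sequences but are missing from the intended alphabet
check_namespace = function(seqs, namespace) {
  used = unique(unlist(strsplit(seqs, '')))
  setdiff(used, as.character(namespace))
}

set.seed(1)
seqs_ok = replicate(5, paste(sample(1:4, 10, replace = TRUE), collapse = ''))
check_namespace(seqs_ok, 1:4)             # character(0): nothing missing
check_namespace(c('1234', '1259'), 1:4)   # "5" and "9" are not in the alphabet
```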

Matrix data

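Another, probably more widely used, input is a matrix. It has one row per letter, with the letters as row names, and one column per position. Each cell gives that letter's count or frequency at that position: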
```r
# Matrix data: rows = letters (row names), columns = positions
seq_matrix = read.table('data/test.matrix.txt', header = TRUE) %>%
  as.matrix()

p3 = ggseqlogo(seq_matrix, method = 'bits') +
  theme_bw()
p3
ggsave(p3, filename = 'figures/3.png', width = 5, height = 3)
```
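If your matrix holds raw counts, you may want to convert it to frequencies yourself so that the input is unambiguous. The base-R sketch below uses a made-up 4 × 3 count matrix. Its rows are letters, carrying the letter names as row names, and its columns are positions. Each column is then normalised so that it sums to 1.

```r
# Made-up count matrix: rows = letters (row names required), columns = positions
counts = matrix(c(8, 1, 1, 0,
                  0, 5, 5, 0,
                  2, 2, 3, 3),
                nrow = 4,
                dimnames = list(c('A', 'C', 'G', 'T'), paste0('pos', 1:3)))

# Column-normalise so each position sums to 1 (a position probability matrix)
ppm = prop.table(counts, margin = 2)
round(ppm, 2)
```

The resulting `ppm` could then be passed to `ggseqlogo()` in the same way as `seq_matrix` above.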

For more visualization options, see the author's tutorial: https://omarwagih.github.io/ggseqlogo/

References

[1] Li, Ying, et al. "Magnaporthe oryzae Auxiliary Activity Protein MoAa91 Functions as Chitin-Binding Protein To Induce Appressorium Formation on Artificial Inductive Surfaces and Suppress Plant Immunity." mBio 11.2 (2020).
[2] Wagih, Omar. "ggseqlogo: a versatile R package for drawing sequence logos." Bioinformatics 33.22 (2017): 3645-3647.
