Gviz 的使用
参考链接:
http://www.sthda.com/english/wiki/print.php?id=43
http://www.bioconductor.org/packages/release/bioc/manuals/Gviz/man/Gviz.pdf
注意 :输入的文件一定要保证和ucsc的编码一致!注意输入文件的染色体名称chr,Chr
Gviz - 可视化基因组数据
Gviz软件包旨在提供一个结构化的可视化框架,用于沿基因组坐标绘制任何类型的数据。它还允许整合来自UCSC或ENSEMBL等来源的公开可用的基因组注释数据。
默认情况下,Gviz会根据UCSC定义检查所有提供的染色体名称的有效性(染色体必须以chr字符串开头),可以通过调用选项(ucscChromosomeNames = FALSE)来决定关闭此功能。
示例是用小鼠mm9基因组上的UCSC基因组和染色体7(chr7)。
[1]绘制注释轨迹
请注意,AnnotationTrack构造函数可以容纳许多不同类型的输入。例如,注释要素的开始和结束坐标可以作为单独的参数start和end,作为data.frame或甚至作为IRanges或GRangesList对象传递。
library(Gviz)
library(GenomicRanges)
#Load data:class = GRanges
data(cpgIslands)
cpgIslands
#Annotation track, title ="CpG"
atrack <- AnnotationTrack(cpgIslands, name = "CpG")
plotTracks(atrack)
[2]添加基因组轴轨道
此步骤用于指示当前正在查看的基因组坐标。
# genomic coordinates
gtrack <- GenomeAxisTrack()
plotTracks(list(gtrack, atrack))
[3]添加染色体
为了添加染色体,必须指出有效的UCSC基因组(例如:“hg19”)染色体名称(例如:“chr7”)。(由于该功能从UCSC获取数据,因此需要Internet连接,并且可能需要很长时间)
#genome : "hg19"
gen<-genome(cpgIslands)
#Chromosme name : "chr7"
chr <- as.character(unique(seqnames(cpgIslands)))
#Ideogram track
itrack <- IdeogramTrack(genome = gen, chromosome = chr)
plotTracks(list(itrack, gtrack, atrack))
[4]添加基因模型
我们可以利用现有本地来源的基因模型信息,或者从一个可用的在线资源(如UCSC或ENSEBML)下载此类数据,并且构造函数来处理这些任务。
在本例中,我们将从存储的data.frame中加载基因模型数据。
#Load data
data(geneModels)
head(geneModels)
#Plot
grtrack <- GeneRegionTrack(geneModels, genome = gen,chromosome = chr, name = "Gene Model")
plotTracks(list(itrack, gtrack, atrack, grtrack))
[5]缩放图
我们经常想要放大或缩小特定的绘图区域以查看更多细节或获得更广泛的概述。
#Use from and to arguments to zoom
plotTracks(list(itrack, gtrack, atrack, grtrack),
from = 26700000, to = 26750000)
# Use extend.left and extend.right to zoom
#those arguments are relative to the currently displayed ranges,
#and can be used to quickly extend the view on one or both ends of the plot.
plotTracks(list(itrack, gtrack, atrack, grtrack),
extend.left = 0.5, extend.right = 1000000)
# to drop the bounding borders of the exons and
# to have a nice plot
plotTracks(list(itrack, gtrack, atrack, grtrack),
extend.left = 0.5, extend.right = 1000000, col = NULL)
#A value of 0.5 will cause zooming in to half the currently displayed range.
添加序列轨迹和缩放以查看序列
必要的序列信息来自一个BSgenome包。
library(BSgenome.Hsapiens.UCSC.hg19)
strack <- SequenceTrack(Hsapiens, chromosome = chr)
plotTracks(list(itrack, gtrack, atrack, grtrack,
strack), from = 26591822, to = 26591852, cex = 0.8)
[6]添加数据轨道
DataTrack对象本质上是run-length encoded (Rle) 的数值向量或矩阵,我们可以使用它们将各种数值数据添加到我们的基因组坐标图中。可以使用这些轨道的不同可视化选项,包括点图,直方图和箱线图。
#For demonstration purposes we can create a simple DataTrack object from
#randomly sampled data.
set.seed(255)
lim <- c(26700000, 26750000)
coords <- sort(c(lim[1], sample(seq(from = lim[1],
to = lim[2]), 99), lim[2]))
dat <- runif(100, min = -10, max = 10)
head(dat)
## [1] 3.5382 -9.3984 -2.8372 -1.9871 0.2722 8.2508
##data track
dtrack <- DataTrack(data = dat, start = coords[-length(coords)],
end = coords[-1], chromosome = chr, genome = gen,
name = "Uniform")
##Plot data track
plotTracks(list(itrack, gtrack, atrack, grtrack,
dtrack), from = lim[1], to = lim[2])
#Change plot type to histogram
plotTracks(list(itrack, gtrack, atrack, grtrack,dtrack),
from = lim[1], to = lim[2], type = "histogram")
[7]设置参数
#Annotation of transcript
#Change panel and title background color
grtrack <- GeneRegionTrack(geneModels, genome = gen,
chromosome = chr, name = "Gene Model",
transcriptAnnotation = "symbol",
background.panel = "#FFFEDB",
background.title = "darkblue")
plotTracks(list(itrack, gtrack, atrack, grtrack))
[8]方向
默认情况下,所有轨道都将以5' - > 3'方向绘制。有时实际显示相对于相反链的数据可能是有用的。
plotTracks(list(itrack, gtrack, atrack, grtrack),
reverseStrand = TRUE)
[9]显示GenomeAxisTrack对象的参数
将标签的位置设置为下方,显示ID,更改颜色
axisTrack <- GenomeAxisTrack(range = IRanges(start = c(2000000,4000000),
end = c(3000000, 7000000),
names = rep("N-stretch", 2))
)
plotTracks(axisTrack, from = 1000000, to = 9000000,
labelPos = "below",showId=TRUE, col="red")
[10]IdeogramTrack
# Ideogram
ideoTrack <- IdeogramTrack(genome = "hg19", chromosome = "chrX")
plotTracks(ideoTrack, from = 85000000, to = 129000000)
#Show chromosome band ID
plotTracks(ideoTrack, from = 85000000, to = 129000000,
showId = FALSE, showBandId = TRUE, cex.bands = 0.4)
[11]DataTrack
基本上它们构成了与特定基因组坐标范围相关的游程编码数值向量或矩阵。单个数据集中可以有多个样本,绘图方法提供了合并样本组信息的工具。
因此,创建DataTrack对象的起点始终是一组范围,可以是IRanges或GRanges对象的形式,也可以单独作为开始和结束坐标或宽度。第二个成分是与范围数相同长度的数字向量,或具有相同列数的数字矩阵。
我们将Gviz包的一部分的GRanges,作为我们的样本数据进行加载。
#Load data
data(twoGroups)
head(twoGroups)
#Plot data track
dTrack <- DataTrack(twoGroups, name = "uniform")
plotTracks(dTrack)
#DataTrack的默认可视化是点图。
[12]不同的plot类型:
#dotplot
plotTracks(DataTrack(twoGroups, name = "p"), type="p")
#lines plot
plotTracks(DataTrack(twoGroups, name = "l"), type="l")
#line and dot plot
plotTracks(DataTrack(twoGroups, name = "b"), type="b")
#lines plot of average
plotTracks(DataTrack(twoGroups, name = "a"), type="a")
#histogram lines
plotTracks(DataTrack(twoGroups, name = "h"), type="h")
#histogram histogram (bar width equal to range with)
plotTracks(DataTrack(twoGroups, name = "histogram"), type="histogram")
#'polygon-type' plot relative to a baseline
plotTracks(DataTrack(twoGroups, name = "polygon"), type="polygon")
#box and whisker plot
plotTracks(DataTrack(twoGroups, name = "boxplot"), type="boxplot")
#false color image of the individual values
plotTracks(DataTrack(twoGroups, name = "heatmap"), type="heatmap")
[13]DataTrack图的示例:
#Combine a boxplot with an average line and a data grid (g):
plotTracks(dTrack, type = c("boxplot", "a", "g"))
#Heatmap and show sample names
plotTracks(dTrack, type = c("heatmap"), showSampleNames = TRUE,
cex.sampleNames = 0.6)
[14]数据分组
可以使用因子变量将各个样本分组在一起。
plotTracks(dTrack, groups = rep(c("control", "treated"), each = 3),
type = c("a", "p"), legend=TRUE)
#Boxplot
plotTracks(dTrack, groups = rep(c("control", "treated"), each = 3),
type = "boxplot")
#Aggregate group. aggregation can be mean, median, extreme,
#sum, min and max
plotTracks(dTrack, groups = rep(c("control", "treated"),each = 3),
type = c("b"), aggregateGroups = TRUE,
aggregation = "max")
[15]从文件构建DataTrack对象
DataTrack类支持最常见的文件类型,如wig,bigWig,bedGraph和bam文件。
bgFile <- system.file("extdata/test.bedGraph", package = "Gviz")
dTrack2 <- DataTrack(range = bgFile, genome = "hg19",
type = "l", chromosome = "chr19", name = "bedGraph")
plotTracks(dTrack2)
Gviz包中文件支持的真正威力来自索引文件的流式传输。在绘图操作期间只需要加载数据的相关部分,因此底层数据文件可能非常大而不会降低性能或导致太大的内存占用。
通过随包提供的小bam文件来举例说明此功能。bam文件包含序列比对到参考基因组的信息。DataTrack中此类数据的最自然表示是仅查看给定位置的对齐覆盖率,并将其编码在单个elementMetadata列中。
bamFile <- system.file("extdata/test.bam", package = "Gviz")
dTrack4 <- DataTrack(range = bamFile, genome = "hg19",
type = "l", name = "Coverage", window = -1, chromosome = "chr1")
plotTracks(dTrack4, from = 189990000, to = 190000000)
[16]AnnotationTrack
基本上它们由一个或几个基因组范围组成,如果需要,可以将它们分组为复合注释元素。必要的构建块是范围坐标,染色体和基因组标识符。信息可以以单独的函数参数的形式传递给函数,如IRanges,GRanges或data.frame对象。
aTrack <- AnnotationTrack(start = c(10, 40, 120),
width = 15, chromosome = "chrX", strand = c("+","*", "-"),
id = c("Huey", "Dewey", "Louie"),
genome = "hg19", name = "foo")
plotTracks(aTrack)
[17]从文件构建AnnotationTrack对象
默认导入函数从bam文件中读取所有序列比对的坐标。
#Annotation track
aTrack2 <- AnnotationTrack(range = bamFile, genome = "hg19",
name = "Reads", chromosome = "chr1")
plotTracks(aTrack2, from = 189995000, to = 190000000)
我们现在可以将DataTrack表示以及bam文件的AnnotationTrack表示一起绘制,以证明底层数据确实相同.
plotTracks(list(dTrack4, aTrack2), from = 189990000,
to = 190000000)
[18]GeneRegionTrack
GeneRegionTrack对象原则上与AnnotationTrack对象非常相似。唯一的区别是它们以基因/转录本为中心。我们需要在轨道中传递每个注释要素的起始位置和结束位置(或宽度),并为每个将用于创建成绩单分组的项目提供外显子,转录本和基因标识符。
一个特殊情况是直接从一个流行的TranscriptDb对象构建一个GeneRegionTrack对象,这个选项将在下面详细介绍。
data(geneModels)
grtrack <- GeneRegionTrack(geneModels, genome = gen,
chromosome = chr, name = "foo",
transcriptAnnotation = "symbol")
[^19]BiomartGeneRegion Track
BiomartGeneRegion Track类,提供ENSEMBL Biomart服务的直接接口。我们只需输入基因组,染色体以及该染色体上的起始和终点位置,构建函数BiomartGeneRegionTrack将自动联系ENSEMBL,获取必要的信息并动态构建基因模型。
请注意,您需要使用互联网才能使用此功能,并且根据使用情况和网络流量,与Biomart联系可能会花费大量时间。
biomTrack <- BiomartGeneRegionTrack(genome = "hg19",
chromosome = chr, start = 20000000, end = 21000000,
name = "ENSEMBL")
plotTracks(biomTrack, col.line = NULL, col = NULL)
[^20]Sequence Track
library(BSgenome.Hsapiens.UCSC.hg19)
sTrack <- SequenceTrack(Hsapiens)
#sequence track : add 5'->3'
plotTracks(sTrack, chromosome = 1, from = 20000,to = 20050,
add53=TRUE)
#The complement
plotTracks(sTrack, chromosome = 1, from = 20000,to = 20050,
add53=TRUE, complement = TRUE)
[^21]AlignmentsTrack
通常用来展示NGS实验的比对序列图,例如当视觉检查被称SNP的有效性时。这些对齐通常存储在BAM文件中。
RNAseq实验
对于这个演示,让我们使用一个小的BAM文件,配对的NGS读数已被映射到人类hg19基因组的提取物。数据来自RNASeq实验,并且使用STAR对准器进行比对以允许间隙。我们还从Biomart下载该区域的一些基因注释数据。
afrom=2960000
ato=3160000
#bam file
alTrack <- AlignmentsTrack(system.file(package = "Gviz", "extdata", "gapped.bam"), isPaired = TRUE)
bmt <- BiomartGeneRegionTrack(genome = "hg19", chromosome = "chr12",
start = afrom, end = ato, filter = list(with_ox_refseq_mrna = TRUE),
stacking = "dense")
plotTracks(c(bmt, alTrack), from = afrom, to = ato, chromosome = "chr12")
现在,这已经向我们展示了轨道的总体布局:在顶部,我们有一个panel ,其中包含直方图形式的读取覆盖率信息,下面是单个读取的堆积视图。 只是采取一定的垂直空间用于展示比对结果,而不能显示整个测序的深度。 标题面板中的白色向下箭头表示这一事实。 我们可以通过使用max.height,min.height或stackHeight显示参数来解决这个问题,这些参数都控制堆叠读取的高度或垂直间距。 或者我们可以通过设置coverageHeight或minCoverageHeight参数来减小coverage部分的大小。
plotTracks(c(bmt, alTrack), from = afrom, to = ato,
chromosome = "chr12", min.height = 0, coverageHeight = 0.08,
minCoverageHeight = 0)
事实上读取堆叠不是特别有用,我们可以通过相应地设置类型显示参数来关闭它们。
plotTracks(c(alTrack, bmt), from = afrom, to = ato, chromosome = "chr12", type = "coverage")
让我们进一步放大以查看堆积部分的详细信息:
plotTracks(c(bmt, alTrack), from = afrom + 12700,
to = afrom + 15200, chromosome = "chr12")
箭头指示单个读取的方向,并且读取对通过亮灰色线连接。 对齐中的间隙通过连接的深灰色线显示。 在支持透明度的设备上,我们还可以看到一些读取对实际上是重叠的。
如前所述,我们可以通过在构造函数中设置isPaired参数来控制是将数据视为配对结束数据还是单端数据。以下是我们如何查看在单端模式下同一文件中的数据。
alTrack <- AlignmentsTrack(system.file(package = "Gviz",
"extdata", "gapped.bam"), isPaired = FALSE)
plotTracks(c(bmt, alTrack), from = afrom + 12700,
to = afrom + 15200, chromosome = "chr12")
DNAseq实验
为了更好地显示AlignmentsTrack的序列变体的特征,我们将加载不同的数据集,这次是从全基因组DNASeq SNP调用实验。参考基因组再次是hg19,并且使用Bowtie2进行了比对。
我们需要告诉AlignmentsTrack参考基因组(sequenceTrack)。
afrom <- 44945200
ato <- 44947200
alTrack <- AlignmentsTrack(system.file(package = "Gviz","extdata", "snps.bam"), isPaired = TRUE)
plotTracks(c(alTrack, sTrack), chromosome = "chr21", from = afrom,to = ato)
不匹配的碱基在堆积部分中的单独读数上以及在堆积直方图形式的覆盖图中指示。当放大到一个明显的杂合SNP位置时,我们可以揭示更多细节。
#Zoom
plotTracks(c(alTrack, sTrack), chromosome = "chr21", from = 44946590, to = 44946660)
#show individual letters
plotTracks(c(alTrack, sTrack), chromosome = "chr21",
from = 44946590, to = 44946660, cex = 0.5, min.height = 8)
突出
#highlight
ht <- HighlightTrack(trackList = list(atrack, grtrack,dtrack),
start = c(26705000, 26720000), width = 7000, chromosome = 7)
plotTracks(list(itrack, gtrack, ht), from = lim[1], to = lim[2])
叠加
对于某些应用,在绘图的同一区域上叠加多个轨道是有意义的。出于指导性示例的目的,我们将生成第二个DataTrack对象,并将其与第二章中的现有对象组合。
#create data
dat <- runif(100, min = -2, max = 22)
dtrack2 <- DataTrack(data = dat, start = coords[-length(coords)],
end = coords[-1], chromosome = chr, genome = gen,
name = "Uniform2", groups = factor("sample 2",levels = c("sample 1", "sample 2")),
legend = TRUE)
displayPars(dtrack) <- list(groups = factor("sample 1",levels = c("sample 1", "sample 2")), legend = TRUE)
ot <- OverlayTrack(trackList = list(dtrack2, dtrack))
ylims <- extendrange(range(c(values(dtrack), values(dtrack2))))
plotTracks(list(itrack, gtrack, ot), from = lim[1], to = lim[2], ylim = ylims, type = c("smooth", "p"))
displayPars(dtrack) <- list(alpha.title = 1, alpha = 0.5)
displayPars(dtrack2) <- list(alpha.title = 1, alpha = 0.5)
ot <- OverlayTrack(trackList = list(dtrack, dtrack2))
plotTracks(list(itrack, gtrack, ot), from = lim[1],
to = lim[2], ylim = ylims, type = c("hist"), window = 30)
Gviz 可视化基因组数索引