Literature Reading 3.6 VISTA: Visualizing Global DNA Sequence Alignments of Arbitrary Length

2022-08-15, 龙star180

Journal

Bioinformatics (IF 6.931 / Q1)

VISTA: visualizing global DNA sequence alignments of arbitrary length (2000)


Abstract

Summary: VISTA is a program for visualizing global DNA sequence alignments of arbitrary length. It has a clean output, allowing for easy identification of similarity, and is easily configurable, enabling the visualization of alignments of various lengths at different levels of resolution. It is currently available on the web, thus allowing for easy access by all researchers.


Availability: VISTA server is available on the web at http://www-gsd.lbl.gov/vista. The source code is available upon request.


Contact: vista@lbl.gov

Motivation

Alignment of genomic sequence from different organisms is becoming an increasingly powerful method in biology, and is being used for many purposes. Comparative sequence analysis has enabled identification of regulatory non-coding regions, and location of coding exons using purely computational means. Visual front-ends are necessary to make the process of viewing alignments intuitive and easy and to facilitate discovery of conserved sequences for functionally significant regions.


For short alignments, dot plots have proven to be very useful, allowing for efficient visualization of repeats, rearrangements, and conservation. At the same time dot plots are less adept at displaying longer alignments, when the length of the sequences becomes larger than the resolution of the displaying medium. The Alignment Service package was designed to align long sequences and to visualize resulting multiple alignments. This tool was applied to the annotation of open reading frames in viral genomes. Unfortunately this tool does not have a web interface and is hard to obtain. The earlier study shows visual presentation of sequence variability along the alignment as a graph. For longer alignments, the only widely available visualization tool is PIPMaker. PIPMaker generates a highly detailed plot of a local alignment as a series of dots and dashes representing the levels of conservation between the base sequence and clones from the second sequence.


Nevertheless, none of the currently available visualization methods combine the following critical features: (1) a clear, configurable output; (2) the ability to visualize several global alignments on the same scale; (3) the use of a continuous curve to represent the level of identity; (4) the ability to visualize alignments of up to several megabases; (5) effective handling of gaps in the alignment; (6) available source code. The VISTA program contains all of the aforementioned features.


Features

The VISTA plot (Figure 1) is based on moving a user-specified window over the entire alignment and calculating the percent identity over the window at each base pair. The x-axis represents the base sequence; the y-axis represents the percent identity. If the user supplies an annotation file (see below), genes and exons are marked above the plot. The direction of genes is indicated by an arrow, while the coding exons and UTRs are marked with rectangles of different color. Conserved regions (defined below) are highlighted under the curve, with red indicating a conserved non-coding region and blue indicating a conserved exon. Conserved UTRs are colored turquoise. The colors can be modified by the user.
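The sliding-window calculation described above can be sketched in a few lines of Python. This is only an illustration, not VISTA's own code (which is written in Java and available on request). The function name `window_identity`, the use of `-` as the gap character, the centred window, and the truncation of the window at the ends of the alignment are all assumptions made for this example. A prefix-sum array keeps the work linear in the alignment length.

```python
from itertools import accumulate


def window_identity(base: str, other: str, window: int) -> list[float]:
    """Percent identity in a window centred on each alignment column.

    `base` and `other` are two rows of a pairwise global alignment,
    with '-' marking gaps (an assumption of this sketch).
    """
    if len(base) != len(other):
        raise ValueError("aligned sequences must have equal length")
    if window < 1:
        raise ValueError("window must be at least 1")

    # 1 where the column holds two identical non-gap bases, else 0.
    match = [
        int(a == b and a != "-")
        for a, b in zip(base.upper(), other.upper())
    ]
    # prefix[i] is the number of matching columns in [0, i).
    prefix = [0, *accumulate(match)]

    n, half = len(match), window // 2
    identity = []
    for i in range(n):
        # The window is clipped where it would run past either end.
        lo = max(0, i - half)
        hi = min(n, i - half + window)
        identity.append(100 * (prefix[hi] - prefix[lo]) / (hi - lo))
    return identity


curve = window_identity("ACGTACGT", "ACGTTCGT", window=4)
```

Here `curve` holds one value per alignment column. A real plot would presumably also drop the columns where the base sequence has a gap, since the x-axis follows the base sequence; the sketch leaves that step out.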


Figure 1

Fig. 1. Two sample VISTAs. Part (a) shows a two-way comparison between human and mouse sequences. Part (b) shows a fragment of a three-way comparison with human, mouse, and rabbit sequences.


A conserved region is defined with percentage and length cutoffs. Conserved segments with percent identity x and length y are defined to be regions in which every contiguous subsegment of length y was at least x% identical to its paired sequence. These segments are merged to define the conserved regions.
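One possible reading of this definition, written as a short Python sketch: every window of length y whose identity reaches x% is kept, and windows that overlap or touch are merged into one region. The function name `conserved_regions`, the `-` gap character, and the half-open `(start, end)` column coordinates are assumptions for illustration. The paper does not spell out the merge rule, so treat the overlap test below as a guess rather than VISTA's actual behaviour.

```python
from itertools import accumulate


def conserved_regions(
    base: str, other: str, min_identity: float, min_length: int
) -> list[tuple[int, int]]:
    """Merged half-open (start, end) column ranges that pass both cutoffs."""
    if len(base) != len(other):
        raise ValueError("aligned sequences must have equal length")
    if min_length < 1:
        raise ValueError("min_length must be at least 1")

    # 1 where the column holds two identical non-gap bases, else 0.
    match = [
        int(a == b and a != "-")
        for a, b in zip(base.upper(), other.upper())
    ]
    # prefix[i] is the number of matching columns in [0, i).
    prefix = [0, *accumulate(match)]

    regions: list[tuple[int, int]] = []
    for start in range(len(match) - min_length + 1):
        end = start + min_length
        hits = prefix[end] - prefix[start]
        if 100 * hits < min_identity * min_length:
            continue  # this window is below the identity cutoff
        if regions and start <= regions[-1][1]:
            # Overlaps or touches the previous region: extend it.
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions


strict = conserved_regions("ACGTACGTAAAA", "ACGTACGTCCCC", 100, 4)
relaxed = conserved_regions("ACGTACGTAAAA", "ACGTACGTCCCC", 75, 4)
```

With the 100% cutoff the region stops at the last matching column; lowering the cutoff to 75% lets the final qualifying window take in one mismatching column, so the region grows by one.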


VISTA can be configured for visualizing alignments of various lengths by changing several parameters: the number of pages on which the output appears, the number of frames per page, the window size, and the resolution at which the alignment is plotted. VISTA allows one to easily create figures for various documents. For simplicity it is also possible to specify only a subset of these parameters, with the rest being automatically calculated. VISTA also supports simultaneous visualization of several related alignments (Figure 1b). This is particularly useful for long genomic sequences for which easy-to-use tools for multiple alignments and its graphic depiction are absent at the present time.


VISTA is implemented as an automatic server located at http://www-gsd.lbl.gov/vista web site. The input to VISTA consists of three parts. (1) One or more global alignments in one of the standard formats. If unaligned sequences are provided, they are aligned using the GLASS alignment program, which is specifically designed for the global alignment of large genomic regions. (2) An annotation file for the base sequence, in either the Sanger Centre’s GFF format (http://www.sanger.ac.uk/Software/formats/GFF/), or another, simpler format described on our web page. (3) Any of the parameters (such as the coloring criteria or length of output) described previously. A more complete manual for using VISTA is available on the web site.
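For readers who have not seen an annotation file of the kind mentioned in input (2), the sketch below reads the standard tab-separated GFF columns (seqname, source, feature, start, end, score, strand, frame, then optional attributes) into simple records. The `Feature` type, the `parse_gff` function, and the sample line with its `chr1`, `manual`, and `demo` values are hypothetical. Which feature names VISTA expects for genes, exons, and UTRs is not stated here, so the sketch does not assume any.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    seqname: str
    feature: str
    start: int  # 1-based, inclusive, as in GFF
    end: int    # 1-based, inclusive
    strand: str


def parse_gff(text: str) -> list[Feature]:
    """Parse the mandatory GFF columns, skipping comments and blank lines."""
    features = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 tab-separated fields: {line!r}")
        features.append(
            Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6])
        )
    return features


# Hypothetical one-exon annotation, for illustration only.
SAMPLE = '##gff-version 2\nchr1\tmanual\texon\t100\t250\t.\t+\t.\tgene "demo"\n'
annotations = parse_gff(SAMPLE)
```

The strand column is what would drive the gene-direction arrows described in the Features section, and the start and end coordinates refer to the base sequence.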


A VISTA plot is created by a Java program which outputs PDF files using the retepPDF (http://www.retep.org.uk/pdf/) library. VISTA uses time and memory linear in the length of the input sequences.

