Vertica的这些事(十六)——Vertica如何建表
大家看到题目可能会想,建表谁不会呀,还用讲嘛?但是如何建表能使你的应用查询更加快呢?
Anatomy of a Projection
The [CREATE PROJECTION]statement defines the individual elements of a projection, as the following graphic shows.
image.pngThe previous example contains the following significant elements:
Column List and Encoding
Lists every column in the projection and defines the encoding for each column. Unlike traditional database architectures, HP Vertica operates on encoded data representations. Therefore, HP recommends that you use data encoding because it results in less disk I/O.
Base Query
Identifies all the columns to incorporate in the projection through column name and table name references. The base query for large table projections can contain PK/FK joins to smaller tables.
Sort Order
The sort order optimizes for a specific query or commonalities in a class of queries based on the query predicate. The best sort orders are determined by the WHERE clauses. For example, if a projection's sort order is (x, y), and the query's WHERE clause specifies (x=1 AND y=2), all of the needed data is found together in the sort order, so the query runs almost instantaneously.
You can also optimize a query by matching the projection's sort order to the query's GROUP BY clause. If you do not specify a sort order, HP Vertica uses the order in which columns are specified in the column definition as the projection's sort order.
The ORDER BY clause specifies a projection's sort order, which localizes logically grouped values so that a disk read can pick up many results at once. For maximum performance, do not sort projections on LONG VARBINARY and LONG VARCHAR columns.Segmentation
The segmentation clause determines whether a projection is segmented across nodes within the database. Segmentation distributes contiguous pieces of projections, calledsegments, for large and medium tables across database nodes. Segmentation maximizes database performance by distributing the load. Use SEGMENTED BY HASH to segment large table projections.
For small tables, use the UNSEGMENTED keyword to direct HP Vertica to replicate these tables, rather than segment them. Replication creates and stores identical copies of projections for small tables across all nodes in the cluster. Replication ensures high availability and recovery.
For maximum performance, do not segment projections on LONG VARBINARY and LONG VARCHAR columns.
以上来自官网,理解如下:
Projection的解析
Sort Order
1、 order by 后表中插入的数据是有序的,所以order by 的列就源自于你在查询语句时使用的where 字句的内容。例如,如果字句查询中有where x=1 and y=2,那么建立projection时order by (x, y)查询的时候就会迅速定位到符合条件的数据
2、 group by 后面的字段,出现在order by 中也可以优化查询。
3、 order by 不要建立在LONG VARBINARY and LONG VARCHAR的列
Segmentation
1、 Segmentation by hash()就是按照某一列,打散数据,把数据均匀的分布在各个节点上,对于大表,要记得使用。所以 hash里的列是主键最好,也就是说该列数据不重复的值越多,越适合做hash.
2、 Segmentation by 的列不要用LONG VARBINARY and LONG VARCHAR columns.
根据以上规则建表,可以更大程度优化你的查询性能。