索引

2018-10-29 本文已影响0人 Kevifunau

定义

An index is a table/file of (keyVal 索引字段,tupleID 行指针) pairs
索引是定义在存储表(Table)基础之上，有助于无需检查所有记录而快速定
位所需记录的一种辅助存储结构，由一系列存储在磁盘上的索引项(index entries)组成，每一索引项又由两部分构成:

索引字段:由Table中某些列(通常是一列)中的值串接而成。索引中通常存储了索引字段的每一个值(也有不是这样的)。索引字段类似于词典中的词条。

行指针:指向Table中包含索引字段值的记录在磁盘上的存储位置。行指针类似于词条在书籍、词典中出现的页码。

image.png

类别

基于构建索引的属性（是否为 unique）

primary 主索引
index on unique field, may be sorted on A（主码）
索引构建在 unique 的 attribute上，主文件在attribute 上有序存储。
clustering 聚簇索引
index on non-unique field, file sorted on A（排序码）
索引构建在 non-unique的 attribute上，主文件在attribute 上有序存储，行指针指向每个value 的第一次出现位置。
secondary 多级索引/ 辅助索引
辅助索引不能组织文件
file not sorted on A

基于索引的结构 (索引与主文件的关系)

dense 稠密索引
every tuple is referenced by an entry in the index file
索引项与主文件记录一一对应
sparse
only some tuples are referenced by index file entries
索引项与部分主文件一一对应
single-level
tuples are accessed directly from the index file
multi-level
may need to access several index pages to reach tuple

Primary index 主索引的 selection , insertion ,deletion

主索引也是一种 dense 索引。
i : index pages

通过index找到数据地址 + 读 + 或读溢出块
$Cost_{select}$ = log(i,2) + 1r + Ov

查询速度块，复杂度 O（log n）。

通过index找到数据地址 + 更新索引 + 读+ 或读溢出块 + 写 + 或写溢出块
$Cost_{insertion}$ = log(i,2) + ( i / 2 ) * (1r + 1w) + (1 + Ov)r + (1+δ)w

*插入需要重新组织文件，复杂度O（n） *

$Cost_{deletion}$ ：

if we delete index entries by marking ...
找到该数据 + 写disk + 更新索引O（1）
$Cost_{delete}$ = (log2 i + 1 + Ov)r + 2w
If we delete index entry by index file reorganisation
找到该数据 + 更新索引O（n） + 写disk
$Cost_{delete}$ = (log2 i + 1 + Ov)r + i/2.(1r+1w) + 1w

删除可以是O（log n ），也可以是O（n）,取决于是否重新组织文件

Clustering Index 聚簇索引

索引构建在 non-unique 的属性上，索引中临近的记录在主文件中也是临近的。
一般是稀疏索引。

cluster index

适用于：

range queries on A (find lower bound, then scan data)
pmr queries on A (search index for specified value）

插入： expensive ! 需要重新组织 index file and data file
删除：类似 primary index

Secondary index 辅助索引

辅助索引不能组织主文件数据，所以主索引下面引入一个指针桶 secondary index file。
sparse index (Ix1) on dense index containing (key,offset) pairs
dense index (Ix2) containing just TupleId's

image.png

Insertion 插入:
每次插入，索引和主文件都要同步更新，只需要重新组织索引文件
Deletion 删除:
使用mark-style 的方式删除记录，周期性的重新组织index 文件。

B-Trees

Multi-level Index 多级索引:当索引项比较多时，可以对索引再建立索引，依此类推，形成多级索引。
B树索引：一种以树型数据结构来组织索引项的多级索引.
B 树最底层（叶子节点）与 data file 是一一对应的 dense index .

depth=3, n=3

B树的插入

find leaf node and position in node where entry would be stored
if node is not full, insert entry into appropriate spot
if node is full, split node into two half-full nodes and promote middle element to parent
if parent full, split and promote

image.png

插入的复杂度
$Cost_{insertion}$ = $Cost_{treesearch} + Cost_{treeinsert} + Cost_{datainsert}$

查找复杂度：
一次找到 data
$Cost_{one-selection} = (D + 1)r$

search index to find leaf node for Lo
// 读 每一个 index ,在读每个data 
for each leaf node entry until Hi found {
    access tuple t using TupleId from entry
}

$Cost_{range-selection} = Dr + b_i + b_q$

索引

定义

image.png

类别

Primary index 主索引的 selection , insertion ,deletion

Clustering Index 聚簇索引

Secondary index 辅助索引

B-Trees

猜你喜欢

热点阅读