Innodb-AHI
作用
- AHI是针对叶子节点的,作用是减少B树寻址的代价。
如何做的
通过key(index_id+fileds+bytes):value(记录的物理地址)来直接定位。
这里面有几个细节
- fileds+bytes如何选择
-
key(index_id+fileds+bytes)能唯一确定一条记录,如果存在重复的情况则选择最左或最右的记录。
如下图,假设fileds为2,bytes为0,则需要创建(a,a),(a,b),(a,c),(a,d)的key(不考虑跨页的情况)。
存在重复的情况,当left_side为True时,key(a,b)的value选择(a,b,1)这条记录,否则key(a,b)的value选择(a,b,3)这条记录.
image.png
如何使用
整体流程
在btr_cur_search_to_nth_level函数里
- 如果满足一定条件,就进入btr_search_guess_on_hash来定位,再通过btr_search_check_guess来判断记录的有效性。
- 如果无法通过AHI定位或者定位不成功,search_loop逐层查找。
- 当完成了搜索之后,如果最终定位的层是叶子节点,会调用btr_search_info_update更新AHI相关的信息。
代码
btr_cur_search_to_nth_level(
dict_index_t* index, /*!< in: index */
ulint level, /*!< in: the tree level of search */
ulint mode, /*!< in: PAGE_CUR_L, ...;
Inserts should always be made using
PAGE_CUR_LE to search the position! */
...)
{
/* Use of AHI is disabled for intrinsic table as these tables re-use
the index-id and AHI validation is based on index-id. */
if (rw_lock_get_writer(btr_get_search_latch(index))
== RW_LOCK_NOT_LOCKED
&& latch_mode <= BTR_MODIFY_LEAF
&& info->last_hash_succ
&& !index->disable_ahi
&& !estimate
# ifdef PAGE_CUR_LE_OR_EXTENDS
&& mode != PAGE_CUR_LE_OR_EXTENDS
# endif /* PAGE_CUR_LE_OR_EXTENDS */
&& !dict_index_is_spatial(index)
/* If !has_search_latch, we do a dirty read of
btr_search_enabled below, and btr_search_guess_on_hash()
will have to check it again. */
&& UNIV_LIKELY(btr_search_enabled)
&& !modify_external
&& btr_search_guess_on_hash(index, info, tuple, mode,
latch_mode, cursor,
has_search_latch, mtr)) {
/* Search using the hash index succeeded */
ut_ad(cursor->up_match != ULINT_UNDEFINED
|| mode != PAGE_CUR_GE);
ut_ad(cursor->up_match != ULINT_UNDEFINED
|| mode != PAGE_CUR_LE);
ut_ad(cursor->low_match != ULINT_UNDEFINED
|| mode != PAGE_CUR_LE);
btr_cur_n_sea++;
DBUG_VOID_RETURN;
}
// 初始的,获得索引的根节点(space_id,page_no)
space = dict_index_get_space(index);
page_no = dict_index_get_page(index);
search_loop:
// 循环、逐层的查找,直至达到传入的层数「level」,一般是0(即叶子节点)
// 此处的分析忽略Change Buffer的部分
// 从Buffer Pool或磁盘中得到索引页
block = buf_page_get_gen(space, zip_size, page_no, rw_latch, guess, buf_mode,
file, line, mtr);
// 在索引页中中查找对于指定的Tuple,满足某种条件(依赖于传入的 mode,例如 PAGE_CUR_L)
// 的 record 将查找结果保存在page_cursor中,page_cursor结构也很简单:
// struct page_cur_t{
// byte* rec; /*!< pointer to a record on page */
// buf_block_t* block; /*!< pointer to the block containing rec */
// };
page_cur_search_with_match(block, index, tuple, page_mode, &up_match, &up_bytes,
&low_match, &low_bytes, page_cursor);
if (level != height) {
// 如果没到达指定层数,获得page_cursor(中间节点)内保存的下层节点的索引页page_no
//注意:中间节点的Value是一个Pointer(page_no),指向子节点(中间节点或叶子节点)
node_ptr = page_cur_get_rec(page_cursor);
/* Go to the child node */
page_no = btr_node_ptr_get_child_page_no(node_ptr, offsets);
// 在下一层继续查找
goto search_loop;
}
// 达到指定层数,函数退出
if (btr_search_enabled && !index->disable_ahi) {
btr_search_info_update(index, cursor);
}
}
定位与判断有效性
定位-btr_search_guess_on_hash
- 首先用户提供的前缀索引查询条件必须大于等于构建AHI时的前缀索引列数,这里存在一种可能性:索引上的search_info的n_fields 和block上构建AHI时的cur_n_fields值已经不相同了,但是我们并不知道本次查询到底落在哪个block上,这里一致以search_info上的n_fields为准来计算fold,去查询AHI;
- 在检索AHI时需要加&btr_search_latch的S锁;
- 如果本次无法命中AHI,就会将btr_search_info::last_hash_succ设置为false,这意味着随后的查询都不会去使用AHI了,只能等待下一路查询信息分析后才可能再次启动(btr_search_failure);
- 对于从ahi中获得的记录指针,还需要根据当前的查询模式检查是否是正确的记录位置(btr_search_check_guess)。
判断记录有效性btr_search_check_guess
判断记录的有效性跟查询模式很相关,细节看注释。
btr_search_check_guess(
btr_cur_t* cursor,
ibool can_only_compare_to_cursor_rec,
const dtuple_t* tuple,
ulint mode,
mtr_t* mtr)
{
rec_t* rec;
ulint n_unique;
ulint match;
int cmp;
mem_heap_t* heap = NULL;
ulint offsets_[REC_OFFS_NORMAL_SIZE];
ulint* offsets = offsets_;
ibool success = FALSE;
rec_offs_init(offsets_);
n_unique = dict_index_get_n_unique_in_tree(cursor->index);
rec = btr_cur_get_rec(cursor);
ut_ad(page_rec_is_user_rec(rec));
match = 0;
offsets = rec_get_offsets(rec, cursor->index, offsets,
n_unique, &heap);
cmp = cmp_dtuple_rec_with_match(tuple, rec, offsets, &match);
if (mode == PAGE_CUR_GE) {
//cmp>0,说明tuple大于rec,rec可能是重复情况下的最左记录,比如AHI的key为(a,b),tuple为(a,b,c),rec为(a,b,a),那么这种情况就不行。
if (cmp > 0) {
goto exit_func;
}
cursor->up_match = match;
if (match >= n_unique) {
success = TRUE;
goto exit_func;
}
} else if (mode == PAGE_CUR_LE) {
if (cmp < 0) {
goto exit_func;
}
cursor->low_match = match;
} else if (mode == PAGE_CUR_G) {
if (cmp >= 0) {
goto exit_func;
}
} else if (mode == PAGE_CUR_L) {
if (cmp <= 0) {
goto exit_func;
}
}
if (can_only_compare_to_cursor_rec) {
/* Since we could not determine if our guess is right just by
looking at the record under the cursor, return FALSE */
goto exit_func;
}
match = 0;
//还需要进一步判断记录是否有效,当mode为PAGE_CUR_G和PAGE_CUR_GE时,判断记录是否为满足AHI查询key的最左记录。否则判断记录是否为满足AHI查询key的最右记录。
if ((mode == PAGE_CUR_G) || (mode == PAGE_CUR_GE)) {
rec_t* prev_rec;
ut_ad(!page_rec_is_infimum(rec));
prev_rec = page_rec_get_prev(rec);
if (page_rec_is_infimum(prev_rec)) {
success = btr_page_get_prev(page_align(prev_rec), mtr)
== FIL_NULL;
goto exit_func;
}
offsets = rec_get_offsets(prev_rec, cursor->index, offsets,
n_unique, &heap);
cmp = cmp_dtuple_rec_with_match(
tuple, prev_rec, offsets, &match);
if (mode == PAGE_CUR_GE) {
success = cmp > 0;
} else {
success = cmp >= 0;
}
goto exit_func;
} else {
rec_t* next_rec;
ut_ad(!page_rec_is_supremum(rec));
next_rec = page_rec_get_next(rec);
if (page_rec_is_supremum(next_rec)) {
if (btr_page_get_next(page_align(next_rec), mtr)
== FIL_NULL) {
cursor->up_match = 0;
success = TRUE;
}
goto exit_func;
}
offsets = rec_get_offsets(next_rec, cursor->index, offsets,
n_unique, &heap);
cmp = cmp_dtuple_rec_with_match(
tuple, next_rec, offsets, &match);
if (mode == PAGE_CUR_LE) {
success = cmp < 0;
cursor->up_match = match;
} else {
success = cmp <= 0;
}
}
exit_func:
if (UNIV_LIKELY_NULL(heap)) {
mem_heap_free(heap);
}
return(success);
}
如何构建AHI
fileds+bytes
上文说了key是由index_id+fileds+bytes构成,那么如何确定fileds和bytes。
fileds和bytes是什么概念
参看https://www.jianshu.com/p/0cdd573a8232
用于确定fileds和bytes的结构体字段
总共有3个结构体在确定fileds和bytes发挥作用,分别是btr_cur_t(树查询时的游标)、btr_search_t(为每个索引维护的查询信息)、buf_block_t(block控制结构体)。btr_cur_t中的信息在B树定位中更新,在B树定位后,btr_search_t根据btr_cur_t的信息更新,用于记录B树查询相关的信息,然后buf_block_t根据btr_search_t的信息更新,用于记录本Block相关的查询信息。
为每个索引对象维护的index->search_info,类型为btr_search_t。
/** The search info struct in an index */
struct btr_search_t{
...
ulint n_fields; /*!< recommended prefix length for hash search:
number of full fields */
ulint n_fields; /*!< recommended prefix: number of bytes in
an incomplete field
@see BTR_PAGE_MAX_REC_SIZE */
ibool left_side; /*!< TRUE or FALSE, depending on whether
the leftmost record of several records with
the same prefix should be indexed in the
hash index */
...
};
block控制结构体上相关变量(buf_block_t)
struct buf_block_t{
...
volatile ulint n_bytes; /*!< recommended prefix length for hash
search: number of bytes in
an incomplete last field */
volatile ulint n_fields; /*!< recommended prefix length for hash
search: number of full fields */
volatile bool left_side; /*!< true or false, depending on
whether the leftmost record of several
records with the same prefix should be
indexed in the hash index */
...
}
The tree cursor
struct btr_cur_t {
...
ulint up_match; /*!< If the search mode was PAGE_CUR_LE,
the number of matched fields to the
the first user record to the right of
the cursor record after
btr_cur_search_to_nth_level;
for the mode PAGE_CUR_GE, the matched
fields to the first user record AT THE
CURSOR or to the right of it;
NOTE that the up_match and low_match
values may exceed the correct values
for comparison to the adjacent user
record if that record is on a
different leaf page! (See the note in
row_ins_duplicate_error_in_clust.) */
ulint up_bytes; /*!< number of matched bytes to the
right at the time cursor positioned;
only used internally in searches: not
defined after the search */
ulint low_match; /*!< if search mode was PAGE_CUR_LE,
the number of matched fields to the
first user record AT THE CURSOR or
to the left of it after
btr_cur_search_to_nth_level;
NOT defined for PAGE_CUR_GE or any
other search modes; see also the NOTE
in up_match! */
ulint low_bytes; /*!< number of matched bytes to the
left at the time cursor positioned;
only used internally in searches: not
defined after the search */
ulint n_fields; /*!< prefix length used in a hash
search if hash_node != NULL */
ulint n_bytes; /*!< hash prefix bytes if hash_node !=
NULL */
...
};
确定fileds与bytes的时机
参考整体流程,当完成了搜索之后,如果最终定位的层是叶子节点,会调用btr_search_info_update更新AHI相关的信息。
这个时候cursor->{up_match, up_bytes, low_match, low_bytes}都已经确定。
首先需要根据cursor->{up_match, up_bytes, low_match, low_bytes}来更新index的search info。
路径为btr_search_info_update->btr_search_info_update_slow->btr_search_info_update_hash。
有两种情况需要更新btr_search_t->{n_fields,n_bytes,left_side}。
- btr_search_t->n_hash_potential为0:search info首次初始化或者上次查询根据查询条件无法唯一确定一条记录。
- 如代码所示,如果cmp<=0,说明cursor->low_match, cursor->low_bytes所在的记录是在info->n_fields, info->n_bytes这个范围内与查询条件相等的最右边的记录,如果info的建议是按照相同前缀最左记录构建AHI,说明已不符合当次查询要求,需要重新生成建议。(补个图吧)
cmp = ut_pair_cmp(info->n_fields, info->n_bytes,
cursor->low_match, cursor->low_bytes);
if (info->left_side ? cmp <= 0 : cmp > 0) {
goto set_new_recomm;
}
生成info->{n_fields,n_bytes,left_side}新值是如下算法,由以下算法可以看出,选择{info->n_fields, info->n_bytes, info->left_side}的依据则是在不超过 unique index 列数的前提下,使其计算代价最小,而 index->info->left_side 的值则会决定存储同一数据页上相同前缀索引的最左记录还是最右记录。
细节说明看注释
set_new_recomm:
/* We have to set a new recommendation; skip the hash analysis
for a while to avoid unnecessary CPU time usage when there is no
chance for success */
info->hash_analysis = 0;
cmp = ut_pair_cmp(cursor->up_match, cursor->up_bytes,
cursor->low_match, cursor->low_bytes);
if (cmp == 0) {
//cmp==0说明根据查询条件无法唯一确定一条记录,比如根据=b查询,然后定位low定位到a,up定位到c。
info->n_hash_potential = 0;
/* For extra safety, we set some sensible values here */
info->n_fields = 1;
info->n_bytes = 0;
info->left_side = TRUE;
} else if (cmp > 0) {
//cm
info->n_hash_potential = 1;
if (cursor->up_match >= n_unique) {
//n_unique个fileds已经能唯一确定一条记录了
info->n_fields = n_unique;
info->n_bytes = 0;
} else if (cursor->low_match < cursor->up_match) {
//+1怎么理解,比如low_match=1,up_match=3,这个时候把n_fields设置为2已经足以定位到up_match上了,比如查询条件是(a,b,c),low_match为(a), up_match为(a,b,c)这个时候使用(a,b已经足以定位到up_match)。另外low_match=0时,n_fields设置为1,也足以满足情况了。
info->n_fields = cursor->low_match + 1;
info->n_bytes = 0;
} else {
info->n_fields = cursor->low_match;
info->n_bytes = cursor->low_bytes + 1;
}
info->left_side = TRUE;
} else {
info->n_hash_potential = 1;
if (cursor->low_match >= n_unique) {
info->n_fields = n_unique;
info->n_bytes = 0;
} else if (cursor->low_match > cursor->up_match) {
info->n_fields = cursor->up_match + 1;
info->n_bytes = 0;
} else {
info->n_fields = cursor->up_match;
info->n_bytes = cursor->up_bytes + 1;
}
info->left_side = FALSE;
}
static
void
btr_search_info_update_hash(
btr_search_t* info,
btr_cur_t* cursor)
{
dict_index_t* index = cursor->index;
ulint n_unique;
int cmp;
ut_ad(!rw_lock_own(btr_get_search_latch(index), RW_LOCK_S));
ut_ad(!rw_lock_own(btr_get_search_latch(index), RW_LOCK_X));
if (dict_index_is_ibuf(index)) {
/* So many deletes are performed on an insert buffer tree
that we do not consider a hash index useful on it: */
return;
}
n_unique = dict_index_get_n_unique_in_tree(index);
if (info->n_hash_potential == 0) {
goto set_new_recomm;
}
/* Test if the search would have succeeded using the recommended
hash prefix */
if (info->n_fields >= n_unique && cursor->up_match >= n_unique) {
increment_potential:
info->n_hash_potential++;
return;
}
cmp = ut_pair_cmp(info->n_fields, info->n_bytes,
cursor->low_match, cursor->low_bytes);
if (info->left_side ? cmp <= 0 : cmp > 0) {
goto set_new_recomm;
}
cmp = ut_pair_cmp(info->n_fields, info->n_bytes,
cursor->up_match, cursor->up_bytes);
if (info->left_side ? cmp <= 0 : cmp > 0) {
goto increment_potential;
}
set_new_recomm:
/* We have to set a new recommendation; skip the hash analysis
for a while to avoid unnecessary CPU time usage when there is no
chance for success */
info->hash_analysis = 0;
cmp = ut_pair_cmp(cursor->up_match, cursor->up_bytes,
cursor->low_match, cursor->low_bytes);
if (cmp == 0) {
//cmp==0说明根据查询条件无法唯一确定一条记录,比如根据=b查询,然后定位low定位到a,up定位到c。
info->n_hash_potential = 0;
/* For extra safety, we set some sensible values here */
info->n_fields = 1;
info->n_bytes = 0;
info->left_side = TRUE;
} else if (cmp > 0) {
info->n_hash_potential = 1;
if (cursor->up_match >= n_unique) {
info->n_fields = n_unique;
info->n_bytes = 0;
} else if (cursor->low_match < cursor->up_match) {
info->n_fields = cursor->low_match + 1;
info->n_bytes = 0;
} else {
info->n_fields = cursor->low_match;
info->n_bytes = cursor->low_bytes + 1;
}
info->left_side = TRUE;
} else {
info->n_hash_potential = 1;
if (cursor->low_match >= n_unique) {
info->n_fields = n_unique;
info->n_bytes = 0;
} else if (cursor->low_match > cursor->up_match) {
info->n_fields = cursor->up_match + 1;
info->n_bytes = 0;
} else {
info->n_fields = cursor->up_match;
info->n_bytes = cursor->up_bytes + 1;
}
info->left_side = FALSE;
}
}
完成Index层面的n_fileds和n_bytes建议后,如何落实到block层面。
代码路径为btr_search_info_update->btr_search_info_update_slow->btr_search_update_block_hash_info。
因为AHI虽然是针对Index产生建议,但是最终是在block上建立key:value的映射关系,block层面的记录的是对block的查询信息,如果满足一定条件,就建立AHI。
有个疑问?在一个block构建完成后,如果index建议的fields和bytes发生变化,innodb是什么行为。
如何避免频繁构建AHI
- Index层面btr_search_t,如何避免频繁生成新的建议
- block层面buf_block_t,如何判断该block是否值得构建
先说index层面的btr_search_t,btr_search_t有个变量hash_analysis,当生成新的建议后hash_analysis被重置为0,重置后对该索引BTR_SEARCH_HASH_ANALYSIS次查询内,都不会尝试生成新的建议了。
再说block层面的,参考https://juejin.cn/post/6844903536765976590
AHI并发控制
https://juejin.cn/post/6844903536765976590
https://developer.aliyun.com/article/41046
http://mysql.taobao.org/monthly/2015/09/01/
https://www.jianshu.com/p/0cdd573a8232