一次低区分度索引的查询优化 2021-12-06

2021-12-07  本文已影响0人  9_SooHyun

背景知识

1.索引

这里只罗列出InnoDB支持的索引:

这些索引的底层都是b+树,一个表有几个索引就有几棵b+树

索引总体划分为两类,主键索引也被称为聚簇索引(clustered index),其余都称呼为非主键索引也被称为二级索引(secondary index)

2.mysql explain说明

explain命令可以获得sql statement的执行计划,其输出的列说明如下:
EXPLAIN Output Columns

Column JSON Name Meaning
id select_id The SELECT identifier
select_type None The SELECT type
table table_name The table for the output row
partitions partitions The matching partitions
type access_type The join type
possible_keys possible_keys The possible indexes to choose
key key The index actually chosen
key_len key_length The length of the chosen key
ref ref The columns compared to the index, shows which columns or constants are compared to the index named in the key column to select rows from the table
rows rows Estimate of rows to be examined. 预估计的由type字段指明的搜索方式的探测总行数
filtered filtered Percentage of rows filtered by table condition. 表示通过查询条件获取的最终记录行数占通过type字段指明的搜索方式搜索出来的记录总行数的百分比
Extra None Additional information

首先,MySQL使用type扫描表,预计会得到rows条记录
其次,MySql会使用Extra额外的查询条件对这rows行记录做二次过滤
最终,得到符合查询语句的n条记录, filtered = n / rows

EXPLAIN Join Types

The type column of EXPLAIN output describes how tables are joined.
https://dev.mysql.com/doc/refman/5.7/en/explain-output.html#explain-join-types

The following list describes the join types, ordered from the best type to the worst

SELECT * FROM tbl_name WHERE primary_key=1;
SELECT * FROM ref_table,other_table
  WHERE ref_table.key_column=other_table.column;

SELECT * FROM ref_table,other_table
  WHERE ref_table.key_column_part1=other_table.column
  AND ref_table.key_column_part2=1;

from mysql8.0 manual:

The index join type is the same as ALL, except that only the index tree is scanned. This occurs two ways:
indexALL相同,都是全量扫,但index只是扫描索引树

  1. 索引覆盖。If the index is a covering index for the queries and can be used to satisfy all data required from the table, only the index tree is scanned. In this case, the Extra column says Using index. An index-only scan(索引覆盖而无需回表的scan) usually is faster than ALL because the size of the index usually is smaller than the table data.
    注意,和ALL相比,它们都取得了全表的数据,但如果不是索引覆盖的情况,则index要先读索引再回表随机取数据,这时index就不会比ALL

  2. 按索引顺序查找数据行来执行全表扫描。A full table scan is performed using reads from the index to look up data rows in index order. Uses index does not appear in the Extra column.

MySQL can use this join type when the query uses only columns that are part of a single index.

EXPLAIN extra


案例

表说明:

三个表

目的:

查询每个resource_pool及其内部的host数量 SELECT resource_pool.*, count(host.id) count

原sql

查询耗时7s

SELECT resource_pool.*, count(host.id) count 
FROM `resource_pool` 
Inner JOIN biz_module 
  ON resource_pool.module = biz_module.path 
Left JOIN host 
  ON host.module_id = biz_module.id and host.is_deleted = 0 
// 因为host表存在IDX_host_is_deleted这个二值索引(二值所以区分度极低),而这里通过explain又会走IDX_host_is_deleted这个二值索引
// 那么就会产生大量的回表,也就是产生大量的随机io,效率极低。远不如直接扫主键的全表扫描快(因为主键是聚簇索引,扫主键是顺序io)
GROUP BY resource_pool.id 
ORDER BY resource_pool.name 
LIMIT 20

执行计划

id table type key ref rows filtered Extra
1 resource_pool index PRIMARY --- 11 100 Using temporary; Using filesort
1 biz_module ref IDX_biz_module_path om2.resource_pool.module 1 100 Using where; Using index
1 host ref IDX_host_is_deleted const 191385 100 Using where

优化:

通过 +0 把在is_deleted上的所有索引在本次查询中置为无效
另外,+0后还发现与host的left join使用了hash join的方式,进一步提升了查询效率

SELECT resource_pool.*, count(host.id) count 
FROM `resource_pool` 
Inner JOIN biz_module 
  ON resource_pool.module = biz_module.path 
Left JOIN host 
  ON host.module_id = biz_module.id and host.is_deleted +0 = 0 
GROUP BY resource_pool.id 
ORDER BY resource_pool.name 
LIMIT 20

or

SELECT resource_pool.*, count(host.id) count 
FROM `resource_pool` 
Inner JOIN biz_module 
  ON resource_pool.module = biz_module.path 
Left JOIN host 
  ignore index(IDX_host_is_deleted)  // 忽略IDX_host_is_deleted这个索引
  ON host.module_id = biz_module.id and host.is_deleted = 0 
GROUP BY resource_pool.id 
ORDER BY resource_pool.name 
LIMIT 20

执行计划

id table type key ref rows filtered Extra
1 resource_pool ALL 11 100 Using temporary; Using filesort
1 biz_module ref IDX_biz_module_path om2.resource_pool.module 1 100 Using where; Using index
1 host ALL 382770 100 Using where; Using join buffer (hash join)

或把count放到server层统计:

SELECT resource_pool.*, count(case when host.is_deleted = 0 then 1 end) count 
FROM `resource_pool` 
Inner JOIN biz_module 
  ON resource_pool.module = biz_module.path 
Left JOIN host 
  ON host.module_id = biz_module.id 
GROUP BY resource_pool.id 
ORDER BY resource_pool.name 
LIMIT 20

这两种写法的执行计划是相同的

原sql Left JOIN的条件下放探究

to be continue

SELECT resource_pool.*, count(host.id) count 
FROM `resource_pool` 
Inner JOIN biz_module 
  ON resource_pool.module = biz_module.path 
Left JOIN host 
  ON host.module_id = biz_module.id 
WHERE host.is_deleted = 0 
GROUP BY resource_pool.id 
ORDER BY resource_pool.name 
LIMIT 20
上一篇 下一篇

猜你喜欢

热点阅读