MySQL

56-MySQL索引优化与查询优化-JOIN

2022-11-02  本文已影响0人  紫荆秋雪_文

一、关联查询优化

1.1、数据准备

CREATE TABLE IF NOT EXISTS `type`
(
    `id`   INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    `card` INT(10) UNSIGNED NOT NULL,
    PRIMARY KEY (`id`)
);

INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO type(card)
VALUES (FLOOR(1 + (RAND() * 20)));
CREATE TABLE IF NOT EXISTS `book`
(
    `bookid` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    `card`   INT(10) UNSIGNED NOT NULL,
    PRIMARY KEY (`bookid`)
);
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));

1.2、左外连接

1.2.1、当没有索引时

EXPLAIN
SELECT SQL_NO_CACHE *
FROM `type`
         LEFT JOIN book ON type.card = book.card;

1.2.2、为被驱动表添加索引

CREATE INDEX idx_book_card ON book (card);
EXPLAIN
SELECT SQL_NO_CACHE *
FROM `type`
         LEFT JOIN book ON type.card = book.card;

1.2.2、为驱动表添加索引

CREATE INDEX idx_type_card ON type (card);
EXPLAIN
SELECT SQL_NO_CACHE *
FROM `type`
         LEFT JOIN book ON type.card = book.card;

1.2.3、删除被驱动表索引

DROP INDEX idx_book_card ON book;
EXPLAIN
SELECT SQL_NO_CACHE *
FROM `type`
         LEFT JOIN book ON type.card = book.card;

1.3、采用内连接

1.3.1、采用内连接没有索引

DROP INDEX idx_type_card ON type;
DROP INDEX idx_book_card ON book;
EXPLAIN
SELECT SQL_NO_CACHE *
FROM type
         INNER JOIN book ON type.card = book.card;

1.3.2、为被驱动表 book添加索引

CREATE INDEX idx_book_card ON book (card);
EXPLAIN
SELECT SQL_NO_CACHE *
FROM type
         INNER JOIN book ON type.card = book.card;

1.3.2、为驱动表 type添加索引

CREATE INDEX idx_type_card ON type (card);
EXPLAIN
SELECT SQL_NO_CACHE *
FROM type
         INNER JOIN book ON type.card = book.card;

1.3.3、删除被驱动表 type索引

DROP INDEX idx_type_card ON type;
EXPLAIN
SELECT SQL_NO_CACHE *
FROM type
         INNER JOIN book ON type.card = book.card;

1.3.4、当所以表中都为内连接字段添加索引时,而一个表数据量大时

INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card)
VALUES (FLOOR(1 + (RAND() * 20)));
EXPLAIN
SELECT SQL_NO_CACHE *
FROM type
         INNER JOIN book ON type.card = book.card;

二、JOIN 的底层原理

JOIN 方式连接多个表,本质就是各个表之间数据的循环匹配。MySQL5.5 版本之前,MySQL只支持一种表间关联方式,就是嵌套循环(Nested Loop Join)。如果关联表的数据量很大,则JOIN关联的执行时间会非常长。在 MySQL5.5 以后的版本中,MySQL通过引入 BNLJ算法来优化嵌套执行

2.1、驱动表和被驱动表概念

2.1.1、对于内连接

SELECT * FROM A JOIN B . . .

2.1.1、对于外连接

SELECT * FROM A LEFT JOIN B . . .
或
SELECT * FROM B REGHT JOIN A . . .
CREATE TABLE t_a
(
    f1 INT,
    f2 INT,
    INDEX (f1)
);

CREATE TABLE t_b
(
    f1 INT,
    f2 INT
);
INSERT INTO t_a (f1, f2)
VALUES (1, 1),
       (2, 2),
       (3, 3),
       (4, 4),
       (5, 5),
       (6, 6);
INSERT INTO t_b (f1, f2)
VALUES
       (3, 3),
       (4, 4),
       (5, 5),
       (6, 6),
       (7, 7),
       (8, 8);
EXPLAIN
SELECT *
FROM t_a a
         LEFT JOIN t_b b ON a.f1 = b.f1
WHERE a.f2 = b.f2;
SHOW WARNINGS\G;

LEFT JOIN 优化成了 JOIN

SELECT `atguigudb2`.`a`.`f1` AS `f1`,
       `atguigudb2`.`a`.`f2` AS `f2`,
       `atguigudb2`.`b`.`f1` AS `f1`,
       `atguigudb2`.`b`.`f2` AS `f2`
FROM `atguigudb2`.`t_a` `a`
         JOIN `atguigudb2`.`t_b` `b`
WHERE ((`atguigudb2`.`a`.`f1` = `atguigudb2`.`b`.`f1`) AND
       (`atguigudb2`.`a`.`f2` = `atguigudb2`.`b`.`f2`));
image.png
EXPLAIN
SELECT *
FROM t_a a
         LEFT JOIN t_b b ON (a.f1 = b.f1) AND (a.f2 = b.f2);
SHOW WARNINGS\G;
image.png
SELECT `atguigudb2`.`a`.`f1` AS `f1`,
       `atguigudb2`.`a`.`f2` AS `f2`,
       `atguigudb2`.`b`.`f1` AS `f1`,
       `atguigudb2`.`b`.`f2` AS `f2`
FROM `atguigudb2`.`t_a` `a`
         LEFT JOIN `atguigudb2`.`t_b` `b` ON (((`atguigudb2`.`b`.`f2` = `atguigudb2`.`a`.`f2`) AND
                                               (`atguigudb2`.`b`.`f1` = `atguigudb2`.`a`.`f1`)))
WHERE TRUE
EXPLAIN
SELECT *
FROM t_a a
         JOIN t_b b ON a.f1 = b.f1
WHERE a.f2 = b.f2;
SHOW WARNINGS\G;
image.png
SELECT `atguigudb2`.`a`.`f1` AS `f1`,
       `atguigudb2`.`a`.`f2` AS `f2`,
       `atguigudb2`.`b`.`f1` AS `f1`,
       `atguigudb2`.`b`.`f2` AS `f2`
FROM `atguigudb2`.`t_a` `a`
         JOIN `atguigudb2`.`t_b` `b`
WHERE ((`atguigudb2`.`a`.`f1` = `atguigudb2`.`b`.`f1`) AND (`atguigudb2`.`a`.`f2` = `atguigudb2`.`b`.`f2`));

2.2、Simple Nested-Loop Join(简单嵌套循环连接)

算法相当简单,从表A中取出一条数据1,遍历表B,将匹配到的数据放到Result。以此类推,驱动表A中的每一条记录与被驱动表B的记录进行判断

计算逻辑图.png

2.3、Index Nested-Loop Join(索引嵌套循环链接)

Index Nested-Loop Join其优化的思路主要是为了 减少内层表数据的匹配次数,所以要求被驱动表上必须有索引才行。通过外层表匹配条件直接与内层表索引进行匹配,避免和内层表的每条记录去进行比较,这样极大的减少了对内层表的匹配次数。

2.4、Block Nested-Loop Join(块嵌套循环连接)

如果存在索引,那么会使用Index的方式进行Join,如果Join的列没有索引,被驱动表要扫描的次数太多了。每次访问被驱动表,其表中的记录都会被加载到内存中,然后再从驱动表中取一条与其匹配,匹配结束后清除内存,然后再从驱动表中加载一条记录,然后把被驱动表的记录再加载到内存匹配,这样周而复始,大大增加了IO的次数。为了减少被驱动表IO次数,就出现了Block Nested-Loop Join
不再是逐条获取驱动表的数据,而是一块一块的获取,引入了Join Buffer 缓冲区,将驱动表JOIN相关的部分数据列(大小受 Join Buffer 的限制)缓存到Join Buffer中,然后全表扫描被驱动表被驱动表的每一条记录一次性和Join Buffer中的所有 驱动表记录进行匹配(内存中操作),将简单嵌套循环中的多次比较合并成一次,降低了被驱动表的访问频率

缓存的不只是关联表的列(f1),而是缓存的SELECT 后面需要输出的列
在一个有 N 个JOIN关联的SQL中会分配N-1Join buffer。所以查询的时候尽量减少不必要的查询字段,可以让Join buffer中存放更多记录

通过SHOW VARIABLES LIKE '%optimizer_switch%';查看block_nested_loop状态,默认开启

SHOW VARIABLES LIKE '%optimizer_switch%';
index_merge=ON,index_merge_union=ON,index_merge_sort_union=ON,index_merge_intersection=ON,engine_condition_pushdown=ON,
index_condition_pushdown=ON,mrr=ON,mrr_cost_based=ON,block_nested_loop=ON,batched_key_access=off,materialization=ON,
semijoin=ON,loosescan=ON,firstmatch=ON,duplicateweedout=ON,subquery_materialization_cost_based=ON,use_index_extensions=ON,
condition_fanout_filter=ON,derived_merge=ON,use_invisible_indexes=off,skip_scan=ON,hash_join=ON,subquery_to_derived=off,
prefer_ordering_index=ON,hypergraph_optimizer=off,derived_condition_pushdown=ON

驱动表能不能一次加载完,要看join_buffer能不能存储所有的数据,默认情况下join_buffer_size=262144(256k)join_buffer_size的最大值在32位系统可以申请 4G,而在64位操作系统下可以申请大于4Gjoin_buffer空间(64位 Windows 除外,其大值会被截断为 4G 并发出警告)

SHOW VARIABLES LIKE '%join_buffer%';

2.5、Hash Join

从 MySQL 的8.0.20 版本开始将废弃BNLJ,因为从MySQL8.0.18版本开始就加入了Hash Join默认都会使用Hash Join

2.6、小结

上一篇下一篇

猜你喜欢

热点阅读