join over mysql.8.0 2022-04-15

2022-04-15 本文已影响0人 9_SooHyun

ON clause and WHERE clause in join

从语义上看，

WHERE clause: After joining. Records will be filtered after join has taken place.
ON clause - Before joining. Records (from right table) will be filtered before joining. This may end up as null in the result.

但从实际上看，
WHERE clause有可能会被mysql优化器放置在join之前执行，目的是在join之前减小表的体积。join的驱动表越小越好，小表驱动大表，可以极大提升join效率。

因此，实际情况是ON clause and WHERE clause：

Does not matter for inner joins
Matters for outer joins
a. WHERE clause: After joining. Records will be filtered after join has taken place.
b. ON clause - Before joining. Records (from right table) will be filtered before joining. This may end up as null in the result (since OUTER join).

小结：只在OUTER join时，ON和WHERE有作用时间和作用域的区别，影响执行计划和结果；其余类型的join两者等价

参考：https://stackoverflow.com/questions/354070/sql-join-where-clause-vs-on-clause

Nested Loop Join & Hash Join

Nested Loop Join

Nested Loop Join can be further categorized as Naive Nested Loop Join, Indexed Nested Loop Join and Temporary Index Nested Loop Join

NESTED LOOP JOIN

Naive Nested Loop Join
The Nested Loop Join searches for a row in the entire inner side of the table / index (except semi-join); this is called a Naive Nested Loop Join.
A simple nested-loop join (NLJ) algorithm reads rows from the first table in a loop one at a time, passing each row to a nested loop that processes the next table in the join. This process is repeated as many times as there remain tables to be joined.

Assume that a join between three tables t1, t2, and t3 is to be executed using the following join types:

Table	Join Type	note
t1	range	最外层表，范围查找。mysql优化器往往会拿where条件先过滤部分数据得到小表，小表驱动大表
t2	ref	普通索引的等值查找
t3	ALL	最内层表，全表扫描

If a simple NLJ algorithm is used, the join is processed like this:

for each row in t1 matching range {
  for each row in t2 matching reference key {
    for each row in t3 {
      if row satisfies join conditions, send to client
    }
  }
}

Indexed Nested Loop Join
The Nested Loop Join searches for a row in the inner side of the index and seeks the index’s B-tree for the searched value(s) and then stops looking further; it is called an Index Nested Loop Join.
Temporary Index Nested Loop Join
A Temporary index Nested Loop Join is exactly like the Index Nested Loop Join. The only difference is that the temporary index is created at the run time of query and destroyed upon completion of the query; it is called a temporary index nested loop join.

如果内循环是全表扫描，时间复杂度就是O(m x n)
如果内循环是索引扫描，时间复杂度就是O(m x ㏒n)
比如，存在2张表，一个10条记录，一个1000万条记录
以小表为驱动表，则代价为：10(通过索引在大表查询一条记录的代价)
如果1000万的大表没有索引的时候，那么进行全表扫描COST很大
因此，在多表连接时，注意被驱动表的连接字段是否需要创建索引，或者连接字段与该表的其他约束条件字段上是否需要创建复合索引。最好的情况就是被驱动表的连接字段上建有索引*

Hash Join

Beginning with MySQL 8.0.18, MySQL employs a hash join for any query for which [1.each join contains an equi-join condition], and in which [2.there are no indexes that can be applied to any join conditions]
连接条件包含等连接条件，并且没有索引可以应用于任何连接条件时，使用hash join
(注意，这不是指连接条件的字段上没建索引，而是【无法应用索引】。例如小表作为驱动表，也在连接字段上建了索引，但是大表的连接字段上没有建索引，那么这个连接条件就无法应用索引——驱动表总是全表扫描，不会使用驱动表的索引；这时如果被驱动表连接字段没有索引，被驱动表只能全表扫描，也就是“没有索引可以应用于任何连接条件”)

hash join 基本算法

The classic hash join algorithm for an inner join of two relations proceeds as follows:

First, prepare a in-memory hash table using the contents of one relation, ideally whichever one is smaller(measured in bytes, not number of rows) after applying local predicates. This relation is called the build side of the join. The hash table entries are mappings from the value of the (composite) join attribute to the remaining attributes of that row (whichever ones are needed).
Once the hash table is built, scan the other relation (the probe side). For each row of the probe relation, find the relevant rows from the build relation by looking in the hash table.

The first phase is usually called the "build" phase, while the second is called the "probe" phase. Similarly, the join relation on which the hash table is built is called the "build" input, whereas the other input is called the "probe" input.

根据join的条件，对hash后能够得到更小hashtable的一个表进行hash，entries are mappings from the value of the (composite) join attribute to the remaining attributes of that row (whichever ones are needed)。然后遍历另一个表，拿该表的join条件去查hash table，如果match，则两个表的对应行就完成了join。从理论上看是很简单的算法

hash table大小超出可用内存

The amount of memory available is controlled by the system variable join_buffer_size, and can be adjusted at runtime.

If the build input is bigger than the available memory, it chunks some files to disk

也就是说，完整的hashtable，可能一部分在内存中，一部分在磁盘中。在probe的时候就需要分情况处理了：
During the probe phase,

the server probes for matching rows in the hash table, just like when everything fits in memory.
But in addition, a row may also match one of the rows from the build input that is written to disk. Thus, each row from the probe input is also written to a set of chunk files. Which chunk file a row is written to is determined using the same hash function and formula that is used when the build input is written to disk. This way, we know for sure that matching rows between the two inputs will be located in the same pair of chunk files.
当发现当前的probe input row匹配的build input row被写入了磁盘的chunk file，mysql的策略是先把这个probe input row也写入磁盘的某个chunk file（暂且称作probe chunk file）。probe chunk file与build chunk file是可以建立关联关系的，因为决定哪一行被写入哪个chunk file的算法对build和probe都是一样的
例如，如果current probe input row匹配的build input row位于build chunk file no.4，那么current probe input row会被写入probe chunk file no.4，都是各自的第4块chunk file
probe phase while build input is written to disk
当可以在内存中匹配的行都匹配完成后，每个build-probe pair chunk为一组，load build input chunk into in-memory hashtable, reset the hash table, and continue scanning the probe input:
load build input chunk into in-memory hashtable and retrieve related probe input chunk

归根到底，所有的匹配都是在内存中完成的，磁盘只是作为一个暂存的中转站，chunk最终要回到内存

more info see: https://dev.mysql.com/blog-archive/hash-join-in-mysql-8/