truncate/drop table性能问题脉络梳理

2024-09-03 本文已影响0人轻松的鱼

truncate/drop table 的性能问题是一个历史悠久，十分复杂的问题，本篇文章会从时间顺序上梳理这个问题的表现和官方的优化动作，希望能帮助大家解决一些疑惑。

首先我们回归到官方手册，看看都有哪些记载，首先是MySQL5.7的手册：

TRUNCATE TABLE for a table closes all handlers for the table that were opened with HANDLER OPEN.
On a system with a large InnoDB buffer pool and innodb_adaptive_hash_index enabled, TRUNCATE TABLE operations may cause a temporary drop in system performance due to an LRU scan that occurs when removing an InnoDB table's adaptive hash index entries. The problem was addressed for DROP TABLE in MySQL 5.5.23 (Bug #13704145, Bug #64284) but remains a known issue for TRUNCATE TABLE (Bug #68184).

说的是在具有大型 InnoDB 缓冲池并启用 innodb_adaptive_hash_index 的系统上，由于删除 InnoDB 表的自适应哈希索引条目时发生 LRU 扫描，TRUNCATE TABLE 操作可能会导致系统性能暂时下降。 MySQL 5.5.23 中的 DROP TABLE 问题已得到解决（bug#13704145，bug#64284），但对于 TRUNCATE TABLE 仍然是一个已知问题（ bug#68184）。

然后是MySQL8.0的手册：“之前版本 truncate 的性能问题，8.0 已经将 truncate 映射成了 drop table + create table 避免了这个问题”

In MySQL 5.7 and earlier, on a system with a large buffer pool and innodb_adaptive_hash_index enabled, a TRUNCATE TABLE operation could cause a temporary drop in system performance due to an LRU scan that occurred when removing the table's adaptive hash index entries (Bug #68184). The remapping of TRUNCATE TABLE to DROP TABLE and CREATE TABLE in MySQL 8.0 avoids the problematic LRU scan.

手册中写的“MySQL 5.5.23解决了 DROP TABLE 的性能问题，8.0 解决了 TRUNCATE TABLE 的性能问题”显然不是事实，在我们的体验中，无论是MySQL5.7还是8.0都依然存在 truncate/drop table 导致的MySQL hang甚至crash。

那么问题来了，为什么？为什么官方手册会这么写？为什么性能问题依然存在？ 让我们一起来追寻真相。

bug梳理

按时间顺序梳理相关bug如下。

1. bug#64284 & bug#51325

官方文档中正式提到的 bug 是：https://bugs.mysql.com/bug.php?id=64284
更详细的讨论应该看这个 bug：https://bugs.mysql.com/bug.php?id=51325

bug 指出 drop table 会做两次 LRU 扫描：一次是从 LRU list 中删除表的数据页，一次是删除表的 AHI 条目。

优化后变成：扫描一次 LRU list 来删除 AHI；扫描一次 flush list 来删除表的数据页；并且buffer pool mutex 会定期释放，不再是一直持有直到扫描完 LRU list（这样持有 mutex 的时间太长，会阻塞其他请求）。

其中 bug#51325 中有些讨论可以帮助我们理解这个bug。

大意是对于 60G 的缓冲池和其中分配的约 320 万页，删除一个空的 innodb 表会将缓冲池互斥锁锁定约 0.5 秒，因为它会扫描 LRU 两次：

一次在 buf_LRU_invalidate_tablespace
一次在 buf_LRU_invalidate_tablespace 调用的 buf_LRU_drop_page_hash_for_tablespace 中

5.5.23 版本前代码为：

fil_delete_tablespace
buf_LRU_invalidate_tablespace(
     ulint     id)     /*!< in: space id */
{
     ulint     i;()
     for (i = 0; i < srv_buf_pool_instances; i++) {
          buf_pool_t*     buf_pool;

          buf_pool = buf_pool_from_array(i);
          buf_LRU_drop_page_hash_for_tablespace(buf_pool, id);
          buf_LRU_invalidate_tablespace_buf_pool_instance(buf_pool, id);
     }
}

##buf_LRU_drop_page_hash_for_tablespace 会扫描一次LRU list，需要从adaptive hash中删除对要删除的表的page的引用；
##buf_LRU_invalidate_tablespace_buf_pool_instance 会扫描一次LRU list:如果是dirty block，需要从flush list remove掉，然后从page hash中删除，最后从LRU list中删除。
##这两个删除操作都是同步方式

这一条评论大意是：删除表时，对于表在 LRU list 中的 page，惰性失效比同步删除要更合理。如果要删除一个小的或未使用的表，则扫描缓冲池的成本不值得它释放的几个page；即使要删除一个大表，也可能不会从释放大量缓冲池中看到足够的好处（如果有的话）来证明服务中断是合理的，所以我仍然愿意等待这些 page 通过LRU算法正常的失效。

这个评论说出了要点：同步扫描 LRU list 删除 page 不是必须的，同步删除带来的是长时间的 mutex，会导致其他请求阻塞。

这条评论是关于优化方案：

采用了惰性删除的思路，通过扫描 flush list 来删除对应的 page（替换了扫描 LRU list，因为 flush list 通常比 LRU list 小的多）；
但是 LRU 的扫描只是减少了一次，并没有完全消除；
buffer pool mutex 会定期释放，不再是一直持有直到扫描完 LRU list（这样持有 mutex 的时间缩短，如果删除操作需要很长时间，其他线程可以同时工作）。

2. bug#61188

https://bugs.mysql.com/bug.php?id=61188
这个bug指出对于分区表，drop table 会删除多个分区，删除每个分区时都是扫描 LRU list 两次，放大了 bug#51325 的问题。

3. bug#68184

https://bugs.mysql.com/bug.php?id=68184
这个bug 说的是 truncate table 会扫描 LRU 来删除 AHI，导致性能下降；8.0 已修复，方法是将 truncate 映射成 drop table + create table。

注意：这个bug 的开头和结尾都说是因为扫描 LRU 删除 AHI 导致的，甚至官方文档中说的也是“由于删除 InnoDB 表的自适应哈希索引条目时发生 LRU 扫描，可能会导致系统性能暂时下降”。

这里存在疑问，因为从 bug#68284 和 bug#51325 我们得知其实主要还是因为扫描 LRU 删除 page，并且持有 buffer pool mutex 时间过长，5.5.23 修复后还是得扫描 LRU 删除 AHI。

其中有一个评论：truncate table 会重新为表分配一个 space id，因此这张表的 page 留在 LRU list 中是没有问题的（这些 page 的 space id 是旧的，不会被读取到），所以可以和 drop table 一样使用惰性删除进行优化。

备注：这个说法我怀疑是因果倒置了。要知道 MySQL5.7中 truncate table 是不会重新分配 space id 的，因此需要同步删除 LRU list 中的page。再来看 8.0 将 truncate 映射成 drop table + create table，这是为了解决同步删除 page 的问题，8.0 中 truncate table 会重新分配 space id 这件事只是一种修复后的结果。

[20 May 2013 14:57] Inaam Rana

I initially thought that during TRUNCATE we have to evict all pages from LRU because we reuse the same space_id. Looking at the code I see that we allocate a new space_id. If that is the case then probably it is OK to leave pages in the LRU list like we do during DROP table.

4. bug#91977

https://bugs.mysql.com/bug.php?id=91977
这个bug说的是 drop table 扫描 LRU 删除 AHI 导致信号量等待，造成长时间的阻塞。这个其实就是 5.5.23 修复方案中没能解决的问题：LRU 的扫描只是减少了一次，并没有完全消除。

疑问1：为什么必须要同步删除 AHI？不能和 page 一样惰性删除吗？

疑问2：5.5.23 修复方案中 buffer pool mutex 会定期释放（每扫描 1024个page就会释放buffer pool mutex，然后再重新获取），为什么还会造成阻塞？

另外这个 bug 的讨论中提到关闭 AHI 并不是一个好办法：

[2 Nov 2018 2:55] Jesper wisborg Krogh

Posted by developer: The workaround is to disable the InnoDB adaptive hash index when dropping the table. This will at least greatly reduce the time it takes for the DROP TABLE to complete (in my tests in 8.0.13 to less than 0.2 seconds). In fact the DROP TABLE has become so quick, I didn't manage to see if it still blocked.To temporarily disable the InnoDB adaptive hash index while dropping the table:SET GLOBAL innodb_adaptive_hash_index = OFF;DROP TABLE ...SET GLOBAL innodb_adaptive_hash_index = ON;Or if your workload doesn't benefit from it, disable it permanently. Read more at http://dev.mysql.com/doc/refman/en/innodb-adaptive-hash.html .

[2 Nov 2018 12:06] Ivan Groenewold

I don't think the "workaround" provided is a good idea. Disabling AHI on a live server without any testing is not to be done lightly. It could affect response times of existing queries that benefit from it and eventually bring the server down. I've seen this.

[23 Mar 2020 11:12] MySQL Verification Team

Disabling the AHI is not a solution in all cases. If the purge thread wakes up or the master thread checks if tables need eviction from cache, this will still lead of long semaphore waits on the data dictionary lock.

5. bug#98869

https://bugs.mysql.com/bug.php?id=98869
这个bug指出虽然 8.0 依旧修复了 truncate table 的问题，但是对于一些查询产生的磁盘临时表（innodb 表），在临时表被删除时，还是会有同样的问题。这个bug在8.0.23中得到修复。

小结

一切要从 bug#64284 & bug#51325 开始说起：

当时 drop table 的实现中，需要同步删除 LRU list 中表的 page 和 AHI 条目，需要扫描两次 LRU list，持有 buffer pool mutex 直到结束。这在 innodb buffer pool 很大时会很慢，并且由于 buffer pool mutex 会导致其他线程基本上处于堵塞状态，MySQL服务中断。

在 5.5.23 中进行了修复，用的惰性删除策略：

扫描 flush list（因为 flush list 通常比 LRU list 小的多）删除对应的 page，而 LRU list 上的 page 则依靠 LRU 算法慢慢失效；
但是 LRU 的扫描只是减少了一次，并没有完全消除；
buffer pool mutex 会定期释放，不再是一直持有直到扫描完 LRU list（这样持有 mutex 的时间缩短，如果删除操作需要很长时间，其他线程可以同时工作）。

drop 分区表，删除每个分区时都会扫描 LRU，问题更严重。

truncate table 因为实现问题，在MySQL5.7 中不会重新分配 table space id，为了防止读到已删除表的 page，因此需要同步删除 LRU list 中对应表的的 page。MySQL8.0 将 truncate table 映射成 drop table + create table，和 drop table 一样使用惰性删除。

惰性删除原理

参考文档：https://www.percona.com/blog/2012/06/22/drop-table-and-stalls-lazy-drop-table-in-percona-server-and-the-new-fixes-in-mysql/

MySQL 5.5.23 惰性删除优化原理（percona一开始实现的方式不一样，后面采用了官方的修复方法）：

Loop while all dirty pages belonging to the tablespace (being dropped) have been removed from the buffer pool
- Take the buffer pool mutex
- Take a mutex on the flush list of the buffer pool
- Scan the flush list and for each dirty page in the flush list
  - Remove the page from the buffer pool if the page belongs to the tablespace being dropped
  - If we have removed 1024 dirty pages from the buffer pool thus far we release the buffer pool mutex and the flush list mutex for some time
    - Exit the flush list mutex
    - Exit the buffer pool mutex
    - Try to force a context switch by advising the os to give up remainder of the thread’s time slice (this is going to let other threads do things on the buffer pool and prevents the buffer pool mutex from being kept for long)
  - Take the buffer pool mutex and the flush list mutex again
Release the flush list mutex
Release the buffer pool mutex

入口函数在 buf_LRU_flush_or_remove_pages，根据传入参数选择不同的策略：

BUF_REMOVE_ALL_NO_WRITE，从 flush list 和 LRU list 同步删除page，开销大；
BUF_REMOVE_FLUSH_NO_WRITE，从 flush list 同步删除 page，并且不需要刷盘，直接丢弃掉。LRU list 上的 page 通过 LRU 算法慢慢失效，因为 LRU list 的大小远远大于 flush list，删除 lru list 上的 page 成本很大的，这其实就是惰性删除了；
BUF_REMOVE_FLUSH_WRITE，从 flush list 上刷脏并删除，刷脏之后脏页变成干净页，LRU list 上的页就不需要动了。通常是非 drop/truncate 的 DDL 操作触发这个逻辑。

buf_LRU_flush_or_remove_pages(
/*==========================*/
        ulint                id,                /*!< in: space id */
        buf_remove_t        buf_remove,        /*!< in: remove or flush strategy */
        const trx_t*        trx)                /*!< to check if the operation must
                                        be interrupted */
{
        ulint                i;

        /* Before we attempt to drop pages one by one we first
        attempt to drop page hash index entries in batches to make
        it more efficient. The batching attempt is a best effort
        attempt and does not guarantee that all pages hash entries
        will be dropped. We get rid of remaining page hash entries
        one by one below. */
        for (i = 0; i < srv_buf_pool_instances; i++) {
                buf_pool_t*        buf_pool;

                buf_pool = buf_pool_from_array(i);

                switch (buf_remove) {
                case BUF_REMOVE_ALL_NO_WRITE:
                        buf_LRU_drop_page_hash_for_tablespace(buf_pool, id);
                        break;

                case BUF_REMOVE_FLUSH_NO_WRITE:
                        /* It is a DROP TABLE for a single table
                        tablespace. No AHI entries exist because
                        we already dealt with them when freeing up
                        extents. */
                case BUF_REMOVE_FLUSH_WRITE:
                        /* We allow read-only queries against the
                        table, there is no need to drop the AHI entries. */
                        break;
                }

                buf_LRU_remove_pages(buf_pool, id, buf_remove, trx);
        }
}

测试

#MySQL5.7 truncate table 时 buf_LRU_flush_or_remove_pages 函数传入的是 BUF_REMOVE_ALL_NO_WRITE：
(gdb) p buf_remove
$1 = BUF_REMOVE_ALL_NO_WRITE

#MySQL5.7 drop table 时传入的是 BUF_REMOVE_FLUSH_NO_WRITE：
(gdb) p buf_remove
$2 = BUF_REMOVE_FLUSH_NO_WRITE

#MySQL8.0 truncate/drop table 传入的都是 BUF_REMOVE_FLUSH_NO_WRITE：
(gdb) p buf_remove
$2 = BUF_REMOVE_FLUSH_NO_WRITE

#MySQL8.0 删除分区表时，删除每个分区都会调用 buf_LRU_flush_or_remove_pages

解疑

要解开这个疑问，必须重新认识 drop table/truncate table 需要的两个大操作：

删除 buffer pool 中表的索引页，以及索引页的 AHI
删除 buffer pool 中表的数据页
MySQL 5.5.23 引入的惰性删除实际上只对 b 操作做了优化，a 操作从 bug#51325 中没找到优化的描述。MySQL5 版本的关注重点：明白惰性删除优化就行。
“测试”章节中测的其实还是惰性删除，也就是 b 操作。b 操作很明确，不会再有 LRU list 的扫描。那 a 操作是否要扫描 LRU？在 MySQL5.7 中，a 操作不通过扫描 LRU 实现。

总结

1. 5.5.23 以前

当时 drop table 的实现中（truncate table 一样），需要同步删除 LRU list 中表的 page 和 AHI 条目，需要扫描两次 LRU list，持有 buffer pool mutex 直到结束。这在 innodb buffer pool 很大时会很慢，并且由于 buffer pool mutex 会导致其他线程基本上处于堵塞状态，MySQL服务中断。

2. 5.5.23

drop table 时删除 buffer pool 中表的数据页惰性删除策略：

扫描 flush list（因为 flush list 通常比 LRU list 小的多）删除对应的 page，而 LRU list 上的 page 则依靠 LRU 算法慢慢失效；
buffer pool mutex 会定期释放，不再是一直持有直到扫描完 flush list（这样持有 mutex 的时间缩短，如果删除操作需要很长时间，其他线程可以同时工作）。

但是 LRU 的扫描只是减少了一次，并没有完全消除，剩下的一次是为了删除 AHI；

truncate table 因为实现问题，不会重新分配 table space id，为了防止读到已删除表的 page，因此需要同步删除 LRU list 中对应表的的 page。MySQL8.0.0 将 truncate table 映射成 drop table + create table，和 drop table 一样使用惰性删除。

3. 5.7

drop table：

删除 buffer pool 中表的索引页，以及索引页的 AHI

a. 不扫描 LRU list，扫磁盘表空间可以得知索引页在 buffer pool 中的位置，如果索引页的记录在 AHI 中，继续到 AHI 中删除对应记录
删除 buffer pool 中表的数据页

a. 扫描 flush list，同步删除表的数据页

b. LRU list 上的 page 通过 LRU 算法慢慢失效

在 MySQL5.7 中，drop table 确实不需要扫描 LRU。如果表小，扫磁盘表空间很快，比扫LRU list 成本低很多；如果表很大，扫扫磁盘表空间也会很耗时，与直接扫LRU相比优势不大。

Truncate table：

删除 buffer pool 中表的索引页，以及索引页的 AHI

a. 不扫描 LRU list，扫磁盘表空间可以得知索引页在 buffer pool 中的位置，如果索引页的记录在 AHI 中，继续到 AHI 中删除对应记录
删除 buffer pool 中表的数据页

a. 扫描 flush list，同步删除表的数据页

b. 扫描 LRU list，同步删除表的数据页

4. MySQL8.0.23 以前

MySQL8.0.0 已经将 truncate 映射成 drop table + create table 了，与 MySQL5.7 的区别就一点：

删除 buffer pool 中表的索引页，以及索引页的 AHI：
1. 扫描 LRU list，如果索引页在 AHI 中，则删除对应的 AHI 记录

这个改动似乎回到了 5.5.22，对于小表来说，也要扫描整个 LRU list，成本比较高。

5. MySQL8.0.23

优化了删除 buffer pool 中表的数据页的实现，现在 flush list 也变成惰性删除了：从 flush_list 链表中删除 drop 表的数据页不再由用户线程同步执行，而是由后台 page cleaner 线程异步执行。

现在 drop/truncate table 最耗时的操作只剩一个：扫描 LRU list，删除 AHI。

需要注意的是 8.0.23 release note 里提到：删除buffer pool 中的 data page 和 AHI 不再是同步删除，而是被动删除（也就是异步删除、惰性删除）：

实际上却不是这么回事，AHI 还是要扫描 LRU list 同步删除，这在另一篇官方博客中做了说明：虽然 release note 写了，但 8.0.23 版本并未实现，将很快解决。也不知道是不是偷偷修复的，release note 和博客再未见其描述：

6. MySQL8.4

innodb_adaptive_hash_index 默认值变为 false，也就是说 AHI 默认是关闭的。文档中也提示需要我们根据业务的负载情况进行调整，最好根据实际业务压测开启、关闭AHI的表现：

建议

对于 drop/truncate table，如果你的生产环境中经常遇到执行期间发生性能下降，建议使用8.0.23以上的版本，并且关闭 AHI。