记一次MySQL死锁排查
一. 环境
1. 表结构
涉及到以下两张表:
- BIND_RECORD
CREATE TABLE `BIND_RECORD` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`bind_uid` bigint(20) NOT NULL COMMENT '被绑uid',
`bind_by_uid` bigint(20) NOT NULL COMMENT '渠道uid',
`bind_type` int(2) NOT NULL COMMENT '绑定规则类型,1-长期,2-连续订阅,3-短期',
`bind_rule_ref` bigint(20) NOT NULL COMMENT '绑定规则id',
`start_time` datetime NOT NULL COMMENT '有效起始时间',
`end_time` datetime DEFAULT NULL COMMENT '有效终止时间',
`extra` varchar(1024) DEFAULT NULL COMMENT '额外信息,json字段',
`created_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
`last_updated_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '修改时间',
PRIMARY KEY (`id`),
UNIQUE KEY `uq_bind_uid_type` (`bind_uid`,`bind_type`),
KEY `idx_bind_uid` (`bind_uid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='绑定记录表';
- BIND_RECORD_HISTORY
CREATE TABLE `BIND_RECORD_HISTORY` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`bind_uid` bigint(20) NOT NULL COMMENT '渠道uid',
`bind_by_uid` bigint(20) NOT NULL COMMENT '被绑定方uid',
`bind_type` int(2) NOT NULL COMMENT '绑定规则类型,1-长期,2-连续订阅,3-短期',
`bind_rule_ref` bigint(20) NOT NULL COMMENT '绑定规则id',
`start_time` datetime NOT NULL COMMENT '有效起始时间',
`end_time` datetime DEFAULT NULL COMMENT '有效终止时间',
`created_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
`last_updated_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '修改时间',
`extra` varchar(1024) DEFAULT NULL COMMENT '额外信息,json字段',
PRIMARY KEY (`id`),
KEY `idx_bind_uid` (`bind_uid`)
) ENGINE=InnoDB AUTO_INCREMENT=245 DEFAULT CHARSET=utf8mb4 COMMENT='绑定记录历史表,删除的绑定规则会移到历史表中';
表结构如上,使用InnoDB
插入一条测试数据:
INSERT INTO `xmds_bind`.`BIND_RECORD`(`id`, `bind_uid`, `bind_by_uid`, `bind_type`, `bind_rule_ref`, `start_time`, `end_time`, `extra`, `created_time`, `last_updated_time`) VALUES (244, 57431, 205449, 1, 75, '2018-12-07 12:43:59', '2018-12-01 12:43:59', NULL, '2018-11-29 10:55:30', '2019-02-13 16:35:40');
2. 数据库环境
查询数据的事务隔离级别如下:
mysql> SELECT @@tx_isolation;
+-----------------+
| @@tx_isolation |
+-----------------+
| REPEATABLE-READ |
+-----------------+
1 row in set (0.00 sec)
REPEATABLE-READ:可重复读
二. 死锁分析
1. 死锁日志
在服务运行中在日志中出现了ERROR信息:
Deadlock found when trying to get lock; try restarting transaction
由信息可以明确的知道是死锁造成的,使用下面的命令查看MySQL的日志。
show engine innodb status
找到LATEST DETECTED DEADLOCK
死锁信息如下:
------------------------
LATEST DETECTED DEADLOCK
------------------------
2018-12-07 12:44:00 7faa671c7700
*** (1) TRANSACTION:
TRANSACTION 478255595954, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 95407803, OS thread handle 0x7facb920e700, query id 2341515722 172.31.42.155 naliworld updating
UPDATE BIND_RECORD SET id = id,bind_uid = 57431,bind_by_uid = 205449,bind_type = 3,bind_rule_ref = 75,start_time = '2018-12-07 12:43:59',end_time = '2018-12-10 12:43:59' WHERE id = 244
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 129635 page no 3 n bits 152 index `PRIMARY` of table `xmds-bind`.`BIND_RECORD` trx id 478255595954 lock_mode X locks rec but not gap waiting
Record lock, heap no 40 PHYSICAL RECORD: n_fields 12; compact format; info bits 0
0: len 8; hex 80000000000000f4; asc ;;
1: len 6; hex 006f5a414156; asc oZAAV;;
2: len 7; hex 1d0000048618fd; asc ;;
3: len 8; hex 800000000000e057; asc W;;
4: len 8; hex 8000000000032289; asc " ;;
5: len 4; hex 80000003; asc ;;
6: len 8; hex 800000000000004b; asc K;;
7: len 5; hex 99a18ecafb; asc ;;
8: len 5; hex 99a194cafb; asc ;;
9: SQL NULL;
10: len 5; hex 99a18ecaf6; asc ;;
11: len 4; hex 5c09fa8f; asc \ ;;
*** (2) TRANSACTION:
TRANSACTION 478255595950, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1184, 96 row lock(s)
MySQL thread id 95335471, OS thread handle 0x7faa671c7700, query id 2341515729 172.31.42.155 naliworld updating
DELETE FROM BIND_RECORD WHERE ( end_time is not null
and end_time < '2018-12-07 12:44:00' )
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 129635 page no 3 n bits 152 index `PRIMARY` of table `xmds-bind`.`BIND_RECORD` trx id 478255595950 lock mode S
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 23 PHYSICAL RECORD: n_fields 12; compact format; info bits 0
0: len 8; hex 8000000000000027; asc ';;
1: len 6; hex 006f53755353; asc oSuSS;;
2: len 7; hex 5900000542059a; asc Y B ;;
3: len 8; hex 8000000000032083; asc ;;
4: len 8; hex 80000000000318fe; asc ;;
5: len 4; hex 80000001; asc ;;
6: len 8; hex 8000000000000028; asc (;;
7: len 5; hex 99a15b1de1; asc [ ;;
8: SQL NULL;
9: len 2; hex 7b7d; asc {};;
10: len 5; hex 99a15b1de1; asc [ ;;
11: len 4; hex 5beaa899; asc [ ;;
......中间省略了和上面几乎一样的一堆信息,因为是批量删除
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 129635 page no 3 n bits 152 index `PRIMARY` of table `xmds-bind`.`BIND_RECORD` trx id 478255595950 lock_mode X waiting
Record lock, heap no 40 PHYSICAL RECORD: n_fields 12; compact format; info bits 0
0: len 8; hex 80000000000000f4; asc ;;
1: len 6; hex 006f5a414156; asc oZAAV;;
2: len 7; hex 1d0000048618fd; asc ;;
3: len 8; hex 800000000000e057; asc W;;
4: len 8; hex 8000000000032289; asc " ;;
5: len 4; hex 80000003; asc ;;
6: len 8; hex 800000000000004b; asc K;;
7: len 5; hex 99a18ecafb; asc ;;
8: len 5; hex 99a194cafb; asc ;;
9: SQL NULL;
10: len 5; hex 99a18ecaf6; asc ;;
11: len 4; hex 5c09fa8f; asc \ ;;
*** WE ROLL BACK TRANSACTION (1)
分析一下上面的日志:
- 事务1(TRANSACTION),执行的语句为
UPDATE BIND_RECORD SET id = id,bind_uid = 57431,bind_by_uid = 205449,bind_type = 3,bind_rule_ref = 75,start_time = '2018-12-07 12:43:59',end_time = '2018-12-10 12:43:59' WHERE id = 244
;等待X锁(WAITING FOR THIS LOCK TO BE GRANTED:lock_mode X locks rec but not gap waiting),其中 but not gap waiting表示非间隙锁。 - 事务2,执行的语句为
DELETE FROM BIND_RECORD WHERE ( end_time is not nulland end_time < '2018-12-07 12:44:00' )
;持有S锁(HOLDS THE LOCK(S)中所描述的lock mode S),等待X锁(WAITING FOR THIS LOCK TO BE GRANTED:lock_mode X waiting)
通过上面的描述,可以大概分析出死锁产生的原因
- 事务2当前持有了lock mode S(为什么会持有S锁还要看事务2中具体执行了哪些SQL)
- 事务1执行UPDATE操作,UPDATE操作需要X锁(lock_mode X locks rec but not gap,和 S锁冲突),当前锁不能获取,因为它和事务2中的持有S锁冲突,所以事务1的锁请求被加入一个锁请求队列,事务1被阻塞,等待事务2释放S锁
- 事务2执行DELETE操作,DELETE操作需要X锁(lock_mode X),所以现在情况就是,事务2等待X锁并持有S锁,但是由于事务1已经请求了一个X锁,所以不能获取到,而事务1又在等待事务2释放S锁,这样一来就出现了死锁。
2. 如何知道SQL的锁信息
上面提到了事务2首先持有了S锁,这个S锁是怎么持有的,怎么知道每个SQL对应的锁的信息呢。
模拟前先设置数据库事务隔离级别,关闭自动提交:
SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ; //隔离级别设置为REPEATABLE READ
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ; //保险起见GLOBAL和SESSION都设置一下
show variables like 'autocommit'; //查看是否开启了自动提交
set autocommit = 0;
在MySQL官网上,找到了如何打开InnoDB锁监控,具体参考https://dev.mysql.com/doc/refman/8.0/en/innodb-enabling-monitors.html:
SET GLOBAL innodb_status_output=ON;
SET GLOBAL innodb_status_output_locks=ON;
使用上面提到的语句来试一试:
在MySQL的终端依次输入下面的语句:
mysql>start transaction;
mysql>UPDATE BIND_RECORD SET id = id,bind_uid = 57431,bind_by_uid = 205449,bind_type = 3,bind_rule_ref = 75,start_time = '2018-12-07 12:43:59',end_time = '2018-12-10 12:43:59' WHERE id = 244;
mysql>SHOW ENGINE INNODB STATUS;
mysql>commit;
在输入 SHOW ENGINE INNODB STATUS;
会打印出当前事务的锁信息如下:
------------
TRANSACTIONS
------------
Trx id counter 144145
Purge done for trx's n:o < 144142 undo n:o < 0 state: running but idle
History list length 3
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479443287616, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 281479443286712, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 281479443285808, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 144144, ACTIVE 4 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 6, OS thread handle 123145535860736, query id 106 localhost root starting
SHOW ENGINE INNODB STATUS
TABLE LOCK table `xmds_bind`.`bind_record` trx id 144144 lock mode IX
RECORD LOCKS space id 616 page no 6 n bits 296 index PRIMARY of table `xmds_bind`.`bind_record` trx id 144144 lock_mode X locks rec but not gap
Record lock, heap no 225 PHYSICAL RECORD: n_fields 12; compact format; info bits 0
0: len 8; hex 80000000000000f4; asc ;;
1: len 6; hex 00000002330c; asc 3 ;;
2: len 7; hex 2a000001e70693; asc * ;;
3: len 8; hex 800000000000e057; asc W;;
4: len 8; hex 8000000000032289; asc " ;;
5: len 4; hex 80000003; asc ;;
6: len 8; hex 800000000000004b; asc K;;
7: len 5; hex 99a18ecafb; asc ;;
8: len 5; hex 99a194cafb; asc ;;
9: SQL NULL;
10: len 5; hex 99a17aadde; asc z ;;
11: len 4; hex 5c63d6dc; asc \c ;;
从输出中 RECORD LOCKS space id 616 page no 6 n bits 296 index PRIMARY of table
xmds_bind.
bind_recordtrx id 144144 lock_mode X locks rec but not gap
可以看出,需要一个X锁(lock_mode X locks rec but not gap)和我们上面的分析相符。
3. 本地模拟死锁
通过上面的方式,再分析代码,发现了事务2中有一条语句会持有S锁
insert into BIND_RECORD_HISTORY(id, bind_uid, bind_by_uid, bind_type, bind_rule_ref,start_time, end_time, extra) select id, bind_uid, bind_by_uid, bind_type, bind_rule_ref,start_time, end_time, extra from BIND_RECORD where end_time > '2018-12-09';
持有锁信息如下:
------------
TRANSACTIONS
------------
Trx id counter 144148
Purge done for trx's n:o < 144147 undo n:o < 0 state: running but idle
History list length 4
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479443287616, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 281479443286712, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 281479443285808, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 144147, ACTIVE 6 sec
91 lock struct(s), heap size 24784, 19451 row lock(s), undo log entries 1
MySQL thread id 6, OS thread handle 123145535860736, query id 169 localhost root starting
SHOW ENGINE INNODB STATUS
TABLE LOCK table `xmds_bind`.`bind_record` trx id 144147 lock mode IS
RECORD LOCKS space id 616 page no 6 n bits 296 index PRIMARY of table `xmds_bind`.`bind_record` trx id 144147 lock mode S
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
下面来我们打开两个MySQL终端来模拟下,是否会出现我们预期中的死锁:
事务1 | 事务2 |
---|---|
start transaction; | |
insert into BIND_RECORD_HISTORY(id, bind_uid, bind_by_uid, bind_type, bind_rule_ref,start_time, end_time, extra) select id, bind_uid, bind_by_uid, bind_type, bind_rule_ref,start_time, end_time, extra from BIND_RECORD where end_time < '2018-12-07 12:44:00'; | |
start transaction; | |
UPDATE BIND_RECORD SET id = id,bind_uid = 57431,bind_by_uid = 205449,bind_type = 3,bind_rule_ref = 75,start_time = '2018-12-07 12:43:59',end_time = '2018-12-10 12:43:59' WHERE id = 244; | |
DELETE FROM BIND_RECORD WHERE ( end_time is not null and end_time < '2018-12-07 12:44:00' ); |
按照上面的执行顺序,结果如下,报出了死锁错误:
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
LOCK
在知道了死锁原因之后要修改就比较简单了。
三. 官网上的一个例子
最后在官网上找到了类似的一个死锁例子,地址如下:
https://dev.mysql.com/doc/refman/8.0/en/innodb-deadlock-example.html