Innodb undo之 undo结构简析
水平有限,如果有误请指出
参考:
- 阿里内核月报
- 姜老师的MySQL内核:innodb存储引擎
做一个简单的记录,自己备用
一、大体结构
rollback segments(128)
undo segments(1024)
undo log (header insert/modify 分开的) <-> undo page
undo record
undo record
作为undo segments的第一个undo page可以存放多个事物的undo log,因为如果这个块的undo 记录没有填满3/4则会进入 rollback segment的cache list,那么下次可以继续使用,但是如果第一个块不足以装下事物的undo 记录,那么很显然需要分配新的undo page,这种情况下一个undo page就只能包含一个事物的undo记录了。
事物每次需要分配rollback segments然后分配undo segments然后初始化好undo log header,insert和update/delete需要分配不同的undo segments,一个undo segments往往对应了一个undo log,undo log可以包含多个undo record(因为从debug来看undo log header的初始化只做了一次),对于操作的每行都会留下一个undo record作为mvcc构建历史版本的基础。
undo生成的基本单位是undo record,每行记录都会包含一个undo record,而rollback ptr指向的是undo record的偏移量,对于每行的记录都会去判断其可见性,如果需要构建前版本则通过本指针进行构建包含:
- 第1位是否是insert 第2到8位是undo segment id 第9到40位为page no 第41位到56位为 offset
每一个undo log包含一个trx_undo_t结构体
每一个rollback segments包含一个trx_rseg_t结构体
二、物理结构
- undo page header 每一个undo page都包含
/** Transaction undo log page header offsets */
/* @{ */
#define TRX_UNDO_PAGE_TYPE 0 /*!< TRX_UNDO_INSERT or
TRX_UNDO_UPDATE */
#define TRX_UNDO_PAGE_START 2 /*!< Byte offset where the undo log
records for the LATEST transaction
start on this page (remember that
in an update undo log, the first page
can contain several undo logs) */
#define TRX_UNDO_PAGE_FREE 4 /*!< On each page of the undo log this
field contains the byte offset of the
first free byte on the page */
#define TRX_UNDO_PAGE_NODE 6 /*!< The file list node in the chain
of undo log pages */
- undo semgent header 第一个page 才会用 undo segment header信息
#define TRX_UNDO_STATE 0 /*!< TRX_UNDO_ACTIVE, ... */
#ifndef UNIV_INNOCHECKSUM
#define TRX_UNDO_LAST_LOG 2 /*!< Offset of the last undo log header
on the segment header page, 0 if
none */
#define TRX_UNDO_FSEG_HEADER 4 /*!< Header for the file segment which
the undo log segment occupies */
#define TRX_UNDO_PAGE_LIST (4 + FSEG_HEADER_SIZE)
/*!< Base node for the list of pages in
the undo log segment; defined only on
the undo log segment's first page */
- 每一个undo log
- undo log header
- undo log record 相应的undo实际内容
- undo log record 相应的undo实际内容
undo log header 包含
#define TRX_UNDO_TRX_ID 0 /*!< Transaction id */
#define TRX_UNDO_TRX_NO 8 /*!< Transaction number of the
transaction; defined only if the log
is in a history list */
#define TRX_UNDO_DEL_MARKS 16 /*!< Defined only in an update undo
log: TRUE if the transaction may have
done delete markings of records, and
thus purge is necessary */
#define TRX_UNDO_LOG_START 18 /*!< Offset of the first undo log record
of this log on the header page; purge
may remove undo log record from the
log start, and therefore this is not
necessarily the same as this log
header end offset */
#define TRX_UNDO_XID_EXISTS 20 /*!< TRUE if undo log header includes
X/Open XA transaction identification
XID */
#define TRX_UNDO_DICT_TRANS 21 /*!< TRUE if the transaction is a table
create, index create, or drop
transaction: in recovery
the transaction cannot be rolled back
in the usual way: a 'rollback' rather
means dropping the created or dropped
table, if it still exists */
#define TRX_UNDO_TABLE_ID 22 /*!< Id of the table if the preceding
field is TRUE */
#define TRX_UNDO_NEXT_LOG 30 /*!< Offset of the next undo log header
on this page, 0 if none */
#define TRX_UNDO_PREV_LOG 32 /*!< Offset of the previous undo log
header on this page, 0 if none */
#define TRX_UNDO_HISTORY_NODE 34 /*!< If the log is put to the history
list, the file list node is here */
/* Note: the writing of the undo log old header is coded by a log record
MLOG_UNDO_HDR_CREATE or MLOG_UNDO_HDR_REUSE. The appending of an XID to the
header is logged separately. In this sense, the XID is not really a member
of the undo log header. TODO: do not append the XID to the log header if XA
is not needed by the user. The XID wastes about 150 bytes of space in every
undo log. In the history list we may have millions of undo logs, which means
quite a large overhead. */
/** X/Open XA Transaction Identification (XID) */
/* @{ */
/** xid_t::formatID */
#define TRX_UNDO_XA_FORMAT (TRX_UNDO_LOG_OLD_HDR_SIZE)
/** xid_t::gtrid_length */
#define TRX_UNDO_XA_TRID_LEN (TRX_UNDO_XA_FORMAT + 4)
/** xid_t::bqual_length */
#define TRX_UNDO_XA_BQUAL_LEN (TRX_UNDO_XA_TRID_LEN + 4)
/** Distributed transaction identifier data */
#define TRX_UNDO_XA_XID (TRX_UNDO_XA_BQUAL_LEN + 4)
三、分配步骤和写入
- 第一步为 分配rollback segments
#0 get_next_redo_rseg (max_undo_logs=128, n_tablespaces=4) at /root/mysqlc/percona-server-locks-detail-5.7.22/storage/innobase/trx/trx0trx.cc:1138
#1 0x0000000001c0bce8 in trx_assign_rseg_low (max_undo_logs=128, n_tablespaces=4, rseg_type=TRX_RSEG_TYPE_REDO)
at /root/mysqlc/percona-server-locks-detail-5.7.22/storage/innobase/trx/trx0trx.cc:1314
#2 0x0000000001c1097d in trx_set_rw_mode (trx=0x7fffd7804080) at /root/mysqlc/percona-server-locks-detail-5.7.22/storage/innobase/trx/trx0trx.cc:3352
#3 0x0000000001a64013 in lock_table (flags=0, table=0x7ffeac012ae0, mode=LOCK_IX, thr=0x7ffe7c92ef48)
at /root/mysqlc/percona-server-locks-detail-5.7.22/storage/innobase/lock/lock0lock.cc:4139
#4 0x0000000001b7950e in row_search_mvcc (buf=0x7ffe7c92e350 "\377", mode=PAGE_CUR_GE, prebuilt=0x7ffe7c92e7d0, match_mode=1, direction=0)
at /root/mysqlc/percona-server-locks-detail-5.7.22/storage/innobase/row/row0sel.cc:5100
#5 0x00000000019d5443 in ha_innobase::index_read (this=0x7ffe7c92de10, buf=0x7ffe7c92e350 "\377", key_ptr=0x7ffe7cd57590 "\004", key_len=4,
find_flag=HA_READ_KEY_EXACT) at /root/mysqlc/percona-server-locks-detail-5.7.22/storage/innobase/handler/ha_innodb.cc:9536
#6 0x0000000000f9345a in handler::index_read_map (this=0x7ffe7c92de10, buf=0x7ffe7c92e350 "\377", key=0x7ffe7cd57590 "\004", keypart_map=1,
find_flag=HA_READ_KEY_EXACT) at /root/mysqlc/percona-server-locks-detail-5.7.22/sql/handler.h:2942
#7 0x0000000000f83e44 in handler::ha_index_read_map (this=0x7ffe7c92de10, buf=0x7ffe7c92e350 "\377", key=0x7ffe7cd57590 "\004", keypart_map=1,
find_flag=HA_READ_KEY_EXACT) at /root/mysqlc/percona-server-locks-detail-5.7.22/sql/handler.cc:3248
第二步 对于主键每行更改操作都会调用trx_undo_report_row_operation 他会分配undo segments 并且会负责写入undo record
#0 trx_undo_report_row_operation (flags=0, op_type=2, thr=0x7ffe7c932828, index=0x7ffea4016590, clust_entry=0x7ffe7c932cc0, update=0x0, cmpl_info=0,
rec=0x7fffb580d369 "", offsets=0x7fffec0f3e00, roll_ptr=0x7fffec0f3688) at /root/mysqlc/percona-server-locks-detail-5.7.22/storage/innobase/trx/trx0rec.cc:1866
#1 0x0000000001c5795b in btr_cur_del_mark_set_clust_rec (flags=0, block=0x7fffb4ccaae0, rec=0x7fffb580d369 "", index=0x7ffea4016590, offsets=0x7fffec0f3e00,
thr=0x7ffe7c932828, entry=0x7ffe7c932cc0, mtr=0x7fffec0f38f0) at /root/mysqlc/percona-server-locks-detail-5.7.22/storage/innobase/btr/btr0cur.cc:4894
#2 0x0000000001b9f218 in row_upd_del_mark_clust_rec (flags=0, node=0x7ffe7c932550, index=0x7ffea4016590, offsets=0x7fffec0f3e00, thr=0x7ffe7c932828, referenced=0,
mtr=0x7fffec0f38f0) at /root/mysqlc/percona-server-locks-detail-5.7.22/storage/innobase/row/row0upd.cc:2778
#3 0x0000000001b9f765 in row_upd_clust_step (node=0x7ffe7c932550, thr=0x7ffe7c932828)
at /root/mysqlc/percona-server-locks-detail-5.7.22/storage/innobase/row/row0upd.cc:2923
#4 0x0000000001b9fc74 in row_upd (node=0x7ffe7c932550, thr=0x7ffe7c932828) at /root/mysqlc/percona-server-locks-detail-5.7.22/storage/innobase/row/row0upd.cc:3042
#5 0x0000000001ba0155 in row_upd_step (thr=0x7ffe7c932828) at /root/mysqlc/percona-server-locks-detail-5.7.22/storage/innobase/row/row0upd.cc:3188
#6 0x0000000001b3d3a0 in row_update_for_mysql_using_upd_graph (mysql_rec=0x7ffe7c9318d0 "\375\001", prebuilt=0x7ffe7c931d50)
at /root/mysqlc/percona-server-locks-detail-5.7.22/storage/innobase/row/row0mysql.cc:3040
#7 0x0000000001b3d6a1 in row_update_for_mysql (mysql_rec=0x7ffe7c9318d0 "\375\001", prebuilt=0x7ffe7c931d50)
at /root/mysqlc/percona-server-locks-detail-5.7.22/storage/innobase/row/row0mysql.cc:3131
#8 0x00000000019d47c3 in ha_innobase::delete_row (this=0x7ffe7c931390, record=0x7ffe7c9318d0 "\375\001")
at /root/mysqlc/percona-server-locks-detail-5.7.22/storage/innobase/handler/ha_innodb.cc:9141
大概流程
switch (op_type)
{
case TRX_UNDO_INSERT_OP:
undo = undo_ptr->insert_undo; //如果是 insert 则使用insert_undo 类型为trx_undo_t 指针
if (undo == NULL) { //如果已经分配了就不用分配了
err = trx_undo_assign_undo( //分配undo segment 同时初始化 undo log header
trx, undo_ptr, TRX_UNDO_INSERT);
undo = undo_ptr->insert_undo;
...
}
break;
default:
ut_ad(op_type == TRX_UNDO_MODIFY_OP); //断言
undo = undo_ptr->update_undo;
if (undo == NULL) {
err = trx_undo_assign_undo(
trx, undo_ptr, TRX_UNDO_UPDATE); //分配undo segment 同时初始化 undo log header
undo = undo_ptr->update_undo;
...
}
...
case TRX_UNDO_INSERT_OP://注意是每行都会操作
offset = trx_undo_page_report_insert( //写入insert undo log record
undo_page, trx, index, clust_entry, &mtr);
break;
default:
ut_ad(op_type == TRX_UNDO_MODIFY_OP); //写入delete update undo log record
offset = trx_undo_page_report_modify(
undo_page, trx, index, rec, offsets, update,
cmpl_info, clust_entry, &mtr);
}
...
*roll_ptr = trx_undo_build_roll_ptr( //构建rollback ptr 主键中每行都有这个 用于MVCC构建回滚版本
op_type == TRX_UNDO_INSERT_OP,
undo_ptr->rseg->id, page_no, offset);
四、分解undo log record
我将undo log record的写入到了错误日志,下面进行简单的分解。
表结构如下:
mysql> show create table t1;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1 | CREATE TABLE `t1` (
`id1` int(11) NOT NULL,
`id2` int(11) DEFAULT NULL,
PRIMARY KEY (`id1`),
KEY `id2` (`id2`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
- insert 的undo记录,具体构造在trx_undo_page_report_insert中
语句
mysql> insert into t1 values(28,28);
Query OK, 1 row affected (0.00 sec)
输出如下:
trx_undo_assign_undo:assign undo space:
RSEG SLOT:34,RSEG SPACE ID:2 PAGE NO:3
UNDO SLOT:0,UNDO SPACE ID:2 UNDO LOG HEADER PAGE NO:27,UNDO LOG HEADER OFFSET:86,UNDO LOG LAST PAGE:27
trx_undo_page_report_insert:undo log record
TABLE_NAME:test/t1 TRX_ID:12591,UODO RECORD LEN:10
len 10; hex 011e0b0032048000001c;
011e0b0032048000001c就是undo record的实际记录解析如下:
011c page内部本undo record结束的位置
0b 类型为 #define TRX_UNDO_INSERT_REC 11(0X0b)
00 undo no,提交才会有
32 table_id 可以查询 INNODB_SYS_TABLES 对照
04 字段长度4个字节
8000001c 我插入的记录主键 28(0X1c)
- update 的undo记录,具体构造在trx_undo_page_report_modify中
语句:
mysql> update t1 set id2=1000 where id1=14;
Query OK, 1 row affected (5 min 40.91 sec)
Rows matched: 1 Changed: 1 Warnings: 0
输出如下:
trx_undo_assign_undo:assign undo space:
RSEG SLOT:41,RSEG SPACE ID:1 PAGE NO:5
UNDO SLOT:1,UNDO SPACE ID:1 UNDO LOG HEADER PAGE NO:37,UNDO LOG HEADER OFFSET:1389,UNDO LOG LAST PAGE:37
trx_undo_page_report_modify:undo log record
TABLE_NAME:test/t1 TRX_ID:12604,UODO RECORD LEN:47
len 47; hex 06560c0032000000003136e0260000002c052e048000000e010304800003e7000e00048000000e0304800003e70627;
06560c0032000000003136e0260000002c052e048000000e010304800003e7000e00048000000e030480
就是undo record的记录
大体解析如下:
0656 :page内部本undo record结束的位置
0c:类型为 #define TRX_UNDO_UPD_EXIST_REC 12(0X0c)
00: undo no,提交才会有
32: table_id 可以查询 INNODB_SYS_TABLES 对照
00:
0000003136e0:事物ID
260000002c052e:undo回滚指针
04:主键长度
8000000e:主键值
01
03:位置
04:被修改值的长度
800003e7:值为999(0x3e7)
000e:接下来字符的长度,记录原始值?
00:位置
04:长度
8000000e:主键值
03:位置
04:长度
800003e7:值为999(0x3e7)
0627:page内部本undo record开始的位置,0X0656-0X0627就是长度
作者微信:gp_22389860