MySQL binlog初步解读
MySQL的binlog主要用于逻辑同步以及二阶段提交的安全性保证,而在平时的使用中我们DBA也会从中获取一些重要的信息,比如说主从同步延迟了我们可以通过binlog查看当前事务执行的内容,比如可以利用binlog+备份的方式进行误删回滚,开源工具中也有很多利用binlog做闪回、同步数据到缓存中的方案,异地多活的高可用框架完成后有点空闲就对binlog的组成深入了解了下(基于MySQL 5.7版本,binlog版本v4),mysql 5.0之后binlog都采用的v4版本,结构如下分为header和data两部分,header部分所有event都一样占用19 bytes:
+================================+
| event | timestamp 0 : 4 |#当前event写入时的时间
| header +-----------------------+
| | type_code 4 : 1 |#当前event的类型ID
| +-----------------------+
| | server_id 5 : 4 |
| +------------------------+
| | event_length 9 : 4 |#当前event总字节数
| +------------------------+
| | next_position 13 : 4 |#下一个event开始的position
| +-------------------------+
| | flags 17 : 2 |#标签
| +-------------------------+
| | extra_headers 19 : x-19 |
+===================================+
| event | fixed part x : y |
| data +------------------------+
| | variable part |
+=================================+
binlog文件是二进制文件,由一个一个的event组成,每个对数据变动的操作以及DDL语句都会产生一系列的event:
- FORMAT_DESCRIPTION_EVENT:binlog文件的第一个event,记录版本号等元数据信息;
- QUERY_EVENT:存储statement类的信息,基于statement的binlog格式记录sql语句,在row模式下记录事务begin标签;
- XID_EVENT:二阶段提交xid记录;
- TABLE_MAP_EVENT:row模式下记录表源数据,对读取行记录提供规则参考,后面会详细介绍;
- WRITE_ROWS_EVENT/DELETE_ROWS_EVENT/UPDATE_ROWS_EVENT:row模式下记录对应行数据变化的记录;
- GTID_LOG_EVENT:这个就是记录GTID事务号了,用于5.6版本之后基于GTID同步的方式;
- ROTATE_EVENT:连接下一个binlog文件;
Event类型还有很多,而与我们平时操作关联较多的也就上面这几个,有兴趣更详细了解的参考MySQL官方文档
FORMAT_DESCRIPTION_EVENT
简称为format_desc,我们直接在mysql执行reset master清空所有binlog重头生成一个新文件直接操作更容易理解。
(root@(none))>show binlog events in 'mysql-bin.000001';
+------------------+-----+----------------+-----------+-------------+---------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+------------------+-----+----------------+-----------+-------------+---------------------------------------+
| mysql-bin.000001 | 4 | Format_desc | 131066666 | 123 | Server ver: 5.7.22-log, Binlog ver: 4 |
| mysql-bin.000001 | 123 | Previous_gtids | 131066666 | 154 | |
+------------------+-----+----------------+-----------+-------------+---------------------------------------+
2 rows in set (0.00 sec)
可以看出format_desc开始的pos位置为4,这是因为每个binlog文件开头都会占用一个固定的4 bytes,编码为\xFE\x62\x69\x6E,现在来开始对他进行解析
00000000 fe 62 69 6e 5f dd a7 5d 0f 2a eb cf 07 77 00 00 |.bin_..].*...w..|
00000010 00 7b 00 00 00 01 00 04 00 35 2e 37 2e 32 32 2d |.{.......5.7.22-|
00000020 6c 6f 67 00 00 00 00 00 00 00 00 00 00 00 00 00 |log.............|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 5f dd a7 5d 13 |..........._..].|
00000050 38 0d 00 08 00 12 00 04 04 04 04 12 00 00 5f 00 |8............._.|
00000060 04 1a 08 00 00 00 08 08 08 02 00 00 00 0a 0a 0a |................|
00000070 2a 2a 00 12 34 00 01 7a 1e cd 82 5f dd a7 5d 23 |**..4..z..._..]#|
00000080 2a eb cf 07 1f 00 00 00 9a 00 00 00 80 00 00 00 |*...............|
00000090 00 00 00 00 00 00 12 3f a7 e7 |.......?..|
0000009a
Format_desc event data部分的fixed part格式分别为:
- 2 bytes记录binlog version;
- 50 bytes记录MySQL server version;
- 4 bytes记录binlog文件创建时间;
- 1 bytes 值为19,是所有event的header长度;
剩余的所有字节分别记录mysql内部已定义event的fix par部分的长度。
按照这个规则可以找到binlog version记录值为0x0004也就是v4,后面50个字节通过解析为5.7.22-log,再4个字节创建时间的时间戳为1571282271(0x5da7dd5f),转换为时间是2019/10/17 11:17:51。
python解析的结果如下:
>>> import struct
>>> datafile = open('/work/mysql6666/var/mysql-bin.000001','rb')
>>> a = datafile.read(4)
>>> a = datafile.read(19)
>>> print struct.unpack('=IBIIIH',a)
(1571282271, 15, 131066666, 119, 123, 1)
>>> a = datafile.read(2)
>>> print struct.unpack('H',a)
(4,)
>>> a = datafile.read(50)
>>> print a.decode('utf8')
5.7.22-log
>>> a = datafile.read(4)
>>> print struct.unpack('I',a)
(1571282271,)
>>> a = datafile.read(1)
>>> print struct.unpack('B',a)
(19,)
QUERY_EVENT
-
Fixed part部分:
- Thread_id:4 bytes 产生数据的线程ID,可以可以用于DBA审计;
- Execute_time:4 bytes 该语句执行时间,单位秒;
- Databas_length:1 bytes 库名占字节长度;
- Error_code:2 bytes 错误代码,一般该值都为0,比如在master上执行inser...select...语句时myisam表出现主键冲突或者innodb表执行途中ctrl+c退出就会记录该值,在slave执行时会检查报错退出;
- Variable_block_length:2 bytes 记录data part部分variable status的长度;
-
Variable part:
- Variable_status:Variable_block_length;
- Database_name:Databas_length;
- Sql_statement:整个event剩余部分;
** TABLE_MAP_EVENT**
-
Fixed part:
- Table_id:6 bytes;
- Reserved:2 bytes 预留位置;
-
Variable part:
- Database_name_length:1 bytes;
- Database_name:database_name_length + 1个空字节;
- Table_name_length:1 bytes;
- Table_name:table_name_length + 1个空字节;
- Columns:1 bytes 记录表字段数,一般情况字段数是不会超过255所以占1 bytes;
- Columns_type_code:记录每个字段类型id,每个字段占用1个子节点(colums*1),该位记录的值主要对读取后面源数据提供标准,顺序与表结构字段顺序一致;
- Metadata_length:1 bytes 字段元数据占用字节长度,同样一般不会超过1 bytes;
- Metadata:可变长度为metadata_length,需根据Columns_type_code判断每个字段占用长度,数字类型(int,tinyint...)都不会占用空间,只有可变长度的才会占用,比如varchar、char、enum、binary都占用2bytes,text、blob、longtext只占用1bytes,这里的元数据对后面读取row记录提供格式规范;
- Variable_size:该部分记录字段是否允许为空,一位代表一个字段,占用字int((N+7/8))bytes,N为字段数;
- Crc: 4bytes 最后4字节校验码;
WRITE_ROWS_EVENT/DELETE_ROWS_EVENT/UPDATE_ROWS_EVENT
-
Fixed part:
- Table_id:6 bytes;
- Reserved:2 bytes 预留位置;
- Extra:2 bytes 具体干嘛的不清楚,官方文档也没介绍有这2 bytes内容;
-
Variable part:
- Columns: 1 bytes 字段数;
- Variable_size:可变长度int((n+7)/8),n是字段数,bit标识对应字段是否有值,1代表有,0代表没有,row模式都是有值的;
- Variable_size:跟上面一个一样,但是只有update_rows_event才有;
- Variable_size:跟上面两个一样计算长度,该值的bit位标识后面所跟的行数据每个字段是否为NULL,为NULL时bit位为0, 1代表有值,这个bit位和table_map_event的columns_type_code顺序对应;
- Value:数据内容;
- Crc:4 bytes 校验码;
XID_EVENT
Fixed part为空,只有variable part占有8 bytes的xid及结尾的4 bytes校验码。
** GTID_LOG_EVENT**
官网未找到有对该event的结构介绍,主要字节分布为1 bytes的空位,后面为16 bytes的uuid,再有8 bytes记录transactionid最后结束。
一个正常业务数据库产生的binlog要进行分析的话,基本也就上面这几个和我们关联最多,官方文档有对所有event的详细介绍,有兴趣的可以瞧瞧。
要对binlog进行解析还需要了解数据存储详细占用情况,同样的在官方文档有详细的介绍,这里拿我们常用的几个字段类型来做介绍及测试。
- Varchar: 占用字节数0-255bytes使用1byts记录长度,超过255bytes时使用2bytes记录长度 + 数据;
- Int、tinyint、bigint: 分别占用4 bytes、1 bytes、8 bytes;
- Text: 使用2个字节记录长度 + 数据;
- Timestamp(M): 4bytes记录日期时间 + 精确到的毫秒部分,占用长度取决于M;
- Datetime(M):5bytes 记录日期时间 + 精确到的毫秒部分,占用长度取决于M;
- 毫秒部分占用情况,FSP就是上面所说的M值:
FSP | Storage |
---|---|
0,0 | 0 bytes |
1,2 | 1 bytes |
3,4 | 2 bytes |
4,5 | 3 bytes |
- Datetime 5bytes记录分布:
Datetime | 含义 |
---|---|
1 bit | sign (1= non-negative, 0= negative) |
17 bits | year*13+month (year 0-9999, month 0-12) |
5 bits | day (0-31) |
5 bits | hour (0-23) |
6 bits | minute (0-59) |
6 bits | second (0-59) |
下面我创建了一个包含几个常用字段的表尝试进行对event解析:
CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(10) DEFAULT NULL,
`content` varchar(256) DEFAULT NULL,
`status` tinyint(4) DEFAULT NULL,
`bignum` bigint(20) DEFAULT NULL,
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
(root@(none))>show binlog events in 'mysql-bin.000001';
+------------------+------+----------------+-----------+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+------------------+------+----------------+-----------+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| mysql-bin.000001 | 4 | Format_desc | 131066666 | 123 | Server ver: 5.7.22-log, Binlog ver: 4 |
| mysql-bin.000001 | 123 | Previous_gtids | 131066666 | 154 | |
| mysql-bin.000001 | 154 | Gtid | 131066666 | 219 | SET @@SESSION.GTID_NEXT= 'b3b20ec1-f08c-11e9-b0c8-246e96c61f28:1' |
| mysql-bin.000001 | 219 | Query | 131066666 | 316 | create database mysia |
| mysql-bin.000001 | 316 | Gtid | 131066666 | 381 | SET @@SESSION.GTID_NEXT= 'b3b20ec1-f08c-11e9-b0c8-246e96c61f28:2' |
| mysql-bin.000001 | 381 | Query | 131066666 | 815 | use `mysia`; CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(10) DEFAULT NULL,
`content` varchar(256) DEFAULT NULL,
`status` tinyint(4) DEFAULT NULL,
`bignum` bigint(20) DEFAULT NULL,
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
| mysql-bin.000001 | 815 | Gtid | 131066666 | 880 | SET @@SESSION.GTID_NEXT= 'b3b20ec1-f08c-11e9-b0c8-246e96c61f28:3' |
| mysql-bin.000001 | 880 | Query | 131066666 | 961 | BEGIN |
| mysql-bin.000001 | 961 | Rows_query | 131066666 | 1056 | # insert into t1(name,content,status,create_time)values('a','aa',1,now()) |
| mysql-bin.000001 | 1056 | Table_map | 131066666 | 1112 | table_id: 108 (mysia.t1) |
| mysql-bin.000001 | 1112 | Write_rows | 131066666 | 1163 | table_id: 108 flags: STMT_END_F |
| mysql-bin.000001 | 1163 | Xid | 131066666 | 1194 | COMMIT /* xid=18 */ |
+------------------+------+----------------+-----------+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
12 rows in set (0.00 sec)
插入了一条记录,利用show binlog events找到对应的event的起始点进行解析,先来看table_map
00000000 fe 62 69 6e 5f dd a7 5d 0f 2a eb cf 07 77 00 00 |.bin_..].*...w..|
00000010 00 7b 00 00 00 01 00 04 00 35 2e 37 2e 32 32 2d |.{.......5.7.22-|
00000020 6c 6f 67 00 00 00 00 00 00 00 00 00 00 00 00 00 |log.............|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 00 00 00 5f dd a7 5d 13 |..........._..].|
00000050 38 0d 00 08 00 12 00 04 04 04 04 12 00 00 5f 00 |8............._.|
00000060 04 1a 08 00 00 00 08 08 08 02 00 00 00 0a 0a 0a |................|
00000070 2a 2a 00 12 34 00 01 7a 1e cd 82 5f dd a7 5d 23 |**..4..z..._..]#|
00000080 2a eb cf 07 1f 00 00 00 9a 00 00 00 80 00 00 00 |*...............|
00000090 00 00 00 00 00 00 12 3f a7 e7 a5 74 a9 5d 21 2a |.......?...t.]!*|
000000a0 eb cf 07 41 00 00 00 db 00 00 00 00 00 01 b3 b2 |...A............|
000000b0 0e c1 f0 8c 11 e9 b0 c8 24 6e 96 c6 1f 28 01 00 |........$n...(..|
000000c0 00 00 00 00 00 00 02 00 00 00 00 00 00 00 00 01 |................|
000000d0 00 00 00 00 00 00 00 e2 87 c8 8e a5 74 a9 5d 02 |............t.].|
000000e0 2a eb cf 07 61 00 00 00 3c 01 00 00 08 00 04 00 |*...a...<.......|
000000f0 00 00 00 00 00 00 05 00 00 22 00 00 00 00 00 00 |........."......|
00000100 01 00 00 00 00 00 00 00 00 06 03 73 74 64 04 21 |...........std.!|
00000110 00 21 00 2d 00 0c 01 6d 79 73 69 61 00 6d 79 73 |.!.-...mysia.mys|
00000120 69 61 00 63 72 65 61 74 65 20 64 61 74 61 62 61 |ia.create databa|
00000130 73 65 20 6d 79 73 69 61 a4 e6 33 d4 bc 74 a9 5d |se mysia..3..t.]|
00000140 21 2a eb cf 07 41 00 00 00 7d 01 00 00 00 00 01 |!*...A...}......|
00000150 b3 b2 0e c1 f0 8c 11 e9 b0 c8 24 6e 96 c6 1f 28 |..........$n...(|
00000160 02 00 00 00 00 00 00 00 02 01 00 00 00 00 00 00 |................|
00000170 00 02 00 00 00 00 00 00 00 c9 a0 b7 e9 bc 74 a9 |..............t.|
00000180 5d 02 2a eb cf 07 b2 01 00 00 2f 03 00 00 00 00 |].*......./.....|
00000190 04 00 00 00 00 00 00 00 05 00 00 24 00 00 00 00 |...........$....|
000001a0 00 00 01 00 00 00 00 00 00 00 00 06 03 73 74 64 |.............std|
000001b0 04 21 00 21 00 2d 00 0c 01 6d 79 73 69 61 00 10 |.!.!.-...mysia..|
000001c0 00 6d 79 73 69 61 00 43 52 45 41 54 45 20 54 41 |.mysia.CREATE TA|
000001d0 42 4c 45 20 60 74 31 60 20 28 0a 20 20 60 69 64 |BLE `t1` (. `id|
000001e0 60 20 69 6e 74 28 31 31 29 20 4e 4f 54 20 4e 55 |` int(11) NOT NU|
000001f0 4c 4c 20 41 55 54 4f 5f 49 4e 43 52 45 4d 45 4e |LL AUTO_INCREMEN|
00000200 54 2c 0a 20 20 60 6e 61 6d 65 60 20 76 61 72 63 |T,. `name` varc|
00000210 68 61 72 28 31 30 29 20 44 45 46 41 55 4c 54 20 |har(10) DEFAULT |
00000220 4e 55 4c 4c 2c 0a 20 20 60 63 6f 6e 74 65 6e 74 |NULL,. `content|
00000230 60 20 76 61 72 63 68 61 72 28 32 35 36 29 20 44 |` varchar(256) D|
00000240 45 46 41 55 4c 54 20 4e 55 4c 4c 2c 0a 20 20 60 |EFAULT NULL,. `|
00000250 73 74 61 74 75 73 60 20 74 69 6e 79 69 6e 74 28 |status` tinyint(|
00000260 34 29 20 44 45 46 41 55 4c 54 20 4e 55 4c 4c 2c |4) DEFAULT NULL,|
00000270 0a 20 20 60 62 69 67 6e 75 6d 60 20 62 69 67 69 |. `bignum` bigi|
00000280 6e 74 28 32 30 29 20 44 45 46 41 55 4c 54 20 4e |nt(20) DEFAULT N|
00000290 55 4c 4c 2c 0a 20 20 60 63 72 65 61 74 65 5f 74 |ULL,. `create_t|
000002a0 69 6d 65 60 20 74 69 6d 65 73 74 61 6d 70 20 4e |ime` timestamp N|
000002b0 4f 54 20 4e 55 4c 4c 20 44 45 46 41 55 4c 54 20 |OT NULL DEFAULT |
000002c0 43 55 52 52 45 4e 54 5f 54 49 4d 45 53 54 41 4d |CURRENT_TIMESTAM|
000002d0 50 20 4f 4e 20 55 50 44 41 54 45 20 43 55 52 52 |P ON UPDATE CURR|
000002e0 45 4e 54 5f 54 49 4d 45 53 54 41 4d 50 2c 0a 20 |ENT_TIMESTAMP,. |
000002f0 20 50 52 49 4d 41 52 59 20 4b 45 59 20 28 60 69 | PRIMARY KEY (`i|
00000300 64 60 29 0a 29 20 45 4e 47 49 4e 45 3d 49 6e 6e |d`).) ENGINE=Inn|
00000310 6f 44 42 20 44 45 46 41 55 4c 54 20 43 48 41 52 |oDB DEFAULT CHAR|
00000320 53 45 54 3d 75 74 66 38 6d 62 34 53 f7 80 ad f2 |SET=utf8mb4S....|
00000330 74 a9 5d 21 2a eb cf 07 41 00 00 00 70 03 00 00 |t.]!*...A...p...|
00000340 00 00 00 b3 b2 0e c1 f0 8c 11 e9 b0 c8 24 6e 96 |.............$n.|
00000350 c6 1f 28 03 00 00 00 00 00 00 00 02 02 00 00 00 |..(.............|
00000360 00 00 00 00 03 00 00 00 00 00 00 00 28 c5 9b 01 |............(...|
00000370 f2 74 a9 5d 02 2a eb cf 07 51 00 00 00 c1 03 00 |.t.].*...Q......|
00000380 00 08 00 04 00 00 00 00 00 00 00 05 00 00 22 00 |..............".|
00000390 00 00 00 00 00 01 00 00 00 00 00 00 00 00 06 03 |................|
000003a0 73 74 64 04 21 00 21 00 2d 00 05 06 53 59 53 54 |std.!.!.-...SYST|
000003b0 45 4d 6d 79 73 69 61 00 42 45 47 49 4e f6 13 71 |EMmysia.BEGIN..q|
000003c0 53 f2 74 a9 5d 1d 2a eb cf 07 5f 00 00 00 20 04 |S.t.].*..._... .|
000003d0 00 00 80 00 47 69 6e 73 65 72 74 20 69 6e 74 6f |....Ginsert into|
000003e0 20 74 31 28 6e 61 6d 65 2c 63 6f 6e 74 65 6e 74 | t1(name,content|
000003f0 2c 73 74 61 74 75 73 2c 63 72 65 61 74 65 5f 74 |,status,create_t|
00000400 69 6d 65 29 76 61 6c 75 65 73 28 27 61 27 2c 27 |ime)values('a','|
00000410 61 61 27 2c 31 2c 6e 6f 77 28 29 29 a9 7d 5c 9d |aa',1,now()).}\.|
00000420 f2 74 a9 5d 13 2a eb cf 07 38 00 00 00 58 04 00 |.t.].*...8...X..|
00000430 00 00 00 6c 00 00 00 00 00 01 00 05 6d 79 73 69 |...l........mysi|
00000440 61 00 02 74 31 00 06 03 0f 0f 01 08 11 05 28 00 |a..t1.........(.|
00000450 00 04 00 1e 08 9e 88 ba f2 74 a9 5d 1e 2a eb cf |.........t.].*..|
00000460 07 33 00 00 00 8b 04 00 00 00 00 6c 00 00 00 00 |.3.........l....|
00000470 00 01 00 02 00 06 ff d0 01 00 00 00 01 61 02 00 |.............a..|
00000480 61 61 01 5d a9 74 f2 42 c3 c5 2a f2 74 a9 5d 10 |aa.].t.B..*.t.].|
00000490 2a eb cf 07 1f 00 00 00 aa 04 00 00 00 00 12 00 |*...............|
000004a0 00 00 00 00 00 00 4d 31 45 e6 |......M1E.|
000004aa
Table_map_event的postion是从1056开始的,也就是1056的字节位开始,根据上面的介绍我们跳过19 bytes的event_header和8 bytes的fix par部分,然后的一个字节为0x05表示库名长度为5 bytes,接着的5 bytes+空结束符就是库名mysia,接着0x02表示表名长度为2 bytes,接着的2 bytes+空结束符为表名t1,紧接着的0x06表示表有6个字段,接着的6 bytes位分别是6个字段类型id,接着0x05是元数据占用5 bytes,从我们创建的表字段顺序来看,首先int、tinyint、bigint不占用字节位,varchar占用2 bytes,timesteamp占用1 bytes,所以这后面的5个子节点是对应name、content、timesteamp字段的元数据,0x0028是name字段元数据,长度为40,0x0400是content字段长度1024,最后的一个字节0x00表示timestamp没有毫秒的精度,最后面5 bytes读完该event就结束。
紧接着后面的event为write_rows_event是插入的数据所在,首先还是跳过event_header和fix par部分,也就是29个字节,接着的0x06表示有6个字段,接着的0xff的bit位表示各列是否都存在值,这里表示都存在,接下来的0xd0也是用bit位判断,不过这是判断各字段值是否为NULL,0表示null,1表示不为null, 首先4 bytes的0x00000001是自增主键id值2,接着是vhachar字段根据table_map得出元数据为10,没有超过255这里占用1 bytes即为0x01表示后面的字段内容只占用1 bytes,接着的0x61就是插入的’a’,接着就是content字段,根据元数据值得出256大于255所以这里占用2 bytes,0x0002数据内容占用2 bytes,后面的两字节就是内容,0x6161就是'aa',接着是status插入值0x01,紧接着的就是create_time字段值,因为bignum字段我们没有插入值,元数据中没有得出毫秒位精度,所以字节位是0x5da974f2,最后4 bytes的校验码该event结束。
一个事务产生的所有event会被GTID_LOG_EVENT和XID_EVENT包住,table_map和update/delete/write_rows_event的关系通过上面已大概清楚,官方文档中对数据存储及各个event结构都有详细介绍,有兴趣的可以参考官网。