MongoDB change stream ResumeToken
Overview
Our change stream service recently ran into a problem. The error was as follows:
org.springframework.data.mongodb.UncategorizedMongoDbException: Command failed with error 10334 (BSONObjectTooLarge): 'BSONObj size: 18360146 (0x1182752) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: "8260EBB7E9000009932B022C0100296E5A100456CA186C15614BC48E761990EE38781E46645F696400645E953035E1031700014771360004" }' on server 172.16.4.232:3717. The full response is {"operationTime": {"timestamp": {"t": 1626089928, "i": 1235}}, "ok": 0.0, "errmsg": "BSONObj size: 18360146 (0x1182752) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: "8260EBB7E9000009932B022C0100296E5A100456CA186C15614BC48E761990EE38781E46645F696400645E953035E1031700014771360004" }", "code": 10334, "codeName": "BSONObjectTooLarge", "clusterTime": {"clusterTime": {"timestamp": {"t": 1626089928, "i": 1235}}, "signature": {"hash": {"binary": {"base64": "DzlqmKctlL/0Mbt6d6cV+KerDYE=", "subType": "00"}}, "keyId": 6944255543972200451}}}; nested exception is com.mongodb.MongoCommandException: Command failed with error 10334 (BSONObjectTooLarge): 'BSONObj size: 18360146 (0x1182752) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: "8260EBB7E9000009932B022C0100296E5A100456CA186C15614BC48E761990EE38781E46645F696400645E953035E1031700014771360004" }' on server 172.16.4.232:3717. The full response is {"operationTime": {"timestamp": {"t": 1626089928, "i": 1235}}, "ok": 0.0, "errmsg": "BSONObj size: 18360146 (0x1182752) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: "8260EBB7E9000009932B022C0100296E5A100456CA186C15614BC48E761990EE38781E46645F696400645E953035E1031700014771360004" }", "code": 10334, "codeName": "BSONObjectTooLarge", "clusterTime": {"clusterTime": {"timestamp": {"t": 1626089928, "i": 1235}}, "signature": {"hash": {"binary": {"base64": "DzlqmKctlL/0Mbt6d6cV+KerDYE=", "subType": "00"}}, "keyId": 6944255543972200451}}}
Formatted for readability:
image.png
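The error shows that the returned data exceeded the BSON maximum size of 16 MB (18360146 bytes against a limit of 16793600), and that the resume token of the oplog entry involved is 8260EBB7E9000009932B022C0100296E5A100456CA186C15614BC48E761990EE38781E46645F696400645E953035E1031700014771360004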
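When the same error shows up repeatedly in logs, pulling the token out by hand gets tedious. The following is a minimal sketch, assuming the token always appears in the message as `_data: "<hex>"` like in the error above; the class and method names are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResumeTokenExtractor {
    // Assumption: the token is printed as  _data: "<hex digits>"  in the error message.
    private static final Pattern DATA = Pattern.compile("_data:\\s*\"([0-9A-Fa-f]+)\"");

    /** Returns the first resume token found in the message, if any. */
    static Optional<String> extract(String errorMessage) {
        Matcher m = DATA.matcher(errorMessage);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String token = "8260EBB7E9000009932B022C0100296E5A100456CA186C15614BC48E761990EE38781E"
                + "46645F696400645E953035E1031700014771360004";
        String msg = "BSONObj size: 18360146 (0x1182752) is invalid. "
                + "First element: _id: { _data: \"" + token + "\" }";
        System.out.println(extract(msg).orElse("no token found"));
    }
}
```

The pattern returns only the first match, which is enough here because the nested exception repeats the same token.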
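So which document, or rather which oplog entry, caused the problem?
Because a resume token supports resuming consumption from a checkpoint through resumeAfter, startAfter, and similar options, I guessed that it must pinpoint a single oplog entry or a small set of them. If I could decode that information and find the corresponding oplog entry, I would know exactly what went wrong.
The token itself, however, reveals nothing at a glance.
The official documentation does not explain how the resume token is built. After some searching and experimenting I managed to decode it and locate the entry, so I am sharing the method here.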
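0x1 References
On Zhihu I found the article MongoDB 4.2 内核解析 - Change Stream (MongoDB 4.2 Internals: Change Stream), which includes a description of the resume token.
I also found the 4.2 source code for the resume token on GitHub, but my C/C++ is long forgotten and I could not follow it.
Then I found an answer on the format (https://stackoverflow.com/a/54072030) that looked more reliable. Decoding the token by hand against it turned out to work.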
0x2 Format
The format is as follows:
MommyTalk1626246926776.png
Written as a regular expression:
^82(.{8}).+5A1004(.{32})46645F69640064(.{24})0004$
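image.png
The three capture groups are:
- timestamp (8 hex characters)
- UUID (32 hex characters)
- ObjectId (24 hex characters)
The following sections explain where the regular expression comes from.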
0x3 Timestamp
Take a look at the data in the oplog: the number 1626244435 converted to hexadecimal is 60EE8553, so the timestamp takes 8 hex characters in the resume token.
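The conversion can be checked in both directions with a few lines of Java. This is a small sketch: the class and method names are illustrative, and 60EBB7E9 is the timestamp group taken from the token in the error above.

```java
import java.time.Instant;

class ResumeTokenTimestamp {
    /** Parses the 8-character hex timestamp group into epoch seconds. */
    static long hexToEpochSeconds(String hex8) {
        return Long.parseLong(hex8, 16);
    }

    public static void main(String[] args) {
        // oplog value -> hex, as it appears inside a resume token
        System.out.println(Long.toHexString(1626244435L).toUpperCase()); // 60EE8553

        // hex from the failing token -> epoch seconds -> UTC instant
        long seconds = hexToEpochSeconds("60EBB7E9");
        System.out.println(seconds + " -> " + Instant.ofEpochSecond(seconds));
    }
}
```

`Instant` prints in UTC, so the result may differ by the local offset from a time formatted with the default time zone.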
0x4 ObjectId
Nothing much to explain here: it is 24 hex characters.
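As a side note, the first 4 bytes of an ObjectId are its creation time in seconds since the Unix epoch, which can help when looking for the document. The sketch below decodes that prefix from the ObjectId group of the token above; the class and method names are illustrative, and it deliberately avoids the driver's own ObjectId class to stay self-contained.

```java
import java.time.Instant;

class ObjectIdTime {
    /** Decodes the creation time stored in the first 8 hex characters of an ObjectId. */
    static Instant createdAt(String oidHex) {
        if (oidHex == null || !oidHex.matches("[0-9A-Fa-f]{24}")) {
            throw new IllegalArgumentException("expected 24 hex characters");
        }
        long seconds = Long.parseLong(oidHex.substring(0, 8), 16);
        return Instant.ofEpochSecond(seconds);
    }

    public static void main(String[] args) {
        // ObjectId group captured from the failing resume token
        System.out.println(createdAt("5E953035E103170001477136"));
    }
}
```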
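0x5 UUID
Following the Stack Overflow answer, the token lines up from the leading 82 through the 8-character timestamp, but after that it no longer matches.
Matching backward from the end works instead:
- 0004 is a fixed suffix
- before it comes the 24-character ObjectId
- before that comes a fixed BSON-format prefix for the ObjectId, 46645F69640064 (5F6964 is the ASCII encoding of _id)
- before that comes the UUID; at this point it was unclear whether it had been transformed in some way, so I skipped it for the moment
- before that comes a fixed BSON-format prefix for the UUID, 5A1004
Measuring by hand shows that the UUID is 32 characters long, with only the - separators removed. Re-inserting - at intervals of 8 4 4 4 12 restores the ui field of the oplog entry.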
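The back-to-front walk can also be written with plain substring offsets instead of a regular expression. This is a sketch under the same assumptions as the hand analysis (fixed 0004 suffix, an ObjectId as the document key, and the two fixed prefixes); the class and method names are illustrative.

```java
import java.util.UUID;

class ResumeTokenTail {
    static final String SUFFIX = "0004";
    static final String OID_PREFIX = "46645F69640064";
    static final String UUID_PREFIX = "5A1004";

    /** Walks the token from the end; returns { ui as a UUID string, ObjectId hex }. */
    static String[] parseTail(String token) {
        int minLength = SUFFIX.length() + 24 + OID_PREFIX.length() + 32 + UUID_PREFIX.length();
        if (token == null || token.length() < minLength || !token.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("unexpected token layout");
        }
        int end = token.length() - SUFFIX.length();
        String oid = token.substring(end - 24, end);      // 24-character ObjectId
        end = expect(token, end - 24, OID_PREFIX);        // fixed ObjectId prefix
        String uuidHex = token.substring(end - 32, end);  // 32-character UUID without dashes
        expect(token, end - 32, UUID_PREFIX);             // fixed UUID prefix
        return new String[] { toUuid(uuidHex).toString(), oid };
    }

    // Checks that `expected` ends at index `end`; returns the index where it starts.
    private static int expect(String token, int end, String expected) {
        int start = end - expected.length();
        if (!token.startsWith(expected, start)) {
            throw new IllegalArgumentException("missing segment " + expected);
        }
        return start;
    }

    // Re-inserts the dashes at 8-4-4-4-12 and lets java.util.UUID validate the result.
    static UUID toUuid(String hex32) {
        return UUID.fromString(
                hex32.replaceFirst("(.{8})(.{4})(.{4})(.{4})(.{12})", "$1-$2-$3-$4-$5"));
    }

    public static void main(String[] args) {
        String[] parts = parseTail("8260EBB7E9000009932B022C0100296E5A100456CA186C15614BC48E761990EE38781E"
                + "46645F696400645E953035E1031700014771360004");
        System.out.println("ui = " + parts[0] + ", _id = " + parts[1]);
    }
}
```

Note that `UUID.toString()` returns lowercase hex, so compare case-insensitively if the oplog tool prints the ui field in uppercase.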
0x6 Java code
Here is runnable Java code.
Run result:
image.png
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author YellowTail
 * @since 2021-07-12
 */
public class ResumeTokenTest {

    public static void main(String[] args) {
        String token = "8260EBB7E9000009932B022C0100296E5A100456CA186C15614BC48E761990EE38781E46645F696400645E953035E1031700014771360004";
        analysisToken(token);
    }

    private static void analysisToken(String value) {
        // Parsing is based on https://stackoverflow.com/a/54072030
        if (null == value) {
            return;
        }
        // Groups: 1 = timestamp (8), 2 = UUID (32), 3 = ObjectId (24)
        Pattern pattern = Pattern.compile("^82(.{8}).+5A1004(.{32})46645F69640064(.{24})0004$");
        Matcher matcher = pattern.matcher(value);
        if (!matcher.find()) {
            System.out.println("Token format does not match");
            return;
        }
        String timestampHexStr = matcher.group(1);
        String uuidStr = matcher.group(2);
        String oidStr = matcher.group(3);
        printTime(timestampHexStr);
        printUi(uuidStr);
        printObjectId(oidStr);
    }

    private static void printObjectId(final String oidStr) {
        System.out.println("ObjectId: " + oidStr);
    }

    private static void printUi(final String uuidStr) {
        // Split into segments of length 8 4 4 4 12
        Pattern pattern = Pattern.compile("^(.{8})(.{4})(.{4})(.{4})(.{12})$");
        Matcher matcher = pattern.matcher(uuidStr);
        if (!matcher.find()) {
            System.out.println("UUID format does not match");
            return;
        }
        String join = String.join("-", matcher.group(1), matcher.group(2),
                matcher.group(3), matcher.group(4), matcher.group(5));
        System.out.println("UUID: " + uuidStr + ", ui: " + join);
    }

    private static void printTime(final String timestampHexStr) {
        long timeStamp = hexStr2Long(timestampHexStr);
        // The token stores seconds; Date expects milliseconds
        Date date = new Date(timeStamp * 1000);
        // Formats in the JVM's default time zone
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm");
        System.out.println("Timestamp: " + timeStamp + ", time: " + sdf.format(date));
    }

    // Hex string to long. Long.parseLong avoids the int overflow that a manual
    // int shift hits when the leading hex digit is 8 or above.
    private static long hexStr2Long(String str) {
        return Long.parseLong(str, 16);
    }
}