MongoDB change stream ResumeToken

2021-07-14  Yellowtail

Overview

Our change stream service recently ran into a problem.
It failed with the following error:

org.springframework.data.mongodb.UncategorizedMongoDbException: Command failed with error 10334 (BSONObjectTooLarge): 'BSONObj size: 18360146 (0x1182752) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: "8260EBB7E9000009932B022C0100296E5A100456CA186C15614BC48E761990EE38781E46645F696400645E953035E1031700014771360004" }' on server 172.16.4.232:3717. The full response is {"operationTime": {"timestamp": {"t": 1626089928, "i": 1235}}, "ok": 0.0, "errmsg": "BSONObj size: 18360146 (0x1182752) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: "8260EBB7E9000009932B022C0100296E5A100456CA186C15614BC48E761990EE38781E46645F696400645E953035E1031700014771360004" }", "code": 10334, "codeName": "BSONObjectTooLarge", "clusterTime": {"clusterTime": {"timestamp": {"t": 1626089928, "i": 1235}}, "signature": {"hash": {"binary": {"base64": "DzlqmKctlL/0Mbt6d6cV+KerDYE=", "subType": "00"}}, "keyId": 6944255543972200451}}}; nested exception is com.mongodb.MongoCommandException: Command failed with error 10334 (BSONObjectTooLarge): 'BSONObj size: 18360146 (0x1182752) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: "8260EBB7E9000009932B022C0100296E5A100456CA186C15614BC48E761990EE38781E46645F696400645E953035E1031700014771360004" }' on server 172.16.4.232:3717. The full response is {"operationTime": {"timestamp": {"t": 1626089928, "i": 1235}}, "ok": 0.0, "errmsg": "BSONObj size: 18360146 (0x1182752) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: { _data: "8260EBB7E9000009932B022C0100296E5A100456CA186C15614BC48E761990EE38781E46645F696400645E953035E1031700014771360004" }", "code": 10334, "codeName": "BSONObjectTooLarge", "clusterTime": {"clusterTime": {"timestamp": {"t": 1626089928, "i": 1235}}, "signature": {"hash": {"binary": {"base64": "DzlqmKctlL/0Mbt6d6cV+KerDYE=", "subType": "00"}}, "keyId": 6944255543972200451}}}

Formatted for readability:

[image: the error response above, pretty-printed]

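The error occurs because the data being returned exceeds the BSON maximum of 16 MB: the message reports a size of 18360146 bytes against a limit of 16793600.
oplog resume token: 8260EBB7E9000009932B022C0100296E5A100456CA186C15614BC48E761990EE38781E46645F696400645E953035E1031700014771360004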
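
As a quick sanity check on the numbers in the error message, here is a small sketch. The class and method names are my own and purely illustrative; the values are copied from the message above.

```java
class BsonSizeCheck {
    // Both numbers are copied from the error message; nothing here talks to MongoDB.
    static final long REPORTED_SIZE = 18_360_146L;
    static final long REPORTED_LIMIT = 16_793_600L;

    /** Bytes by which the reported object exceeds the reported limit. */
    static long overshoot() {
        return REPORTED_SIZE - REPORTED_LIMIT;
    }

    public static void main(String[] args) {
        // The hex value printed in the message (0x1182752) is the same size.
        System.out.println(Long.parseLong("1182752", 16) == REPORTED_SIZE);
        // The limit in the message works out to 16 MiB plus 16 KiB.
        System.out.println(16L * 1024 * 1024 + 16L * 1024 == REPORTED_LIMIT);
        System.out.println("over the limit by " + overshoot() + " bytes");
    }
}
```

So the object is roughly 1.5 MB over the limit, which points to a single very large document or update rather than a slight overflow.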

So which document is causing the problem? Or, put differently, which oplog entry?

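A resume token supports resuming consumption from a checkpoint through options such as resumeAfter and startAfter, so I guessed that it must pinpoint a single oplog entry, or at most a small set of them. If I could decode that information and look up the corresponding oplog entry, I would know what had gone wrong.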

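The token itself, however, does not reveal this at a glance.
I went through the official documentation, which does not describe how the resume token is laid out. After some searching and experimenting I managed to decode it and locate the entry, so I am sharing the process here.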

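0x1 References

On Zhihu I found an article titled "MongoDB 4.2 内核解析 - Change Stream" (MongoDB 4.2 internals: Change Stream).
It describes the token as follows:

[image: excerpt from the article describing the resume token]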

I also found the resume token source code for version 4.2 on GitHub:

[image: resume token source code, MongoDB 4.2]

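I have long since forgotten my C/C++ and could not follow it...
Then I found a Stack Overflow answer describing the format (https://stackoverflow.com/a/54072030):

[image: description of the resume token format]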

It looked fairly reliable. I decoded the token by hand against it and found that the approach works.

0x2 Format

The format is as follows:

82 <timestamp: 8 hex chars> ... 5A1004 <UUID: 32 hex chars> 46645F69640064 <ObjectId: 24 hex chars> 0004

Written as a regular expression:

^82(.{8}).+5A1004(.{32})46645F69640064(.{24})0004$

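The three capture groups are, in order: the timestamp (8 hex characters), the UUID (32 hex characters), and the ObjectId (24 hex characters).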
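
To make the groups concrete, here is a minimal sketch that applies the regular expression to the token from the error message. The class name TokenGroups and the helper split are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenGroups {
    // Same pattern as above: fixed markers with three fixed-width captures.
    static final Pattern TOKEN = Pattern.compile(
            "^82(.{8}).+5A1004(.{32})46645F69640064(.{24})0004$");

    /** Returns {timestampHex, uuidHex, objectIdHex}, or null if the token does not match. */
    static String[] split(String token) {
        Matcher m = TOKEN.matcher(token);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String token = "8260EBB7E9000009932B022C0100296E5A1004"
                + "56CA186C15614BC48E761990EE38781E"
                + "46645F69640064"
                + "5E953035E103170001477136"
                + "0004";
        String[] parts = split(token);
        System.out.println("timestamp (hex): " + parts[0]); // 60EBB7E9
        System.out.println("uuid (hex):      " + parts[1]); // 56CA186C15614BC48E761990EE38781E
        System.out.println("objectId (hex):  " + parts[2]); // 5E953035E103170001477136
    }
}
```

Because the tail of the token has a fixed width, the greedy .+ in the middle can only settle on one position for 5A1004, so the match is unambiguous for tokens of this shape.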

The following sections explain how the regular expression was derived.

0x3 Timestamp

Look at the data in the oplog:

[image: an oplog entry with timestamp 1626244435]

The number 1626244435 is 60EE8553 in hexadecimal, so the timestamp occupies 8 characters in the resume token.
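
The conversion is easy to check in both directions. A minimal sketch, with TimestampHex and toEpochSeconds as illustrative names of my own:

```java
import java.time.Instant;

class TimestampHex {
    /** Parses the 8-character hex timestamp into seconds since the Unix epoch. */
    static long toEpochSeconds(String hex8) {
        return Long.parseLong(hex8, 16);
    }

    public static void main(String[] args) {
        long seconds = toEpochSeconds("60EE8553");
        System.out.println(seconds);                                   // 1626244435
        System.out.println(Instant.ofEpochSecond(seconds));            // 2021-07-14T06:33:55Z
        // And back again: decimal seconds to the hex form found in the token.
        System.out.println(Long.toHexString(seconds).toUpperCase());   // 60EE8553
    }
}
```

Parsing with Long rather than Integer keeps the code correct even once the leading hex digit reaches 8 or above.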

0x4 ObjectId

No explanation needed: an ObjectId is 24 hex characters.

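0x5 UUID

Following the Stack Overflow answer, everything from the leading 82 through the 8-character timestamp lines up.
After that point, the layout no longer matches.
So I matched from the end backwards instead, which worked: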

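0004 is a fixed terminator.
Before it comes the 24-character ObjectId.
Before that comes a fixed prefix in the ObjectId's BSON encoding, 46645F69640064.
Before that comes the UUID. I did not yet know whether it had been transformed in any way, so I skipped it for the moment.
Before that comes a fixed prefix in the UUID's BSON encoding, 5A1004.
Measuring by hand shows that the UUID is 32 characters long: it is the usual form with the - separators removed.
Inserting - at the 8-4-4-4-12 boundaries recovers the ui field of the oplog entry.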
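
The last step can be sketched with plain substring calls. UuidFromToken and fromHex are hypothetical names for illustration; parsing the result with java.util.UUID is just a convenient way to confirm the string is well-formed.

```java
import java.util.UUID;

class UuidFromToken {
    /** Re-inserts the dashes at the 8-4-4-4-12 boundaries and parses the result. */
    static UUID fromHex(String hex32) {
        if (hex32 == null || hex32.length() != 32) {
            throw new IllegalArgumentException("expected 32 hex characters");
        }
        String dashed = hex32.substring(0, 8) + "-"
                + hex32.substring(8, 12) + "-"
                + hex32.substring(12, 16) + "-"
                + hex32.substring(16, 20) + "-"
                + hex32.substring(20);
        return UUID.fromString(dashed);
    }

    public static void main(String[] args) {
        // The 32 characters captured from the token in the error message.
        UUID ui = fromHex("56CA186C15614BC48E761990EE38781E");
        System.out.println(ui); // 56ca186c-1561-4bc4-8e76-1990ee38781e
    }
}
```

Note that UUID.toString() prints lowercase hex, which is also how the ui value is usually displayed.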

0x6 Java code

Here is the runnable Java code.

Output:

[image: console output of the program]

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author YellowTail
 * @since 2021-07-12
 */
public class ResumeTokenTest {

    public static void main(String[] args) {

        String token = "8260EBB7E9000009932B022C0100296E5A100456CA186C15614BC48E761990EE38781E46645F696400645E953035E1031700014771360004";

        analysisToken(token);
    }

    private static void analysisToken(String value) {

        // Parsing follows https://stackoverflow.com/a/54072030

        if (null == value) {
            return;
        }

        Pattern pattern = Pattern.compile("^82(.{8}).+5A1004(.{32})46645F69640064(.{24})0004$");

        Matcher matcher = pattern.matcher(value);

        if (! matcher.find()) {
            System.out.println("Token format does not match");
            return;
        }

        String timestampHexStr = matcher.group(1);
        String uuidStr = matcher.group(2);
        String oidStr = matcher.group(3);

        printTime(timestampHexStr);

        printUi(uuidStr);

        printObjectId(oidStr);

    }

    private static void printObjectId(final String oidStr) {
        System.out.println("ObjectId: " + oidStr);
    }

    private static void printUi(final String uuidStr) {
        // Split into segments of length 8, 4, 4, 4, 12

        Pattern pattern = Pattern.compile("^(.{8})(.{4})(.{4})(.{4})(.{12})$");
        Matcher matcher = pattern.matcher(uuidStr);
        if (! matcher.find()) {
            System.out.println("UUID format does not match");
            return;
        }

        String str1 = matcher.group(1);
        String str2 = matcher.group(2);
        String str3 = matcher.group(3);
        String str4 = matcher.group(4);
        String str5 = matcher.group(5);

        String join = String.join("-", str1, str2, str3, str4, str5);

        System.out.println("UUID: " + uuidStr + ", ui: " + join);
    }

    private static void printTime(final String timestampHexStr) {
        long timeStamp = hexStr2Long(timestampHexStr);

        Date date = new Date(timeStamp * 1000);
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm");

        System.out.println("Timestamp: " + timeStamp + ", time: " + sdf.format(date));
    }

    // Convert a hexadecimal string to a long
    private static long hexStr2Long(String str) {
        // Parse directly as a long so that values above Integer.MAX_VALUE
        // (leading hex digit 8 to F) do not overflow
        return Long.parseLong(str, 16);
    }

}
