MongoDB极简教程 · Mongo · NoSQL

MongoDB源码:FieldRef

2020-11-24  本文已影响0人  江海小流

相关的类

在阅读 MatchExpression 时,出现很多的 FieldRef 的使用,因此,决定先了解 FieldRef 的实现。

简要概要

在 mongo 的源码中,FieldRef 的实现中有如下的注释:

A FieldPath represents a path in a document, starting from the root. The path
is made of "field parts" separated by dots. The class provides an efficient means to
"split" the dotted fields in its parts, but no validation is done.

Any field part may be replaced, after the "original" field reference was parsed. Any
part can be accessed through a StringData object.

The class is not thread safe.

其中,FieldRef 用于表示文档中的路径(注释中写的是 FieldPath,可能是写错了),如 {"a": {"b": 3}} 中的 "a.b",它有两个part,分别是 abFieldRef 支持高效的完成如下工作:

在单元测试中能够佐证这些用途,如:

TEST(Normal, MulitplePartsVariable) {
    const char* parts[] = {"a", "b", "c", "d", "e"};
    size_t size = sizeof(parts) / sizeof(char*);
    std::string field = "a.b.c.d.e";

    FieldRef fieldRef(field);
    ASSERT_EQUALS(fieldRef.numParts(), size);
    for (size_t i = 0; i < size; i++) { // the field is splitted by dot
        ASSERT_EQUALS(fieldRef.getPart(i), parts[i]);
    }
    ASSERT_EQUALS(fieldRef.dottedField(), field);
}
TEST(Replacement, InMultipleField) {
    std::string field = "a.b.c.$.e";
    FieldRef fieldRef(field);
    ASSERT_EQUALS(fieldRef.numParts(), 5U);
    ASSERT_EQUALS(fieldRef.getPart(3), "$");

    std::string newField = "d";
    fieldRef.setPart(3, newField);
    ASSERT_EQUALS(fieldRef.numParts(), 5U);
    ASSERT_EQUALS(fieldRef.getPart(3), newField);
    ASSERT_EQUALS(fieldRef.dottedField(), "a.b.c.d.e");
}

成员变量

在了解了 FieldRef 的大致功能后,我们看一下 FieldRef 使用了那些成员变量,存储哪些信息。

class FieldRef {
private:
    // Number of field parts in the cached dotted name (_dotted).
    mutable FieldIndex _cachedSize = 0u;

    // Field components. Each component is either a StringView backed by the
    // _dotted string or boost::none to indicate that getPart() should read the string from the
    // _replacements list.
    mutable boost::container::small_vector<boost::optional<StringView>, kFewDottedFieldParts>
        _parts;

    /**
     * Cached copy of the complete dotted name string. The StringView objects in "_parts" reference
     * this string.
     */
    mutable std::string _dotted;

    /**
     * String storage for path parts that have been replaced with setPart() or added with
     * appendPart() since the lasted time "_dotted" was materialized.
     */
    mutable std::vector<std::string> _replacements;
};

上述的几个字段是存在冗余信息的,那么我们针对这些字段之间的关系,分析这样设计背后的逻辑。

字段之间的关系

为了了解这些字段之间的关系,因此需要了解与这些字段相关的函数。这些函数有:

appendParsedPart

appendParsedPart 的实现是怎样的?

size_t FieldRef::appendParsedPart(FieldRef::StringView part) {
    _parts.push_back(part);
    _cachedSize++;
    return _parts.size();
}

该函数输入一个 StringView,将其 push_back_parts 内,并同时更新 _cachedSize。该函数是私有的,仅由 parse 函数调用,parse 函数的实现如下:

void FieldRef::parse(StringData path) {
    clear(); // ...; _cachedSize = 0; ...;

    if (path.size() == 0) {
        return;
    }

    // We guarantee that accesses through getPart() will be valid while 'this' is. So we
    // keep a copy in a local sting.

    _dotted = path.toString();

    // Separate the field parts using '.' as a delimiter.
    std::string::iterator beg = _dotted.begin();
    std::string::iterator cur = beg;
    const std::string::iterator end = _dotted.end();
    while (true) {
        if (cur != end && *cur != '.') {
            cur++;
            continue;
        }
        // 此时 beg 和 cur 表示的区间对应的字符串不包含 '.'

        // If cur != beg then we advanced cur in the loop above, so we have a real sequence
        // of characters to add as a new part. Otherwise, we may be parsing something odd,
        // like "..", and we need to add an empty StringData piece to represent the "part"
        // in-between the dots. This also handles the case where 'beg' and 'cur' are both
        // at 'end', which can happen if we are parsing anything with a terminal "."
        // character. In that case, we still need to add an empty part, but we will break
        // out of the loop below since we will not execute the guarded 'continue' and will
        // instead reach the break statement.

        if (cur != beg) {
            size_t offset = beg - _dotted.begin();
            size_t len = cur - beg;
            appendParsedPart(StringView{offset, len});
        } else {
            appendParsedPart(StringView{});
        }

        if (cur != end) {
            beg = ++cur;
            continue;
        }

        break;
    }
}

parse 函数基于 . 拆分 _dotted 字符串,拆分的结果使用 StringView 存储,StringView 的实现如下。StringView 不会保存字符串的地址,仅包含 offsetlen,这样可以减少一个 sizeof(char*) 的存储空间。

struct StringView {
    // Constructs an empty StringView.
    StringView() = default;

    StringView(std::size_t offset, std::size_t len) : offset(offset), len(len){};

    StringData toStringData(const std::string& viewInto) const {
        return {viewInto.c_str() + offset, len};
    };

    std::size_t offset = 0;
    std::size_t len = 0;
};

由此,在调用 parse 函数后,满足一些关系:

appendPart

为了了解 appendPart 的实现,我们先看该函数的代码:

void FieldRef::appendPart(StringData part) {
    if (_replacements.empty()) {
        _replacements.resize(_parts.size());
    }

    _replacements.push_back(part.toString());
    _parts.push_back(boost::none);
}

_replacements 为空是可能的(在 parse 调用后),此时对 _replacements 调用 resize 函数,使其大小与 _parts 的大小一致,但是并没有为 _replacements 内的各个元素赋值,因此其内部的这些元素皆为空字符串。

因此,通过调用 appendPart,满足:

setPart

setPart 的代码如下:

void FieldRef::setPart(FieldIndex i, StringData part) {
    dassert(i < _parts.size());

    if (_replacements.empty()) {
        _replacements.resize(_parts.size());
    }

    _replacements[i] = part.toString();
    _parts[i] = boost::none;
}

appendPart 类似的,同步 resize _replacements 的大小。此外,修改 FieldRef 中的某一个 Part,会将 _replacemnets 对应的下标赋值成修改后的值,而将原 _parts 对应下标的值设置成 boost::none,表示此时以 _replacements 中的值为准。

因此,通过调用 setPart,同样满足:

reserialize

reserialize 的实现如下:

void FieldRef::reserialize() const {
    auto parts = _parts.size();
    std::string nextDotted;
    // Reserve some space in the string. We know we will have, at minimum, a character for
    // each component we are writing, and a dot for each component, less one. We don't want
    // to reserve more, since we don't want to forfeit the SSO if it is applicable.
    nextDotted.reserve((parts > 0) ? (parts * 2) - 1 : 0);

    // Concatenate the fields to a new string
    for (size_t i = 0; i != _parts.size(); ++i) {
        if (i > 0)
            nextDotted.append(1, '.');
        const StringData part = getPart(i);
        nextDotted.append(part.rawData(), part.size());
    }

    // Make the new string our contents
    _dotted.swap(nextDotted);

    // Before we reserialize, it's possible that _cachedSize != _size because parts were added or
    // removed. This reserialization process reconciles the components in our cached string
    // (_dotted) with the modified path.
    _cachedSize = parts;

    // Fixup the parts to refer to the new string
    std::string::const_iterator where = _dotted.begin();
    const std::string::const_iterator end = _dotted.end();
    for (size_t i = 0; i != parts; ++i) {
        boost::optional<StringView>& part = _parts[i];
        const size_t size = part ? part->len : _replacements[i].size();

        // There is one case where we expect to see the "where" iterator to be at "end" here: we
        // are at the last part of the FieldRef and that part is the empty string. In that case, we
        // need to make sure we do not dereference the "where" iterator.
        invariant(where != end || (size == 0 && i == parts - 1));
        if (!size) {
            part = StringView{};
        } else {
            std::size_t offset = where - _dotted.begin();
            part = StringView{offset, size};
        }
        where += size;
        // skip over '.' unless we are at the end.
        if (where != end) {
            dassert(*where == '.');
            ++where;
        }
    }

    // Drop any replacements
    _replacements.clear();
}

该函数重新计算了 _parts_dotted,并将 _cachedSize 修改成对应的数值,然后将 _replacements 的元素清空。

因此,总的来说,我们可以得到如下规律:

所以,_replacements 类似于一个缓冲区,用于缓存修改。

总结

这样设计有如下优点:

上一篇 下一篇

猜你喜欢

热点阅读