LevelDB arena 内存管理

2021-04-05  本文已影响0人  wayyyy

这个内存分配器并不是为整个 LevelDB 项目考虑的。主要是为 skiplist 也就是 memtable 服务。skiplist 里面记录的是用户传进来的key/value,这些字符串有长有短,放到内存中的时候,很容易导致内存碎片。所以这里写了一个统一的内存管理器。

arena 以 分配内存 N 字节为一个block,将这些 block 放在一个 vector 中。N为多少,见下面分析。

image.png
class Arena
{
public:
    char *Allocate(size_t bytes);
    char *AllocateAligned(size_t bytes);

private:
    char *AllocateFallback(size_t bytes);
    char *AllocateNewBlock(size_t block_bytes);

private:
    char *alloc_ptr_;                    // 指向当前块中剩余的内存起点
    size_t alloc_bytes_remaining_;       // 当前块中剩余的内存
    std::vector<char *> blocks_;
    std::atomic<size_t> memory_usage_;   // 申请的内存总量
};
处理字节对齐
char *Arena::AllocateAligned(size_t bytes)
{
    const int align = (sizeof(void *) > 8) ? sizeof(void *) : 8;
    static_assert((align & (align - 1)) == 0, "Pointer size should be a power of 2");
    size_t current_mod = reinterpret_cast<uintptr_t>(alloc_ptr_) & (align - 1);
    size_t slop = (current_mod == 0 ? 0 : align - current_mod);
    size_t needed = bytes + slop;
    char *result;
    if (needed <= alloc_bytes_remaining_)
    {
        result = alloc_ptr_ + slop;
        alloc_ptr_ += needed;
        alloc_bytes_remaining_ -= needed;
    }
    else
    {
            // AllocateFallback always returned aligned memory
            result = AllocateFallback(bytes);
    }
        assert((reinterpret_cast<uintptr_t>(result) & (align - 1)) == 0);
        return result;
}

下面逐行代码分析字节对齐。

const int align = (sizeof(void *) > 8) ? sizeof(void *) : 8;
static_assert((align & (align - 1)) == 0, "Pointer size should be a power of 2");

首先确认按多少字节对齐。必须保证对齐单位必须能放下一个完整的指针(void* ),如果指针的体积小于8,就按8字节对齐。下面都已8字节为例:

size_t current_mod = reinterpret_cast<uintptr_t>(alloc_ptr_) & (align - 1);
size_t slop = (current_mod == 0 ? 0 : align - current_mod);
size_t needed = bytes + slop;

(alloc_ptr_) & (align - 1)是为了计算零头有多少,等同于 alloc_ptr % align,不过当 align 为 2 的幂次时,可以用位运算来替代取模运算,以当前为分配起始地址为108为例:

\begin {array}{cccccccccc} &1&1&0&1&&1&1&0&0\\ \& &0&0&0&0&&0&1&1&1\\ \hline &0&0&0&0&&0&1&0&0 \end {array}
计算出来的零头为4。

image.png

slop表示后零头,实际分配内存时会将bytes + slop分配出去,实现内存对齐。

char *result;
if (needed <= alloc_bytes_remaining_)
{
    result = alloc_ptr_ + slop;   // 这里返回给客户的内存起始地址是8字节对齐的
    alloc_ptr_ += needed;
    alloc_bytes_remaining_ -= needed;
}
else
{
    result = AllocateFallback(bytes);
}
    assert((reinterpret_cast<uintptr_t>(result) & (align - 1)) == 0);
return result;

AllocateFallback 是对AllocateNewBlock的封装,并对分配策略做了优化:

char *Arena::AllocateFallback(size_t bytes)
{
    // 如果要申请的bytes < 4096/4,那么直接申请一个bytes的block
    if (bytes > kBlockSize / 4)
    {
        char *result = AllocateNewBlock(bytes);
        return result;
    }

    // 否则申请4096单位的block
    alloc_ptr_ = AllocateNewBlock(kBlockSize);
    alloc_bytes_remaining_ = kBlockSize;

    char *result = alloc_ptr_;
    alloc_ptr_ += bytes;
    alloc_bytes_remaining_ -= bytes;
    return result;
}

char *Arena::AllocateNewBlock(size_t block_bytes)
{
        char *result = new char[block_bytes];
        blocks_.push_back(result);
        memory_usage_.fetch_add(block_bytes + sizeof(char *),
                                std::memory_order_relaxed);
        return result;
}

Allocate 就不分析了。

这里做一下简单的性能测试,测试其内存分配效率和减少内存碎片优势,测试代码:

#include "arena.h"
#include <chrono>
#include <string>
#include "iostream"
using namespace std;

unsigned long long CNT = 0;

void test_new_delete()
{
    auto start = chrono::high_resolution_clock::now();
    for (unsigned long long i = 0; i < CNT; i++)
    {
        // 分配长度在10 ~ 256 字节长度
        int n = rand() % 246 + 10;
        ::operator new(n); // operator new 只是分配内存
    }
    auto end = chrono::high_resolution_clock::now();

    cout << "test_new_delete: "
         << "CNT = " << CNT << ", spend time: "
         << (end - start).count() << endl;
}

void test_arena()
{
    Arena ar;
    auto start = chrono::high_resolution_clock::now();
    for (unsigned long long i = 0; i < CNT; i++)
    {
        // 分配长度在10 ~ 256 字节长度
        int n = rand() % 246 + 10;
        ar.AllocateAligned(n);
    }
    auto end = chrono::high_resolution_clock::now();

    cout << "test_arena: "
         << "CNT = " << CNT << ", spend time: "
         << (end - start).count() << endl;
}

int main()
{
    CNT = 1000;
    test_new_delete();
    test_arena();

    cout << "--------------" << endl;

    CNT = 10000;
    test_new_delete();
    test_arena();

    cout << "--------------" << endl;

    CNT = 100000;
    test_new_delete();
    test_arena();

    cout << "--------------" << endl;

    CNT = 1000000;
    test_new_delete();
    test_arena();

    cout << "--------------" << endl;

    CNT = 2000000;
    test_new_delete();
    test_arena();
}

g++ main.cpp arena.cpp -O3 && ./a.out

image.png

看出,还是有明显的性能提升。

上一篇下一篇

猜你喜欢

热点阅读