C++ 自定义 `std::allocator`：实现高效、低碎片、线程安全的内存池

哈喽，各位好！今天我们来聊聊一个有点硬核，但绝对能让你在内存管理上更上一层楼的话题：C++自定义std::allocator，以及如何用它来实现一个高效、低碎片、线程安全的内存池。

为什么我们需要自定义Allocator？

C++标准库自带的std::allocator虽然方便，但在性能和资源控制上往往不够灵活。特别是在高性能应用、游戏开发、嵌入式系统等领域，默认的std::allocator可能会成为性能瓶颈。原因如下：

通用性带来的低效: std::allocator需要处理各种类型的内存分配请求，因此其实现往往比较通用，牺牲了一些特定场景下的优化空间。
碎片问题: 频繁的分配和释放小块内存会导致内存碎片，降低内存利用率，甚至引发性能问题。
线程安全: 默认的std::allocator在多线程环境下可能需要额外的同步开销，影响并发性能。

因此，自定义std::allocator，针对特定场景进行优化，可以显著提升程序的性能和资源利用率。

自定义Allocator的基本结构

要实现一个自定义Allocator，我们需要遵循std::allocator的要求，主要包括以下几个关键部分：

类型定义:
- value_type: 分配的对象的类型。
- pointer: 指向分配对象的指针类型。
- const_pointer: 指向分配对象的常量指针类型。
- reference: 分配对象的引用类型。
- const_reference: 分配对象的常量引用类型。
- size_type: 用于表示大小的无符号整数类型。
- difference_type: 用于表示两个指针之间距离的有符号整数类型。
- propagate_on_container_copy_assignment: 指示在容器复制赋值时是否传播 allocator。通常设置为 std::true_type 或 std::false_type。
- propagate_on_container_move_assignment: 指示在容器移动赋值时是否传播 allocator。
- propagate_on_container_swap: 指示在容器交换时是否传播 allocator。
- is_always_equal: 指示 allocator 的不同实例是否总是相等。通常设置为 std::true_type 或 std::false_type。
构造函数/析构函数: 至少需要一个默认构造函数和一个复制构造函数。
allocate() 方法: 负责分配指定大小的内存。
deallocate() 方法: 负责释放之前分配的内存。
construct() 方法 (C++11起): 在已分配的内存中构造对象。通常使用 placement new 实现。
destroy() 方法 (C++11起): 销毁已分配内存中的对象。
max_size() 方法 (可选): 返回 allocator 能分配的最大对象数量。
operator== 和 operator!=: 用于比较两个 allocator 是否相等。大多数情况下，allocator实例都是相等的，特别是对于状态无关的allocator。

一个简单的示例：基于new和delete的Allocator

#include <iostream>
#include <memory>
#include <new> // For placement new

template <typename T>
class SimpleAllocator {
public:
    using value_type = T;
    using pointer = T*;
    using const_pointer = const T*;
    using reference = T&;
    using const_reference = const T&;
    using size_type = std::size_t;
    using difference_type = std::ptrdiff_t;
    using propagate_on_container_copy_assignment = std::true_type;
    using propagate_on_container_move_assignment = std::true_type;
    using propagate_on_container_swap = std::true_type;
    using is_always_equal = std::true_type;

    SimpleAllocator() noexcept {}

    template <typename U>
    SimpleAllocator(const SimpleAllocator<U>&) noexcept {}

    pointer allocate(size_type n) {
        if (n > std::numeric_limits<size_type>::max() / sizeof(T)) {
            throw std::bad_alloc();
        }
        pointer p = static_cast<pointer>(::operator new(n * sizeof(T)));
        if (!p) {
            throw std::bad_alloc();
        }
        std::cout << "Allocated " << n * sizeof(T) << " bytes using SimpleAllocator." << std::endl;
        return p;
    }

    void deallocate(pointer p, size_type n) {
        std::cout << "Deallocated " << n * sizeof(T) << " bytes using SimpleAllocator." << std::endl;
        ::operator delete(p);
    }

    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        new (p) U(std::forward<Args>(args)...); // Placement new
    }

    template <typename U>
    void destroy(U* p) {
        p->~U();
    }
};

template <typename T, typename U>
bool operator==(const SimpleAllocator<T>&, const SimpleAllocator<U>&) noexcept {
    return true;
}

template <typename T, typename U>
bool operator!=(const SimpleAllocator<T>&, const SimpleAllocator<U>&) noexcept {
    return false;
}

int main() {
    std::vector<int, SimpleAllocator<int>> myVector(10, SimpleAllocator<int>());
    for (int i = 0; i < 10; ++i) {
        myVector[i] = i * 2;
    }

    for (int i = 0; i < 10; ++i) {
        std::cout << myVector[i] << " ";
    }
    std::cout << std::endl;

    return 0;
}

这个例子很简单，只是封装了new和delete。但它展示了自定义Allocator的基本结构。

实现一个高效的内存池Allocator

现在，我们来创建一个更高级的Allocator，它使用内存池来管理内存，从而提高性能并减少碎片。

内存池的基本原理

内存池预先分配一大块连续的内存，然后将这块内存分割成固定大小的块。当需要分配内存时，从内存池中取出一个空闲块；当释放内存时，将块放回内存池。

内存池Allocator的实现

#include <iostream>
#include <memory>
#include <vector>
#include <mutex> // For thread safety

template <typename T>
class PoolAllocator {
public:
    using value_type = T;
    using pointer = T*;
    using const_pointer = const T*;
    using reference = T&;
    using const_reference = const T&;
    using size_type = std::size_t;
    using difference_type = std::ptrdiff_t;
    using propagate_on_container_copy_assignment = std::true_type;
    using propagate_on_container_move_assignment = std::true_type;
    using propagate_on_container_swap = std::true_type;
    using is_always_equal = std::true_type;

    PoolAllocator(size_type poolSize = 1024) : poolSize_(poolSize), freeBlocks_(nullptr) {
        // Allocate the memory pool
        memoryPool_ = static_cast<char*>(::operator new(poolSize_ * sizeof(T)));

        // Divide the pool into blocks and link them together
        for (size_type i = 0; i < poolSize_; ++i) {
            T* block = reinterpret_cast<T*>(memoryPool_ + i * sizeof(T));
            block->~T(); // Ensure block is in a raw state.  Placement new will construct objects into these blocks.
            *reinterpret_cast<T**>(block) = freeBlocks_; // Store the pointer to next block in current block
            freeBlocks_ = block;
        }
    }

    ~PoolAllocator() {
        // Deallocate the memory pool
        if (memoryPool_) {
            ::operator delete(memoryPool_);
        }
        memoryPool_ = nullptr;
        freeBlocks_ = nullptr;
    }

    PoolAllocator(const PoolAllocator& other) noexcept : poolSize_(other.poolSize_), memoryPool_(nullptr), freeBlocks_(nullptr) {
        // Copy constructor should not create a new pool; rather share the same pool.
        // This helps avoid allocating multiple pools when containers are copied.
        // In a real-world scenario, you might want to consider reference counting for pool ownership.
    }

    template <typename U>
    PoolAllocator(const PoolAllocator<U>& other) noexcept : poolSize_(other.poolSize_), memoryPool_(nullptr), freeBlocks_(nullptr) {
        // Same as copy constructor.
    }

    pointer allocate(size_type n) {
        if (n != 1) {
            throw std::bad_alloc(); // PoolAllocator only supports allocating single objects
        }

        std::lock_guard<std::mutex> lock(mutex_); // Thread safety

        if (!freeBlocks_) {
            std::cerr << "PoolAllocator is out of memory!" << std::endl;
            throw std::bad_alloc();
        }

        pointer block = freeBlocks_;
        freeBlocks_ = *reinterpret_cast<T**>(freeBlocks_); // Update freeBlocks_ to point to the next block.
        return block;
    }

    void deallocate(pointer p, size_type n) {
        if (n != 1) return; // PoolAllocator only supports deallocating single objects

        std::lock_guard<std::mutex> lock(mutex_); // Thread safety

        *reinterpret_cast<T**>(p) = freeBlocks_; // Store the current freeBlocks_ in the freed block
        freeBlocks_ = p; // Update freeBlocks_ to point to the freed block
    }

    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        new (p) U(std::forward<Args>(args)...); // Placement new
    }

    template <typename U>
    void destroy(U* p) {
        p->~U();
    }

private:
    size_type poolSize_;
    char* memoryPool_;
    T* freeBlocks_;
    std::mutex mutex_; // For thread safety
};

template <typename T, typename U>
bool operator==(const PoolAllocator<T>&, const PoolAllocator<U>&) noexcept {
    return true;
}

template <typename T, typename U>
bool operator!=(const PoolAllocator<T>&, const PoolAllocator<U>&) noexcept {
    return false;
}

int main() {
    PoolAllocator<int> myAllocator(256); // Pool of 256 integers

    std::vector<int, PoolAllocator<int>> myVector(myAllocator); // Use the pool allocator
    myVector.reserve(256); // Reserve space for 256 integers

    for (int i = 0; i < 256; ++i) {
        myVector.push_back(i * 2);
    }

    for (int i = 0; i < 256; ++i) {
        std::cout << myVector[i] << " ";
    }
    std::cout << std::endl;

    myVector.clear();

    std::vector<std::string, PoolAllocator<std::string>> stringVector(myAllocator);
    stringVector.reserve(10);
    for (int i = 0; i < 10; ++i) {
        stringVector.emplace_back("String " + std::to_string(i));
    }

    for (const auto& str : stringVector) {
        std::cout << str << " ";
    }
    std::cout << std::endl;

    return 0;
}

代码解释

内存池分配: 在构造函数中，我们分配了一块大的内存memoryPool_，并将其分割成多个大小为sizeof(T)的块。
空闲块链表: 使用freeBlocks_指针维护一个空闲块链表。每个块的前几个字节（实际上就是T*的大小）用来存储指向下一个空闲块的指针。
allocate(): 从freeBlocks_链表中取出一个块，并更新freeBlocks_指针。这里使用了std::lock_guard来保证线程安全。
deallocate(): 将释放的块加入到freeBlocks_链表的头部。同样，使用了std::lock_guard保证线程安全。
Placement new: 在construct()方法中使用 placement new 在已分配的内存中构造对象。
线程安全: 使用 std::mutex 来保护内存池的并发访问。std::lock_guard 在进入和离开 allocate() 和 deallocate() 方法时自动加锁和解锁，确保线程安全。

优化方向

块大小: 可以根据实际需求调整块的大小。如果需要分配不同大小的对象，可以实现多个内存池，每个内存池管理特定大小的块。
内存池大小: 可以根据程序运行时的内存需求动态调整内存池的大小。
多线程支持: 可以采用更高级的并发控制技术，例如lock-free数据结构，来进一步提高多线程环境下的性能。
NUMA感知: 在NUMA架构下，可以根据线程的亲和性，将内存池分配到不同的NUMA节点上，减少跨节点访问的延迟。
调试支持: 添加调试功能，例如内存泄漏检测，可以帮助开发者更好地管理内存。

更高级的优化：使用多个内存池 + 对象池

为了更灵活地处理不同大小的内存请求，我们可以使用多个内存池，每个内存池负责分配特定大小的块。此外，我们可以结合对象池技术，预先创建一些对象，并将其保存在内存池中，从而减少对象创建和销毁的开销。

#include <iostream>
#include <memory>
#include <vector>
#include <mutex>
#include <map>

class SizeClassAllocator {
public:
    SizeClassAllocator(size_t blockSize, size_t poolSize) : blockSize_(blockSize), poolSize_(poolSize), freeList_(nullptr) {
        memoryPool_ = ::operator new(blockSize_ * poolSize_);
        char* current = static_cast<char*>(memoryPool_);

        for (size_t i = 0; i < poolSize_; ++i) {
            *reinterpret_cast<void**>(current) = freeList_;
            freeList_ = current;
            current += blockSize_;
        }
    }

    ~SizeClassAllocator() {
        ::operator delete(memoryPool_);
    }

    void* Allocate() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!freeList_) return nullptr;

        void* block = freeList_;
        freeList_ = *reinterpret_cast<void**>(freeList_);
        return block;
    }

    void Deallocate(void* block) {
        std::lock_guard<std::mutex> lock(mutex_);
        *reinterpret_cast<void**>(block) = freeList_;
        freeList_ = block;
    }

private:
    size_t blockSize_;
    size_t poolSize_;
    void* memoryPool_;
    void* freeList_;
    std::mutex mutex_;
};

class GeneralPurposeAllocator {
public:
    GeneralPurposeAllocator() {
        // Define size classes and their respective allocators
        sizeClasses_[16] = std::make_unique<SizeClassAllocator>(16, 1024);
        sizeClasses_[32] = std::make_unique<SizeClassAllocator>(32, 512);
        sizeClasses_[64] = std::make_unique<SizeClassAllocator>(64, 256);
        sizeClasses_[128] = std::make_unique<SizeClassAllocator>(128, 128);
        sizeClasses_[256] = std::make_unique<SizeClassAllocator>(256, 64);
    }

    void* Allocate(size_t size) {
        // Find the smallest size class that can accommodate the requested size
        for (auto& [blockSize, allocator] : sizeClasses_) {
            if (size <= blockSize) {
                void* block = allocator->Allocate();
                if (block) return block;
            }
        }

        // If no suitable size class is found, fall back to default new operator
        return ::operator new(size);
    }

    void Deallocate(void* ptr, size_t size) {
        // Find the appropriate size class and deallocate the memory
        for (auto& [blockSize, allocator] : sizeClasses_) {
            if (size <= blockSize) {
                allocator->Deallocate(ptr);
                return;
            }
        }

        // If the memory was allocated using default new operator, use default delete operator
        ::operator delete(ptr);
    }

private:
    std::map<size_t, std::unique_ptr<SizeClassAllocator>> sizeClasses_;
};

template <typename T>
class GenericAllocator {
public:
    using value_type = T;
    using pointer = T*;
    using const_pointer = const T*;
    using reference = T&;
    using const_reference = const T&;
    using size_type = std::size_t;
    using difference_type = std::ptrdiff_t;
    using propagate_on_container_copy_assignment = std::true_type;
    using propagate_on_container_move_assignment = std::true_type;
    using propagate_on_container_swap = std::true_type;
    using is_always_equal = std::true_type;

    GenericAllocator() noexcept {}

    template <typename U>
    GenericAllocator(const GenericAllocator<U>&) noexcept {}

    pointer allocate(size_type n) {
        if (n == 0) return nullptr;
        void* p = allocator_.Allocate(n * sizeof(T));
        if (!p) throw std::bad_alloc();
        return static_cast<pointer>(p);
    }

    void deallocate(pointer p, size_type n) {
        if (n == 0) return;
        allocator_.Deallocate(p, n * sizeof(T));
    }

    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        new (p) U(std::forward<Args>(args)...);
    }

    template <typename U>
    void destroy(U* p) {
        p->~U();
    }

private:
    static GeneralPurposeAllocator allocator_;
};

template <typename T>
GeneralPurposeAllocator GenericAllocator<T>::allocator_{};

template <typename T, typename U>
bool operator==(const GenericAllocator<T>&, const GenericAllocator<U>&) noexcept {
    return true;
}

template <typename T, typename U>
bool operator!=(const GenericAllocator<T>&, const GenericAllocator<U>&) noexcept {
    return false;
}

int main() {
    std::vector<int, GenericAllocator<int>> myVector(10);
    for (int i = 0; i < 10; ++i) {
        myVector[i] = i;
    }
    for (int i = 0; i < 10; ++i) {
        std::cout << myVector[i] << " ";
    }
    std::cout << std::endl;

    std::vector<std::string, GenericAllocator<std::string>> stringVector;
    stringVector.emplace_back("Hello");
    stringVector.emplace_back("World");

    for (const auto& str : stringVector) {
        std::cout << str << " ";
    }
    std::cout << std::endl;
    return 0;
}

代码解释

SizeClassAllocator: 管理固定大小的内存块。
GeneralPurposeAllocator: 根据请求的内存大小，选择合适的SizeClassAllocator进行分配。如果请求的大小超过了预定义的size classes，则使用默认的new和delete。
GenericAllocator: 一个通用的Allocator，使用GeneralPurposeAllocator进行内存分配。

总结

自定义std::allocator是一个强大的工具，可以让你更好地控制程序的内存管理，提高性能和资源利用率。通过结合内存池、对象池等技术，你可以构建出满足特定需求的Allocator。需要注意的是，自定义Allocator需要仔细设计和测试，以确保其正确性和性能。

希望今天的分享对你有所帮助！记住，没有银弹，只有最适合你的解决方案。根据你的具体应用场景，选择合适的Allocator实现策略。

发表回复 取消回复

发表回复取消回复