Custom `std::allocator` in C++: Fine-Grained Control over Container Memory Allocation - 智猿学院

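Alright, ladies and gentlemen, welcome to today's C++ memory allocation talk show! Today's topic sounds fancy and high-end, and, well... it actually is a bit high-end: writing your own allocator to stand in for std::allocator.

Opening: Memory, Who's in Charge?

When we write C++, containers are our daily bread. std::vector, std::list, std::map... which of them don't we see every day? But have you ever wondered who quietly supplies the memory behind them? That's right: std::allocator!

By default, containers use std::allocator<T>, which calls ::operator new and ::operator delete to obtain and release memory. In other words, it is essentially a thin shell around the global new and delete.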

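The trouble is that global new and delete, handy as they are, are sometimes not flexible enough. For example:

Performance: global new and delete may suffer from lock contention and can become a bottleneck under heavy concurrency.
Memory fragmentation: frequently allocating and freeing small blocks can fragment the heap and lower memory utilization.
Custom requirements: you may want to draw from a specific memory pool, or allocate at a specific address.
Diagnostics and debugging: you may want to trace allocations and detect memory leaks.

This is where a custom allocator makes its grand entrance! It gives you fine-grained control over a container's memory allocation, like hiring a private butler for the container: how the money (memory) gets spent is entirely up to you.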

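Main Act: Building Your Private Butler, Step by Step

To write a custom allocator, you define a class that meets a few specific requirements. Don't worry, it's not that complicated.

1. Basic structure of an allocator class

A bare-bones allocator class looks like this: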

template <typename T>
class MyAllocator {
public:
    using value_type = T;
    using pointer = T*;
    using const_pointer = const T*;
    using reference = T&;
    using const_reference = const T&;
    using size_type = std::size_t;
    using difference_type = std::ptrdiff_t;

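    // Default constructor, copy/move constructors, destructor (usually no need to customize).
    // The converting constructor lets a container build MyAllocator<T> from MyAllocator<U>.
    MyAllocator() noexcept = default;
    template <typename U> MyAllocator(const MyAllocator<U>&) noexcept {}
    ~MyAllocator() noexcept = default;

    // Allocate raw memory for n objects of type T
    pointer allocate(size_type n);

    // Release memory previously obtained from allocate
    void deallocate(pointer p, size_type n);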
};

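value_type: the type of object the allocator manages.
pointer, const_pointer, reference, const_reference: the pointer and reference types. Plain T*, const T*, T&, and const T& are usually enough. Since C++11 these aliases are optional, because std::allocator_traits supplies the same defaults.
size_type, difference_type: the types for sizes and differences, usually std::size_t and std::ptrdiff_t. These are optional since C++11 as well.
Constructors and destructor: the defaults are fine; you normally leave them alone.
allocate(size_type n): allocates uninitialized memory for n objects of type T. This is one of the two core functions.
deallocate(pointer p, size_type n): releases the memory for n objects of type T at p, which must have come from allocate. This is the other core function.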

2. Implementing allocate and deallocate

Next, let's implement allocate and deallocate. For now we'll demonstrate with the plain global operator new and operator delete:

template <typename T>
typename MyAllocator<T>::pointer MyAllocator<T>::allocate(size_type n) {
    if (n > std::numeric_limits<size_type>::max() / sizeof(T)) {
        throw std::bad_alloc(); // n * sizeof(T) would overflow
    }
    // ::operator new throws std::bad_alloc on failure and never returns
    // nullptr, so no null check is needed afterwards.
    return static_cast<pointer>(::operator new(n * sizeof(T)));
}

template <typename T>
void MyAllocator<T>::deallocate(pointer p, size_type /*n*/) {
    ::operator delete(p);
}

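Integer overflow check: allocate first checks whether n * sizeof(T) would overflow. Without the check the product wraps around and a block far smaller than requested gets allocated.
::operator new: the global allocation function provides the memory. Note that we call ::operator new rather than new T[n], because new T[n] would run constructors, while an allocator is only responsible for raw memory.
::operator delete: the global deallocation function releases the memory. Likewise, we call ::operator delete rather than delete[] p, because delete[] p would run destructors.
Exception handling: if allocation fails, std::bad_alloc is thrown. The throwing form of ::operator new raises it on its own and never returns a null pointer, so an explicit null check after the call is redundant.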

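3. rebind (important!)

An allocator also needs a rebind facility. It lets a container obtain an allocator<U> from an allocator<T>. Node-based containers such as std::map rely on this, because they allocate internal nodes rather than bare T objects. Since C++11, std::allocator_traits derives the rebound type automatically for an allocator template of the form Alloc<T, Args...>, so writing rebind by hand is mainly needed for pre-C++11 libraries or for allocators that do not fit that pattern.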

template <typename T>
class MyAllocator {
public:
    // ... (members shown earlier)

    template <typename U>
    struct rebind {
        using other = MyAllocator<U>;
    };
};

rebind is a nested class template that defines a type named other, which is MyAllocator<U>.

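4. operator== and operator!=

An allocator also needs operator== and operator!= so that two allocators can be compared. Two allocators compare equal when memory allocated through one can be released through the other. A stateless allocator like this one satisfies that for every pair of instances, so it can simply return true.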

template <typename T, typename U>
bool operator==(const MyAllocator<T>&, const MyAllocator<U>&) noexcept {
    return true;
}

template <typename T, typename U>
bool operator!=(const MyAllocator<T>&, const MyAllocator<U>&) noexcept {
    return false;
}

5. The complete code

Put the snippets above together and a basic custom allocator is done:

#include <cstddef>
#include <iostream>
#include <limits>
#include <new>
#include <vector>

template <typename T>
class MyAllocator {
public:
    using value_type = T;
    using pointer = T*;
    using const_pointer = const T*;
    using reference = T&;
    using const_reference = const T&;
    using size_type = std::size_t;
    using difference_type = std::ptrdiff_t;

    MyAllocator() noexcept = default;
    template <typename U> MyAllocator(const MyAllocator<U>&) noexcept {}
    ~MyAllocator() noexcept = default;

    pointer allocate(size_type n);
    void deallocate(pointer p, size_type n);

    template <typename U>
    struct rebind {
        using other = MyAllocator<U>;
    };
};

template <typename T>
typename MyAllocator<T>::pointer MyAllocator<T>::allocate(size_type n) {
    if (n > std::numeric_limits<size_type>::max() / sizeof(T)) {
        throw std::bad_alloc();
    }
    // ::operator new throws std::bad_alloc on failure; it never returns nullptr.
    pointer p = static_cast<pointer>(::operator new(n * sizeof(T)));
    std::cout << "Allocated " << n * sizeof(T) << " bytes at " << p << std::endl;
    return p;
}

template <typename T>
void MyAllocator<T>::deallocate(pointer p, size_type /*n*/) {
    std::cout << "Deallocated memory at " << p << std::endl;
    ::operator delete(p);
}

template <typename T, typename U>
bool operator==(const MyAllocator<T>&, const MyAllocator<U>&) noexcept {
    return true;
}

template <typename T, typename U>
bool operator!=(const MyAllocator<T>&, const MyAllocator<U>&) noexcept {
    return false;
}

int main() {
    std::vector<int, MyAllocator<int>> vec;
    vec.reserve(10); // reserve memory for 10 ints up front
    for (int i = 0; i < 10; ++i) {
        vec.push_back(i);
    }
    return 0;
}

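Run the code above and you'll see allocate and deallocate being called, with the addresses printed. With vec.reserve(10) in place, a typical standard library implementation performs a single allocation of 10 * sizeof(int) bytes and releases it when vec is destroyed.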

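Advanced: Customizing Your Private Butler

The MyAllocator above is just a simple example; it behaves essentially the same as std::allocator. Now let's try something more exciting and customize your private butler!

1. A memory pool allocator

A memory pool preallocates one large block of memory and then hands out small pieces from it. It can reduce memory fragmentation and speed up allocation.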

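#include <cstddef>
#include <iostream>
#include <limits>
#include <memory> // std::shared_ptr, std::align
#include <new>
#include <vector>

// State shared by every copy of a PoolAllocator, including rebound copies.
// Containers copy their allocator freely, so the pool must not be owned by
// any single allocator object (otherwise it would be freed more than once).
struct PoolState {
    char* pool_ = nullptr;        // start of the pool
    std::size_t pool_size_ = 0;   // pool size in bytes
    char* current_ = nullptr;     // next free byte

    explicit PoolState(std::size_t bytes)
        : pool_(static_cast<char*>(::operator new(bytes))), pool_size_(bytes), current_(pool_) {
        std::cout << "PoolAllocator: Initialized pool of " << pool_size_ << " bytes at "
                  << static_cast<void*>(pool_) << std::endl;
    }
    ~PoolState() {
        std::cout << "PoolAllocator: Destroying pool at " << static_cast<void*>(pool_) << std::endl;
        ::operator delete(pool_);
    }
    PoolState(const PoolState&) = delete;
    PoolState& operator=(const PoolState&) = delete;
};

template <typename T>
class PoolAllocator {
private:
    std::shared_ptr<PoolState> state_;

    // Lets the converting constructor and operator== read state_ of other specializations.
    template <typename U> friend class PoolAllocator;

public:
    using value_type = T;
    using pointer = T*;
    using const_pointer = const T*;
    using reference = T&;
    using const_reference = const T&;
    using size_type = std::size_t;
    using difference_type = std::ptrdiff_t;

    // Creates a pool with room for pool_size objects of type T.
    explicit PoolAllocator(size_type pool_size)
        : state_(std::make_shared<PoolState>(pool_size * sizeof(T))) {}

    // Declaring the copy operations suppresses the implicit moves, so a
    // moved-from allocator never ends up without a pool.
    PoolAllocator(const PoolAllocator&) = default;
    PoolAllocator& operator=(const PoolAllocator&) = default;

    // Rebound copies share the same pool instead of creating a new one.
    template <typename U>
    PoolAllocator(const PoolAllocator<U>& other) noexcept : state_(other.state_) {}

    pointer allocate(size_type n) {
        if (n > std::numeric_limits<size_type>::max() / sizeof(T)) {
            throw std::bad_alloc(); // n * sizeof(T) would overflow
        }
        void* p = state_->current_;
        std::size_t space =
            static_cast<std::size_t>(state_->pool_ + state_->pool_size_ - state_->current_);
        // std::align moves p up to an address aligned for T, or returns nullptr
        // if n * sizeof(T) bytes no longer fit into the remaining space.
        if (!std::align(alignof(T), n * sizeof(T), p, space)) {
            throw std::bad_alloc(); // pool exhausted
        }
        state_->current_ = static_cast<char*>(p) + n * sizeof(T);
        std::cout << "PoolAllocator: Allocated " << n * sizeof(T) << " bytes at " << p << std::endl;
        return static_cast<pointer>(p);
    }

    void deallocate(pointer, size_type) {
        // No per-block release; the whole pool is freed with the last allocator copy.
        std::cout << "PoolAllocator: Deallocate called (no actual deallocation)" << std::endl;
    }

    template <typename U>
    struct rebind {
        using other = PoolAllocator<U>;
    };

    // Two allocators are interchangeable only if they draw from the same pool.
    template <typename U>
    bool operator==(const PoolAllocator<U>& rhs) const noexcept { return state_ == rhs.state_; }
    template <typename U>
    bool operator!=(const PoolAllocator<U>& rhs) const noexcept { return !(*this == rhs); }
};

int main() {
    PoolAllocator<int> allocator(10); // a pool with room for 10 ints
    std::vector<int, PoolAllocator<int>> vec(allocator);

    // Freed blocks are never reused, so every reallocation while the vector
    // grows would eat further into the pool. Reserving up front keeps it to
    // a single allocation.
    vec.reserve(5);
    for (int i = 0; i < 5; ++i) {
        vec.push_back(i);
    }

    return 0;
}

PoolState: the state shared by all copies of the allocator. pool_ points to the start of the pool, pool_size_ is the size of the pool in bytes, and current_ points to the next free address.
Constructor: allocates the pool and initializes current_. Copies, including rebound ones, share the same pool through a std::shared_ptr, because containers copy their allocator and every copy must be able to release what another copy allocated.
allocate: hands out memory starting at current_ (aligned for T) and moves current_ forward. If the pool is exhausted, it throws std::bad_alloc.
deallocate: performs no actual release. The pool is freed as a whole once the last allocator copy, and with it the container, is gone.
operator==: compares the shared pool, so two allocators are equal only if they draw from the same pool.
Note: this simple pool never reuses freed blocks, so a container that reallocates as it grows keeps consuming pool space. Size the pool generously or reserve capacity up front; reusing blocks requires further changes.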

2. An allocation-tracking allocator

If you want to trace what gets allocated, you can build an allocator that records every allocation.

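#include <cstddef>
#include <iostream>
#include <limits>
#include <map>
#include <memory>
#include <mutex>
#include <new>
#include <vector>

// Bookkeeping shared by every copy of a TrackingAllocator. std::mutex is not
// copyable, and containers copy their allocator, so the map and the mutex
// live in a shared object instead of in the allocator itself.
struct TrackingState {
    std::map<void*, std::size_t> allocations_;
    std::mutex mutex_;

    // Runs when the last allocator copy is destroyed, so no locking is needed.
    ~TrackingState() {
        if (!allocations_.empty()) {
            std::cerr << "Memory leak detected!" << std::endl;
            for (const auto& [ptr, size] : allocations_) { // structured bindings need C++17
                std::cerr << "  Address: " << ptr << ", Size: " << size << " bytes" << std::endl;
            }
        }
    }
};

template <typename T>
class TrackingAllocator {
private:
    std::shared_ptr<TrackingState> state_ = std::make_shared<TrackingState>();

    // Lets the converting constructor and operator== read state_ of other specializations.
    template <typename U> friend class TrackingAllocator;

public:
    using value_type = T;
    using pointer = T*;
    using const_pointer = const T*;
    using reference = T&;
    using const_reference = const T&;
    using size_type = std::size_t;
    using difference_type = std::ptrdiff_t;

    TrackingAllocator() = default;
    // Declaring the copy operations suppresses the implicit moves, so a
    // moved-from allocator never loses its bookkeeping.
    TrackingAllocator(const TrackingAllocator&) = default;
    TrackingAllocator& operator=(const TrackingAllocator&) = default;
    // Rebound copies share the same bookkeeping.
    template <typename U>
    TrackingAllocator(const TrackingAllocator<U>& other) noexcept : state_(other.state_) {}

    pointer allocate(size_type n) {
        if (n > std::numeric_limits<size_type>::max() / sizeof(T)) {
            throw std::bad_alloc(); // n * sizeof(T) would overflow
        }
        // ::operator new throws std::bad_alloc on failure; it never returns nullptr.
        pointer p = static_cast<pointer>(::operator new(n * sizeof(T)));

        std::lock_guard<std::mutex> lock(state_->mutex_);
        try {
            state_->allocations_[p] = n * sizeof(T);
        } catch (...) {
            ::operator delete(p); // do not leak the block if bookkeeping fails
            throw;
        }

        std::cout << "TrackingAllocator: Allocated " << n * sizeof(T) << " bytes at " << p << std::endl;
        return p;
    }

    void deallocate(pointer p, size_type /*n*/) {
        std::lock_guard<std::mutex> lock(state_->mutex_);
        auto it = state_->allocations_.find(p);
        if (it != state_->allocations_.end()) {
            state_->allocations_.erase(it);
        } else {
            std::cerr << "Warning: Attempting to deallocate untracked memory at " << p << std::endl;
        }

        std::cout << "TrackingAllocator: Deallocated memory at " << p << std::endl;
        ::operator delete(p);
    }

    template <typename U>
    struct rebind {
        using other = TrackingAllocator<U>;
    };

    // Equal only if both allocators record into the same bookkeeping object.
    template <typename U>
    bool operator==(const TrackingAllocator<U>& rhs) const noexcept { return state_ == rhs.state_; }
    template <typename U>
    bool operator!=(const TrackingAllocator<U>& rhs) const noexcept { return !(*this == rhs); }
};

int main() {
    TrackingAllocator<int> allocator;
    {
        std::vector<int, TrackingAllocator<int>> vec(allocator);
        for (int i = 0; i < 5; ++i) {
            vec.push_back(i);
        }
    } // vec goes out of scope, memory is deallocated

    // If anything had leaked, the destructor of the shared TrackingState
    // would report it once `allocator` is destroyed at the end of main.

    return 0;
}

TrackingState: the bookkeeping shared by every copy of the allocator through a std::shared_ptr. std::mutex cannot be copied, and containers copy their allocator, so the map and the mutex cannot live directly inside the allocator.
allocations_: a std::map that stores each allocated address together with its size.
mutex_: a mutex that keeps access to allocations_ thread-safe.
allocate: after allocating the memory, adds the address and size to allocations_.
deallocate: removes the address from allocations_ and then releases the memory.
Destructor: the destructor of TrackingState runs when the last allocator copy is destroyed and checks whether allocations_ is empty. If it is not, some memory was never released, which indicates a leak.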

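Wrap-up: Memory Allocation, Firmly in Your Hands

A custom allocator is a powerful tool that gives you fine-grained control over how containers obtain their memory. Writing a fully featured allocator takes some skill, but once you've grasped the basic principles you can tailor all kinds of allocators to your needs, improving performance, reducing memory fragmentation, and making debugging easier.

I hope today's talk show has given you a deeper understanding of custom allocators. Remember: memory allocation, firmly in your hands! See you next time!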
