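How to Implement a Custom `operator new` in C++: Building a High-Performance On-Chip Memory Allocator for Specific Components - Zhiyuan Academy (智猿学院): frontier technology lectures on front-end/back-end development, databases, artificial intelligence, cloud computing and more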

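When building high-performance systems, especially systems-on-chip (SoC) and embedded systems, memory management is often one of the factors that decide performance. The standard C++ operator new and operator delete usually rely on a general-purpose heap manager (such as malloc/free), which can introduce unpredictable latency, memory fragmentation and excessive overhead, particularly in resource-constrained environments where memory access speed is critical. To meet the extreme allocation-performance requirements of specific components, for example in a digital signal processor (DSP) or a hardware accelerator, we often need to implement a custom operator new.

This lecture explores how to implement a high-performance on-chip memory allocator for specific components in C++, focusing on a custom operator new. We start from the basic concepts, build a practical memory pool allocator step by step, and then discuss its use, optimization and caveats in on-chip memory environments.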

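1. The nature of `operator new` and the limitations of standard allocators

1.1 How `operator new` works

In C++, a new-expression does more than allocate memory. It is a two-phase process:

Memory allocation: operator new is called to allocate enough raw memory. It returns a void* pointing to an uninitialized block.
Object construction: the object's constructor runs on the allocated memory, turning raw memory into a fully constructed object.

Likewise, a delete-expression is a two-phase process:

Object destruction: the object's destructor is called.
Memory release: operator delete is called to release the memory.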
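The two phases can be reproduced by hand, which is a useful mental model for what a custom operator new does and does not control. The sketch below uses only standard facilities (`::operator new`, placement new, an explicit destructor call, `::operator delete`); the names `Tracked`, `create_like_new` and `destroy_like_delete` are illustrative, and a real new-expression performs the equivalent steps internally.

```cpp
#include <cassert>
#include <new>   // ::operator new, ::operator delete, placement new

// Illustrative type that counts constructions and destructions.
struct Tracked {
    static int constructed;
    static int destroyed;
    int value;
    explicit Tracked(int v) : value(v) { ++constructed; }
    ~Tracked() { ++destroyed; }
};
int Tracked::constructed = 0;
int Tracked::destroyed = 0;

// Roughly what `new Tracked(v)` does.
Tracked* create_like_new(int v) {
    void* raw = ::operator new(sizeof(Tracked));   // phase 1: allocate raw memory
    try {
        return ::new (raw) Tracked(v);             // phase 2: construct in that memory
    } catch (...) {
        ::operator delete(raw);                    // a new-expression also frees the memory if construction throws
        throw;
    }
}

// Roughly what `delete p` does.
void destroy_like_delete(Tracked* p) {
    if (p == nullptr) return;
    p->~Tracked();                                 // phase 1: destroy the object
    ::operator delete(p);                          // phase 2: release the memory
}
```

A custom operator new replaces only the allocation step; construction and destruction stay with the compiler, which is why the pool code later in this lecture deals purely in raw memory.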

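operator new and operator delete are global functions provided by the C++ standard library. They usually wrap the allocation interface of the underlying runtime, typically the C library's malloc and free.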

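1.2 Limitations of the standard allocator

Although malloc/free or the default operator new/delete perform well in general-purpose programming, they can become bottlenecks in the following high-performance or resource-constrained scenarios:

Performance overhead: the default allocator is general-purpose. It must handle requests of arbitrary size and often takes locks in multithreaded environments, which costs extra CPU cycles and latency. On a system-on-chip this overhead is often unacceptable.
Memory fragmentation: frequent allocation and release of small blocks can leave the heap full of small, non-contiguous free blocks, so a large contiguous request may fail even though total free memory is sufficient. This is fatal in scenarios that need contiguous memory regions, such as image buffers and signal processing.
Unpredictability: the default allocator's allocation time varies with its internal algorithms and the current heap state, which causes jitter in allocation latency. For real-time systems or hardware operations with strict timing, this unpredictability is unacceptable.
Control over memory placement: the default allocator cannot guarantee that memory comes from a specific physical address range. A system-on-chip, however, usually has several kinds of memory (fast SRAM, DDR and so on), and a given component may need its data in the closest or fastest one.
Limited diagnostics: the default allocator usually offers few diagnostic tools, which makes memory leaks or overuse hard to track down.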

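1.3 Why a custom `operator new`?

A custom operator new gives us full control over the allocation process. With it we can:

Optimize performance: design a more efficient allocation algorithm around the component's memory access pattern and object sizes.
Control memory layout: place objects in a designated on-chip memory region.
Avoid fragmentation: strategies such as memory pools greatly reduce the risk of fragmentation.
Improve predictability: make allocation and release complete in constant or bounded time.
Improve diagnostics: integrate custom memory debugging and statistics.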

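2. Characteristics of on-chip memory and allocator design principles

2.1 Characteristics of on-chip memory (On-Chip Memory)

On-chip memory is memory integrated on the same chip as the processing cores, for example:

SRAM (Static RAM): extremely fast; typically used for CPU caches, register files, or as tightly coupled memory (TCM) for high-performance components. Capacity is relatively small and cost is relatively high.
eDRAM (embedded DRAM): larger capacity than SRAM, but somewhat slower.
ROM/Flash: used to store firmware and constant data.

For high-performance components we care mainly about SRAM or TCM. Their key characteristics are:

Fixed size and address range, known in advance: this lets us reserve one large block up front as the allocator's backing store.
Very low access latency: typically runs at full core speed, without external bus latency.
Direct physical addressing: usually no virtual memory mapping and no operating-system paging overhead.
Limited capacity: usually small, so every byte has to be budgeted carefully.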

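2.2 Design principles for an on-chip memory allocator

Given the characteristics of on-chip memory and the performance requirements, our custom allocator should follow these principles:

Pre-allocation: at system startup or component initialization, reserve one large contiguous block from a known on-chip memory region. Subsequent allocation requests are served from this reserved block instead of repeatedly asking the operating system or general heap for small pieces.
Simple, fast algorithms: avoid complex search and coalescing algorithms. A fixed-size block allocator (memory pool) or a free-list-based allocator is usually preferred.
Minimal metadata: per-block bookkeeping (block size, pointers and so on) should be as small as possible to reduce memory overhead.
Alignment: make sure allocated blocks satisfy the alignment requirements of the target hardware and data types, to avoid slowdowns or hardware exceptions.
Thread safety (optional but recommended): if the component runs in a multithreaded environment, the allocator must be thread-safe.
Error handling: there must be a well-defined error-handling mechanism for when memory runs out.
Debuggability: during development, build in debugging aids such as memory usage statistics and leak detection.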
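For the error-handling principle, one option is to follow the contract the standard places on a replacement operator new: when allocation fails, call the currently installed new-handler (obtained with `std::get_new_handler`, C++11) and retry, and throw `std::bad_alloc` when no handler is installed. The sketch below applies that loop to a hypothetical one-slot backing store; `demo::try_allocate`, `demo::release` and `demo::allocate_with_handler` are illustrative names, not part of any library. Note that a handler installed with `std::set_new_handler` is process-wide, so it also affects the global operator new.

```cpp
#include <cassert>
#include <cstddef>
#include <new>   // std::bad_alloc, std::new_handler, std::get_new_handler

namespace demo {

// Hypothetical backing store: a single 64-byte slot.
alignas(std::max_align_t) unsigned char slot[64];
bool slot_in_use = false;

// Returns nullptr instead of throwing when the request cannot be served.
void* try_allocate(std::size_t size) noexcept {
    if (slot_in_use || size > sizeof(slot)) return nullptr;
    slot_in_use = true;
    return slot;
}

void release(void* /*p*/) noexcept { slot_in_use = false; }

// Retry loop in the style the standard requires of operator new:
// on failure call the installed new-handler and try again; with no handler, throw.
void* allocate_with_handler(std::size_t size) {
    for (;;) {
        if (void* p = try_allocate(size)) return p;
        std::new_handler handler = std::get_new_handler();
        if (handler == nullptr) throw std::bad_alloc();
        handler();  // expected to free memory, throw, or terminate
    }
}

}  // namespace demo
```

A handler must free memory, throw, or terminate; one that returns without making progress turns the retry loop into an infinite loop.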

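3. The overloading mechanism of C++ `operator new`

Before implementing a custom allocator, we first need to understand how operator new is overloaded in C++.

3.1 Global `operator new` and `operator delete`

You can replace the global operator new and operator delete (the standard calls them replaceable allocation and deallocation functions) to change every dynamic allocation made through new and delete in the program.
Their signatures are: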

// Global operator new
void* operator new(std::size_t size);
void* operator new(std::size_t size, const std::nothrow_t&) noexcept;

// Global operator delete
void operator delete(void* ptr) noexcept;
void operator delete(void* ptr, std::size_t size) noexcept; // C++14 onwards
void operator delete(void* ptr, const std::nothrow_t&) noexcept;

// Also replaceable: the array forms operator new[] / operator delete[],
// and (since C++17) overloads taking std::align_val_t for over-aligned types.

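Replacing the global operator new changes allocation behavior for the entire program. This is typically done on embedded systems where malloc/free are unavailable or perform poorly.

Example: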

#include <cstddef>  // std::size_t, std::max_align_t
#include <cstdio>   // std::printf: a C function, so it never calls back into our operator new
#include <iostream>
#include <new>      // std::bad_alloc, std::nothrow_t

// Minimal global "pool": a bump allocator over a static 1 MB buffer.
// Not thread-safe, and memory is never reused (demo only).
alignas(std::max_align_t) static char global_memory_pool_buffer[1024 * 1024]; // 1MB
static std::size_t global_memory_pool_offset = 0;

void* operator new(std::size_t size) {
    std::printf("Global operator new called for size: %zu\n", size);
    if (size == 0) size = 1; // every call must return a distinct pointer, even for size 0
    // Round the offset up so each returned pointer is aligned for any fundamental type
    constexpr std::size_t align = alignof(std::max_align_t);
    std::size_t offset = (global_memory_pool_offset + align - 1) & ~(align - 1);
    if (offset + size > sizeof(global_memory_pool_buffer)) {
        throw std::bad_alloc();
    }
    global_memory_pool_offset = offset + size;
    return &global_memory_pool_buffer[offset];
}

void operator delete(void* ptr) noexcept {
    std::printf("Global operator delete called\n");
    // A bump allocator cannot free individual blocks, so delete does nothing.
    // A real memory pool has to manage its free blocks.
    (void)ptr;
}

void* operator new(std::size_t size, const std::nothrow_t&) noexcept {
    try {
        return ::operator new(size);
    } catch (const std::bad_alloc&) {
        return nullptr;
    }
}

void operator delete(void* ptr, const std::nothrow_t&) noexcept {
    ::operator delete(ptr);
}

// C++14 sized delete: the compiler may call this instead of the unsized form
void operator delete(void* ptr, std::size_t size) noexcept {
    std::printf("Global operator delete (with size) called for size: %zu\n", size);
    // As above, this simple allocator does nothing on delete.
    (void)ptr;
}

class MyGlobalObject {
public:
    int data[100];
    MyGlobalObject() { std::cout << "MyGlobalObject constructed" << std::endl; }
    ~MyGlobalObject() { std::cout << "MyGlobalObject destructed" << std::endl; }
};

int main() {
    MyGlobalObject* obj1 = new MyGlobalObject();
    MyGlobalObject* obj2 = new (std::nothrow) MyGlobalObject(); // nothrow new
    delete obj1;
    delete obj2;

    // Allocating something larger than the pool would throw bad_alloc
    // try {
    //     int* large_array = new int[500 * 1024]; // 2MB, will exceed 1MB pool
    //     delete[] large_array;
    // } catch (const std::bad_alloc& e) {
    //     std::cerr << "Allocation failed: " << e.what() << std::endl;
    // }

    return 0;
}

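Note: replacing the global versions requires great care, because it affects the whole program and may conflict with third-party libraries or the standard library's internals. For on-chip memory allocation we usually prefer a class-specific operator new.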

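3.2 Class-specific `operator new` and `operator delete`

You can overload operator new and operator delete for a particular class. This affects only objects of that class and of its derived classes (unless a derived class provides its own). It is the preferred way to build a dedicated high-performance allocator for a component.
Their signatures are similar to the global ones, but they are static member functions (implicitly static even if the keyword is omitted):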

#include <cstddef> // std::size_t
#include <new>     // std::nothrow_t

class CustomAllocator; // hypothetical allocator type, used only by the custom placement form below

class MyComponent {
public:
    // Ordinary form
    static void* operator new(std::size_t size);
    static void operator delete(void* ptr) noexcept;

    // nothrow form
    static void* operator new(std::size_t size, const std::nothrow_t&) noexcept;
    static void operator delete(void* ptr, const std::nothrow_t&) noexcept;

    // Sized delete (allowed for class-specific delete since C++98; the global sized form is C++14)
    static void operator delete(void* ptr, std::size_t size) noexcept;

    // Placement new. Declaring ANY operator new in the class hides all global forms,
    // including the built-in placement new, so re-declare it if `new (buf) MyComponent`
    // must keep working.
    static void* operator new(std::size_t /*size*/, void* location) noexcept { return location; }
    static void* operator new(std::size_t size, CustomAllocator* alloc); // custom placement new
};

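When `new MyComponent` is evaluated, the compiler first looks up operator new in the scope of MyComponent, which also finds one inherited from a base class; only if the class scope contains none does it use the global operator new. A leading `::`, as in `::new MyComponent`, skips the class scope and always uses the global version.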
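The lookup rules can be observed with a small counting class. In this sketch (the names `Base`, `Derived`, `base_new_calls` and `last_requested_size` are illustrative), `Derived` has no operator new of its own, so `new Derived` reaches `Base::operator new` with `sizeof(Derived)` as the size argument; `::new` bypasses the class entirely; and because a class-scope operator new hides every global form, the placement form is re-declared so that `new (buffer) Base` still compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Illustrative counters used to observe which operator new was chosen.
static int base_new_calls = 0;
static std::size_t last_requested_size = 0;

struct Base {
    int a = 0;

    static void* operator new(std::size_t size) {
        ++base_new_calls;
        last_requested_size = size;
        return ::operator new(size);
    }
    static void operator delete(void* p) noexcept { ::operator delete(p); }

    // Re-expose placement new, which the declaration above would otherwise hide.
    static void* operator new(std::size_t, void* where) noexcept { return where; }
    static void operator delete(void*, void*) noexcept {}
};

// No operator new of its own: lookup finds Base's, called with sizeof(Derived).
struct Derived : Base {
    double extra[4] = {};
};
```

The size argument is the detail that matters for fixed-size pools: an inherited operator new must be prepared to receive a size larger than the class it was written for.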

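Advantages:

Fine-grained control: only instances of the specific class are affected.
Best fit: the allocator can be optimized for the class's characteristics (such as its fixed size).
Isolation: memory management in the rest of the program is unaffected.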

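4. Building a high-performance memory pool allocator (Memory Pool)

For components on on-chip memory, objects usually have one fixed size or a small number of sizes. In that case a fixed-size block allocator (commonly called a memory pool; the slab allocator is a close relative) is the ideal choice.

4.1 The core idea of a memory pool

A memory pool works as follows:

Initialization: a large pre-allocated memory region is divided into many small pieces of one fixed size (called "blocks" or "slots").
Free list: a "free list" records every block that is currently available.
Allocation: to allocate, take a block off the free list and return it to the caller.
Release: when memory is released, put the block back on the free list.

This approach has the following advantages:

Very fast allocation and release: only the pointer at the head of the list changes.
No external fragmentation (for fixed-size objects): all blocks have the same size, so any freed block can serve the next request. Internal fragmentation remains possible when an object is smaller than a block (see 6.5).
Cache-friendly: consecutively allocated blocks tend to be adjacent in memory, which can improve cache locality.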
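Before the full class with locking and bounds checks, the core of the idea fits in a few lines. This is a minimal single-threaded sketch, assuming a compile-time block size and count (the template `FixedPool` is an illustrative name, not part of the original code): the free list is threaded through the free blocks themselves, and both operations touch only the list head.

```cpp
#include <cassert>
#include <cstddef>
#include <new>   // placement new

// Minimal single-threaded sketch: Count blocks of BlockSize bytes in an internal
// array, with the free list threaded through the free blocks themselves.
// No locking and no pointer validation.
template <std::size_t BlockSize, std::size_t Count>
class FixedPool {
    static_assert(BlockSize >= sizeof(void*), "a free block must hold a pointer");
    static_assert(BlockSize % alignof(void*) == 0, "keep every block pointer-aligned");

    struct Node { Node* next; };

public:
    FixedPool() {
        // Link blocks in address order: block 0 -> block 1 -> ... -> nullptr
        for (std::size_t i = Count; i > 0; --i) {
            head_ = ::new (storage_ + (i - 1) * BlockSize) Node{head_};
        }
    }

    void* allocate() noexcept {           // O(1): pop the head
        Node* n = head_;
        if (n != nullptr) head_ = n->next;
        return n;                         // nullptr once the pool is exhausted
    }

    void deallocate(void* p) noexcept {   // O(1): push onto the head
        head_ = ::new (p) Node{head_};
    }

private:
    alignas(std::max_align_t) unsigned char storage_[BlockSize * Count];
    Node* head_ = nullptr;
};
```

Building the list in reverse order leaves the lowest-addressed block at the head, so consecutive allocations return adjacent blocks, which is the locality property mentioned above.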

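4.2 Implementation details of the memory pool

We will build a general MemoryPool class that manages blocks of one fixed size.

Data structures:

Memory buffer: one large char or std::byte array that provides storage for all blocks.
Free list: no separate list-node objects are needed. Each free block itself serves as a list node: its first sizeof(void*) bytes store the pointer to the next free block.

Key considerations:

Block size: must be large enough to hold any object (and at least one pointer, for the free list), with alignment taken into account.
Alignment: allocated blocks must meet the strictest alignment requirement, usually alignof(std::max_align_t) or a hardware-specific alignment.
Thread safety: concurrent access requires a mutex.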
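The block-size rule can be computed at compile time, which also lets you check how many objects a given buffer will hold. The helpers below (`align_up`, `is_power_of_two`, `pool_block_size`) are illustrative; the worked example assumes a 12-byte object with 4-byte alignment and a 16 KB buffer.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to a multiple of `alignment`, which must be a power of two.
constexpr std::size_t align_up(std::size_t n, std::size_t alignment) {
    return (n + alignment - 1) & ~(alignment - 1);
}

constexpr bool is_power_of_two(std::size_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}

// Block size for a fixed-size pool: big enough for the object and for the
// free-list pointer, rounded up to the stricter of the two alignments.
constexpr std::size_t pool_block_size(std::size_t obj_size, std::size_t obj_align) {
    const std::size_t align = obj_align > alignof(void*) ? obj_align : alignof(void*);
    const std::size_t size  = obj_size > sizeof(void*) ? obj_size : sizeof(void*);
    return align_up(size, align);
}

// Worked example: a 12-byte, 4-byte-aligned object in a 16 KB buffer.
// With 8-byte pointers the block becomes 16 bytes (1024 blocks);
// with 4-byte pointers it stays 12 bytes (1365 blocks).
constexpr std::size_t kBlock  = pool_block_size(12, 4);
constexpr std::size_t kBlocks = (16 * 1024) / kBlock;
static_assert(is_power_of_two(alignof(void*)), "pointer alignment is a power of two");
```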

Code: the MemoryPool class

#include <cstddef>  // std::size_t, std::byte (C++17)
#include <cstdint>  // std::uintptr_t
#include <new>      // std::bad_alloc
#include <atomic>   // std::atomic
#include <mutex>    // std::mutex, std::lock_guard
#include <iostream> // debug output

// Node type for the free list.
// No separate nodes are ever allocated: each free block is itself treated as a FreeBlock.
struct FreeBlock {
    FreeBlock* next;
};

class MemoryPool {
public:
    // Constructor: initializes the pool
    // buffer:      pointer to the pre-reserved raw memory region
    // buffer_size: total size of that region in bytes
    // block_size:  size of each block managed by the pool (bytes)
    // alignment:   required block alignment (bytes, must be a power of two)
    MemoryPool(void* buffer, std::size_t buffer_size, std::size_t block_size, std::size_t alignment)
        : m_buffer_start(static_cast<std::byte*>(buffer)),
          m_buffer_size(buffer_size),
          m_block_size(block_size),
          m_alignment(alignment),
          m_aligned_start(nullptr),
          m_num_blocks(0),
          m_free_list_head(nullptr),
          m_allocated_blocks(0)
    {
        if (buffer == nullptr || buffer_size == 0 || block_size == 0 ||
            alignment == 0 || (alignment & (alignment - 1)) != 0) {
            // Could throw or assert here instead; the pool stays empty, so allocate() will fail
            std::cerr << "MemoryPool: Invalid initialization parameters." << std::endl;
            return;
        }

        // Every free block stores a FreeBlock (one pointer), so blocks must be
        // at least that large and at least that strictly aligned
        if (m_alignment < alignof(FreeBlock)) {
            m_alignment = alignof(FreeBlock);
        }
        if (m_block_size < sizeof(FreeBlock)) {
            m_block_size = sizeof(FreeBlock);
        }

        // Round the block size up to a multiple of the alignment so every block start stays aligned
        m_block_size = (m_block_size + m_alignment - 1) / m_alignment * m_alignment;

        // First aligned address inside the buffer
        m_aligned_start = align_pointer(m_buffer_start, m_alignment);

        // Padding consumed by the alignment; the rest of the buffer is usable
        std::size_t padding = reinterpret_cast<std::uintptr_t>(m_aligned_start) -
                              reinterpret_cast<std::uintptr_t>(m_buffer_start);
        if (padding >= m_buffer_size || m_buffer_size - padding < m_block_size) {
            std::cerr << "MemoryPool: Buffer too small to allocate even one block after alignment." << std::endl;
            return;
        }

        // Build the free list. Iterating backwards leaves the lowest address at the head,
        // so successive allocations come out in ascending address order.
        m_num_blocks = (m_buffer_size - padding) / m_block_size;
        for (std::size_t i = m_num_blocks; i > 0; --i) {
            FreeBlock* block = reinterpret_cast<FreeBlock*>(m_aligned_start + (i - 1) * m_block_size);
            block->next = m_free_list_head; // push onto the head of the list
            m_free_list_head = block;
        }
        std::cout << "MemoryPool initialized: " << m_num_blocks << " blocks of " << m_block_size << " bytes each." << std::endl;
    }

    // Allocate one block; throws std::bad_alloc when the pool is exhausted
    void* allocate() {
        std::lock_guard<std::mutex> lock(m_mutex); // thread safety
        if (m_free_list_head == nullptr) {
            // Pool exhausted
            std::cerr << "MemoryPool: Allocation failed, pool is exhausted." << std::endl;
            throw std::bad_alloc();
        }

        void* block_ptr = m_free_list_head;
        m_free_list_head = m_free_list_head->next; // pop the head
        ++m_allocated_blocks;
        // std::cout << "MemoryPool: Allocated block at " << block_ptr << ". Total allocated: " << m_allocated_blocks << std::endl;
        return block_ptr;
    }

    // Release one block
    void deallocate(void* ptr) noexcept {
        if (ptr == nullptr) {
            return;
        }

        std::lock_guard<std::mutex> lock(m_mutex); // thread safety

        // Sanity check: the pointer must be the start of a block inside the pool
        std::uintptr_t addr  = reinterpret_cast<std::uintptr_t>(ptr);
        std::uintptr_t first = reinterpret_cast<std::uintptr_t>(m_aligned_start);
        if (m_num_blocks == 0 || addr < first ||
            addr >= first + m_num_blocks * m_block_size ||
            (addr - first) % m_block_size != 0)
        {
            std::cerr << "MemoryPool: Deallocation of invalid pointer detected: " << ptr << std::endl;
            // Serious error; stricter handling (assert, trap) may be appropriate
            return;
        }

        FreeBlock* block = static_cast<FreeBlock*>(ptr);
        block->next = m_free_list_head; // push the block back onto the free list
        m_free_list_head = block;
        --m_allocated_blocks;
        // std::cout << "MemoryPool: Deallocated block at " << ptr << ". Total allocated: " << m_allocated_blocks << std::endl;
    }

    // Number of blocks currently allocated
    std::size_t getAllocatedBlocksCount() const {
        return m_allocated_blocks.load();
    }

    // Total number of blocks carved from the buffer
    std::size_t getTotalBlocksCount() const {
        return m_num_blocks;
    }

    // Size of each block (after any rounding)
    std::size_t getBlockSize() const {
        return m_block_size;
    }

private:
    std::byte* m_buffer_start;       // start of the pool buffer
    std::size_t m_buffer_size;       // total size of the pool buffer
    std::size_t m_block_size;        // size of each block (possibly rounded up for alignment)
    std::size_t m_alignment;         // alignment requirement
    std::byte* m_aligned_start;      // address of the first block
    std::size_t m_num_blocks;        // number of blocks carved from the buffer

    FreeBlock* m_free_list_head;     // head of the free list
    std::mutex m_mutex;              // lock for thread safety
    std::atomic<std::size_t> m_allocated_blocks; // number of blocks currently allocated

    // Helper: round a pointer up to the given power-of-two alignment
    static std::byte* align_pointer(std::byte* ptr, std::size_t alignment) {
        std::uintptr_t int_ptr = reinterpret_cast<std::uintptr_t>(ptr);
        std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        std::uintptr_t aligned_int_ptr = (int_ptr + mask) & ~mask;
        return reinterpret_cast<std::byte*>(aligned_int_ptr);
    }
};

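Why alignment matters:
On a system-on-chip, memory alignment is critical. Much hardware (such as DSPs and DMA controllers) slows down or even raises hardware exceptions on misaligned accesses. align_pointer makes sure the start of the pool, and therefore every block address, meets the requested alignment; it works with a bitmask, so the alignment must be a power of two. m_block_size is also rounded up to a multiple of the alignment so that every block's start address remains aligned. Because each free block stores a pointer, blocks must also be at least pointer-sized and pointer-aligned.

Thread safety:
std::mutex and std::lock_guard protect m_free_list_head and m_allocated_blocks against concurrent access. The lock costs time on every call; a component that only ever allocates from a single thread can drop it.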

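5. Integrating the memory pool with a class-specific `operator new`

Now that we have MemoryPool, the next step is to put it to work for a specific component.

5.1 Defining the component class and integrating `operator new`

Suppose we have a class named FilterCoefficient that represents one filter coefficient in a DSP and needs to be allocated quickly in on-chip memory.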

#include <cstddef> // std::size_t, std::byte
#include <cstdint> // std::uintptr_t
#include <new>     // std::bad_alloc, std::nothrow_t
#include <iostream>

// Assume the MemoryPool class is defined above
// ... (MemoryPool definition goes here) ...

// Static buffer backing the FilterCoefficient pool.
// On a real system-on-chip this would be a pointer to a specific SRAM region.
static std::byte filter_coeff_sram_buffer[1024 * 16]; // 16KB for FilterCoefficients

// The FilterCoefficient pool (initialized in main)
static MemoryPool* g_filter_coeff_pool = nullptr;

class FilterCoefficient {
public:
    // Example payload of a FilterCoefficient
    float value;
    int index;
    // Other data...
    char padding[16 - (sizeof(float) + sizeof(int)) % 16]; // pads the object to 16 bytes when float and int are 4 bytes

    FilterCoefficient(float val, int idx) : value(val), index(idx) {
        // std::cout << "FilterCoefficient constructed: " << value << ", " << index << std::endl;
    }

    ~FilterCoefficient() {
        // std::cout << "FilterCoefficient destructed: " << value << ", " << index << std::endl;
    }

    // Class-specific operator new
    static void* operator new(std::size_t size) {
        if (size != sizeof(FilterCoefficient)) {
            // A different size means a derived class is using this inherited operator new
            // (new FilterCoefficient[N] calls operator new[], never this function).
            // The fixed-size pool only serves exact matches, so fall back to the global new.
            std::cerr << "FilterCoefficient::operator new: Size mismatch. Requested " << size
                      << ", expected " << sizeof(FilterCoefficient) << ". Falling back to global new." << std::endl;
            return ::operator new(size); // fall back to the global new
        }
        if (g_filter_coeff_pool == nullptr) {
            std::cerr << "FilterCoefficient::operator new: Memory pool not initialized!" << std::endl;
            throw std::bad_alloc();
        }
        return g_filter_coeff_pool->allocate();
    }

    // Class-specific operator delete.
    // It routes by address, not by size: when a class declares both the sized and the
    // unsized form, delete-expressions select the unsized one (C++17), so a size check
    // here could not tell pool blocks from fallback allocations.
    static void operator delete(void* ptr) noexcept {
        if (ptr == nullptr) return;
        if (g_filter_coeff_pool != nullptr && in_pool_buffer(ptr)) {
            g_filter_coeff_pool->deallocate(ptr);
        } else {
            ::operator delete(ptr); // came from the global-new fallback
        }
    }

    // nothrow new, if needed
    static void* operator new(std::size_t size, const std::nothrow_t&) noexcept {
        try {
            return operator new(size);
        } catch (const std::bad_alloc&) {
            return nullptr;
        }
    }

    // Matching placement delete: called only if a constructor throws after nothrow new
    static void operator delete(void* ptr, const std::nothrow_t&) noexcept {
        operator delete(ptr);
    }

private:
    // True if ptr lies inside the static buffer that backs the pool
    static bool in_pool_buffer(const void* ptr) noexcept {
        auto addr  = reinterpret_cast<std::uintptr_t>(ptr);
        auto first = reinterpret_cast<std::uintptr_t>(filter_coeff_sram_buffer);
        return addr >= first && addr < first + sizeof(filter_coeff_sram_buffer);
    }
};

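Key points:

operator new receives a std::size_t size argument, the object size computed by the compiler. A fixed-size pool usually checks that this size matches sizeof(FilterCoefficient). A mismatch means a derived-class object is being allocated through the inherited operator new; array allocations such as new FilterCoefficient[N] go through operator new[] instead, which this class does not declare, so they never reach this function. Since the pool is designed for a single fixed size, falling back to the global ::operator new is the safe choice.
g_filter_coeff_pool is a global pointer to a MemoryPool instance. The pool must be created and initialized in main or during system initialization, before the first allocation.
operator delete needs a matching check and fallback, but it cannot rely on a size parameter for this: when a class declares both operator delete(void*) and operator delete(void*, std::size_t), C++17 selects the unsized one for a delete-expression. Checking whether the pointer lies inside the pool's buffer is the reliable way to return memory to where it came from.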
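Whether arrays reach the class-specific function is easy to check. In this sketch the class `Coefficient` and the counter `single_new_calls` are illustrative: `new Coefficient` calls the class's operator new, while `new Coefficient[8]` looks up operator new[], finds none in the class, and uses the global `::operator new[]`. A class that wants arrays pooled as well must declare operator new[] and operator delete[] itself.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Illustrative counter: how often the class-specific single-object operator new runs.
static int single_new_calls = 0;

struct Coefficient {
    float value = 0.0f;

    static void* operator new(std::size_t size) {
        ++single_new_calls;
        return ::operator new(size);
    }
    static void operator delete(void* p) noexcept { ::operator delete(p); }
    // No operator new[] / operator delete[] here, so array new-expressions
    // use the global ::operator new[] and never call the function above.
};
```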

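5.2 Simulating the on-chip memory region

On a real system-on-chip, filter_coeff_sram_buffer would instead be a pointer to an SRAM region at a known physical address, typically provided by the linker script or the SoC memory map. Ordinary SRAM used as object storage does not need to be volatile: volatile is meant for memory-mapped device registers, and building normal objects in storage declared volatile would require casting the qualifier away.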

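// Example declarations on a real system-on-chip (ON_CHIP_SRAM_START, SRAM_SIZE and SOME_OFFSET are placeholders)
// extern std::byte ON_CHIP_SRAM_START[SRAM_SIZE]; // e.g. provided by the linker script
// static std::byte* filter_coeff_sram_buffer = ON_CHIP_SRAM_START + SOME_OFFSET;
// static const std::size_t filter_coeff_sram_buffer_size = 1024 * 16;
// With a pointer, pass filter_coeff_sram_buffer_size to MemoryPool: sizeof would yield the pointer's size.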

For demonstration in a general-purpose environment, we simulate this memory with a static std::byte array.
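When several components share one on-chip region, each pool's buffer can be carved out of it with `std::align` from `<memory>`, which adjusts a pointer and a remaining-space count for a requested alignment. The sketch assumes a hypothetical 4 KB region `simulated_sram` standing in for the real SRAM window, and `RegionCarver` is an illustrative name; each carved sub-buffer could then be passed to a MemoryPool constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>  // std::uintptr_t
#include <memory>   // std::align

// Hypothetical stand-in for an on-chip SRAM window; on hardware the base and size
// would come from the linker script or the SoC memory map.
alignas(64) static std::byte simulated_sram[4096];

// Hands out aligned, non-overlapping sub-buffers of one region (e.g. one per pool).
class RegionCarver {
public:
    RegionCarver(void* base, std::size_t size) : cursor_(base), remaining_(size) {}

    // Returns nullptr when the request (plus alignment padding) no longer fits.
    void* carve(std::size_t size, std::size_t alignment) {
        void* p = cursor_;
        std::size_t space = remaining_;
        if (std::align(alignment, size, p, space) == nullptr) return nullptr;
        // std::align moved p past the padding and reduced space by the same amount
        cursor_ = static_cast<std::byte*>(p) + size;
        remaining_ = space - size;
        return p;
    }
    std::size_t remaining() const { return remaining_; }

private:
    void* cursor_;
    std::size_t remaining_;
};
```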

5.3 Instantiation and testing

#include <vector> // std::vector, used by the exhaustion test

// Initialization and tests in main
int main() {
    // Initialize the FilterCoefficient pool.
    // The block size must be at least sizeof(FilterCoefficient) and satisfy the alignment requirement.
    const std::size_t filter_coeff_object_size = sizeof(FilterCoefficient);
    const std::size_t filter_coeff_alignment = alignof(FilterCoefficient); // or a stricter hardware alignment
    const std::size_t effective_block_size = (filter_coeff_object_size + filter_coeff_alignment - 1) / filter_coeff_alignment * filter_coeff_alignment;

    std::cout << "FilterCoefficient object size: " << filter_coeff_object_size << " bytes" << std::endl;
    std::cout << "FilterCoefficient alignment: " << filter_coeff_alignment << " bytes" << std::endl;
    std::cout << "MemoryPool effective block size: " << effective_block_size << " bytes" << std::endl;

    MemoryPool filter_coeff_pool_instance(
        filter_coeff_sram_buffer,
        sizeof(filter_coeff_sram_buffer),
        filter_coeff_object_size,
        filter_coeff_alignment
    );
    g_filter_coeff_pool = &filter_coeff_pool_instance; // publish the pool through the global pointer

    std::cout << "\n--- Allocating FilterCoefficient objects ---" << std::endl;
    FilterCoefficient* coeffs[10];
    for (int i = 0; i < 10; ++i) {
        coeffs[i] = new FilterCoefficient(static_cast<float>(i * 0.1), i);
        std::cout << "Allocated coeff[" << i << "] at " << static_cast<void*>(coeffs[i])
                  << ", value=" << coeffs[i]->value << std::endl;
    }

    std::cout << "\nMemoryPool status: Allocated " << g_filter_coeff_pool->getAllocatedBlocksCount()
              << " blocks out of " << g_filter_coeff_pool->getTotalBlocksCount() << std::endl;

    std::cout << "\n--- Deallocating FilterCoefficient objects ---" << std::endl;
    for (int i = 0; i < 10; ++i) {
        delete coeffs[i];
    }

    std::cout << "\nMemoryPool status: Allocated " << g_filter_coeff_pool->getAllocatedBlocksCount()
              << " blocks out of " << g_filter_coeff_pool->getTotalBlocksCount() << std::endl;

    std::cout << "\n--- Testing memory exhaustion ---" << std::endl;
    std::vector<FilterCoefficient*> large_coeff_list;
    try {
        while (true) {
            large_coeff_list.push_back(new FilterCoefficient(0.0f, 0));
        }
    } catch (const std::bad_alloc& e) {
        std::cerr << "Caught exception: " << e.what() << ". Pool exhausted as expected." << std::endl;
    }

    std::cout << "Final MemoryPool status: Allocated " << g_filter_coeff_pool->getAllocatedBlocksCount()
              << " blocks out of " << g_filter_coeff_pool->getTotalBlocksCount() << std::endl;

    // Release everything that is still allocated
    for (FilterCoefficient* p : large_coeff_list) {
        delete p;
    }
    large_coeff_list.clear();

    std::cout << "After cleanup, MemoryPool status: Allocated " << g_filter_coeff_pool->getAllocatedBlocksCount()
              << " blocks out of " << g_filter_coeff_pool->getTotalBlocksCount() << std::endl;

    // A derived class has a different size, so the inherited FilterCoefficient::operator new
    // takes the size-mismatch path and falls back to the global new
    struct DerivedCoefficient : FilterCoefficient {
        double extra = 0.0;
        DerivedCoefficient() : FilterCoefficient(0.0f, 0) {}
    };
    std::cout << "\n--- Testing allocation of a different size object ---" << std::endl;
    DerivedCoefficient* diff_obj = new DerivedCoefficient();
    delete diff_obj;

    return 0;
}

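6. Advanced features and further optimization

6.1 Memory pools for multiple object sizes

A single MemoryPool handles only one fixed block size. If a component needs objects of several sizes, we can use:

Multiple MemoryPool instances: one independent MemoryPool per object size. This is usually the best practice.
A general-purpose allocator: implement a more complex allocator, such as a buddy system or a slab-allocator variant, that can serve requests of different sizes. For on-chip memory, however, object types and sizes are usually restricted, so multiple MemoryPools are more common.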
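With one pool per size, a dispatcher has to map each request to a pool. A simple scheme, sketched here under the assumption of three illustrative size classes (16, 64 and 256 bytes; `kSizeClasses`, `kNoClass` and `size_class_for` are hypothetical names), picks the smallest class that fits and reports when none does.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical size classes, one MemoryPool per class (values are illustrative).
constexpr std::array<std::size_t, 3> kSizeClasses = {16, 64, 256};
constexpr std::size_t kNoClass = kSizeClasses.size();

// Pick the smallest class that fits the request; kNoClass means no pool can
// serve it (fall back to another allocator or fail).
constexpr std::size_t size_class_for(std::size_t size) {
    for (std::size_t i = 0; i < kSizeClasses.size(); ++i) {
        if (size <= kSizeClasses[i]) return i;
    }
    return kNoClass;
}

static_assert(size_class_for(1) == 0, "tiny requests use the 16-byte pool");
static_assert(size_class_for(17) == 1, "17 bytes rounds up to the 64-byte class");
static_assert(size_class_for(260) == kNoClass, "260 bytes exceeds every class");
```

A request that maps to `kNoClass` has to be served elsewhere or rejected; a 260-byte object, for instance, would need a class of its own, so the class list is best derived from the actual component sizes.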

Example: a second memory pool for a SignalBuffer class

#include <cstdint> // std::uintptr_t

// Suppose there is another component class, SignalBuffer
static std::byte signal_buffer_sram_buffer[1024 * 64]; // 64KB for SignalBuffers
static MemoryPool* g_signal_buffer_pool = nullptr;

class SignalBuffer {
public:
    // Simulates a signal buffer
    short data[128]; // 256 bytes
    int id;

    SignalBuffer(int buffer_id) : id(buffer_id) {
        // std::cout << "SignalBuffer constructed: " << id << std::endl;
        for (int i = 0; i < 128; ++i) data[i] = static_cast<short>(i);
    }
    ~SignalBuffer() {
        // std::cout << "SignalBuffer destructed: " << id << std::endl;
    }

    // Class-specific operator new and delete
    static void* operator new(std::size_t size) {
        if (size != sizeof(SignalBuffer)) {
            std::cerr << "SignalBuffer::operator new: Size mismatch. Requested " << size
                      << ", expected " << sizeof(SignalBuffer) << ". Falling back to global new." << std::endl;
            return ::operator new(size);
        }
        if (g_signal_buffer_pool == nullptr) {
            std::cerr << "SignalBuffer::operator new: Memory pool not initialized!" << std::endl;
            throw std::bad_alloc();
        }
        return g_signal_buffer_pool->allocate();
    }

    static void operator delete(void* ptr) noexcept {
        if (ptr == nullptr) return;
        // Route by address: only blocks inside the pool's buffer go back to the pool;
        // anything else came from the global-new fallback above
        auto addr  = reinterpret_cast<std::uintptr_t>(ptr);
        auto first = reinterpret_cast<std::uintptr_t>(signal_buffer_sram_buffer);
        if (g_signal_buffer_pool != nullptr &&
            addr >= first && addr < first + sizeof(signal_buffer_sram_buffer)) {
            g_signal_buffer_pool->deallocate(ptr);
        } else {
            ::operator delete(ptr);
        }
    }

    // ... other operator new/delete overloads as in FilterCoefficient ...
};

// Initialize g_signal_buffer_pool in main and use it
// ...
// int main() {
//     // ... FilterCoefficient pool initialization ...
//
//     // Initialize the SignalBuffer pool
//     const std::size_t signal_buffer_object_size = sizeof(SignalBuffer);
//     const std::size_t signal_buffer_alignment = alignof(SignalBuffer);
//     MemoryPool signal_buffer_pool_instance(
//         signal_buffer_sram_buffer,
//         sizeof(signal_buffer_sram_buffer),
//         signal_buffer_object_size,
//         signal_buffer_alignment
//     );
//     g_signal_buffer_pool = &signal_buffer_pool_instance;
//
//     std::cout << "\n--- Allocating SignalBuffer objects ---" << std::endl;
//     SignalBuffer* buffers[5];
//     for (int i = 0; i < 5; ++i) {
//         buffers[i] = new SignalBuffer(i + 100);
//         std::cout << "Allocated buffer[" << i << "] at " << static_cast<void*>(buffers[i])
//                   << ", id=" << buffers[i]->id << std::endl;
//     }
//     // ... deallocate ...
// }

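6.2 Placement new and custom placement new

Standard placement new: new (address) MyObject() constructs an object at an already-allocated address without allocating memory. In the same way, after our operator new returns raw memory, the compiler constructs the object in that storage; no separate placement new call is needed.
Custom placement new: operator new can be overloaded to take extra arguments besides size_t (the standard calls these placement allocation functions). This is very useful for delegating allocation to a specific memory pool.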

class MyComponentWithCustomPlacement {
public:
    int data;
    MyComponentWithCustomPlacement() : data(0) {}

    // Custom placement new taking a reference to the MemoryPool to allocate from
    static void* operator new(std::size_t size, MemoryPool& pool) {
        std::cout << "Custom Placement new called for MyComponentWithCustomPlacement." << std::endl;
        if (size != sizeof(MyComponentWithCustomPlacement)) {
            // Don't fall back to ::operator new here: if a constructor then threw, the
            // matching placement delete below would hand foreign memory to the pool.
            throw std::bad_alloc();
        }
        return pool.allocate();
    }

    // Placement delete matching the custom placement new. It is called only when the
    // new succeeded but the constructor threw; its extra parameters must match the new's.
    static void operator delete(void* ptr, MemoryPool& pool) noexcept {
        std::cout << "Custom Placement delete called for MyComponentWithCustomPlacement (constructor failed)." << std::endl;
        pool.deallocate(ptr);
    }

    // A plain `delete obj` cannot know which pool the block came from, so it is
    // disabled (compile error); release objects with destroy() instead.
    static void operator delete(void* ptr) = delete;

    // Destroy the object and return its block to the pool it was allocated from
    static void destroy(MyComponentWithCustomPlacement* obj, MemoryPool& pool) noexcept {
        if (obj == nullptr) return;
        obj->~MyComponentWithCustomPlacement();
        pool.deallocate(obj);
    }
};

// Using the custom placement new
// MemoryPool my_specific_pool(...);
// MyComponentWithCustomPlacement* obj = new (my_specific_pool) MyComponentWithCustomPlacement();
// MyComponentWithCustomPlacement::destroy(obj, my_specific_pool); // `delete obj;` would not compile

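The benefit of custom placement new is that the new-expression names the memory pool explicitly, instead of relying on a static variable such as g_filter_coeff_pool. The cost is that a plain delete cannot tell which pool a block came from, so such objects are best released explicitly: run the destructor, then return the memory to the same pool.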
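One way to keep the pool association without a global pointer is to attach it to an owning smart pointer. The sketch below assumes a pool type with `allocate()` / `deallocate(void*)` members, as MemoryPool has; `ToyPool` is a small stand-in so the example is self-contained, and `PoolDeleter`, `PoolPtr`, `make_pooled` and `Widget` are illustrative names. The deleter stores the pool, runs the destructor, and returns the block to that same pool.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>   // std::unique_ptr
#include <new>      // placement new, std::bad_alloc
#include <utility>  // std::forward

// Stand-in for MemoryPool: 4 slots of 64 bytes with allocate()/deallocate().
class ToyPool {
public:
    static constexpr std::size_t kSlots = 4;
    static constexpr std::size_t kSlotSize = 64;

    void* allocate() {
        for (std::size_t i = 0; i < kSlots; ++i) {
            if (!used_[i]) { used_[i] = true; ++in_use_; return storage_[i]; }
        }
        throw std::bad_alloc();
    }
    void deallocate(void* p) noexcept {
        for (std::size_t i = 0; i < kSlots; ++i) {
            if (storage_[i] == p) { used_[i] = false; --in_use_; return; }
        }
    }
    std::size_t in_use() const { return in_use_; }

private:
    alignas(std::max_align_t) unsigned char storage_[kSlots][kSlotSize];
    bool used_[kSlots] = {};
    std::size_t in_use_ = 0;
};

// Deleter that remembers which pool the object came from.
template <typename T, typename Pool>
struct PoolDeleter {
    Pool* pool;
    void operator()(T* p) const noexcept {
        p->~T();                 // destroy the object...
        pool->deallocate(p);     // ...then return its block to the same pool
    }
};

template <typename T, typename Pool>
using PoolPtr = std::unique_ptr<T, PoolDeleter<T, Pool>>;

// Allocate a block from `pool` and construct a T in it.
template <typename T, typename Pool, typename... Args>
PoolPtr<T, Pool> make_pooled(Pool& pool, Args&&... args) {
    static_assert(sizeof(T) <= Pool::kSlotSize, "T must fit in one pool block");
    void* mem = pool.allocate();
    try {
        T* obj = ::new (mem) T(std::forward<Args>(args)...);
        return PoolPtr<T, Pool>(obj, PoolDeleter<T, Pool>{&pool});
    } catch (...) {
        pool.deallocate(mem);    // constructor threw: give the block back
        throw;
    }
}

// Illustrative payload type.
struct Widget {
    int v;
    explicit Widget(int x) : v(x) {}
};
```

Because the deleter carries a pool pointer, such a `std::unique_ptr` is typically one pointer larger than a plain `std::unique_ptr<T>`; in exchange, objects from different pools can be mixed freely.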

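6.3 Memory leak detection and debugging

Although memory pools eliminate some kinds of fragmentation, leaks can still occur: blocks that are allocated but never released.

Statistics: the m_allocated_blocks counter in MemoryPool is a simple leak detector. If it is non-zero when the program or component shuts down, there may be a leak.
Magic numbers / sentinel bytes: write a specific "magic" value at the head and/or tail of each allocated block. Checking on release whether these values were overwritten detects buffer overflows or underflows.
Allocation/release logs: during development, allocate() and deallocate() can log each address and the call stack, which helps track down problems.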
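The sentinel-byte technique can be sketched as two helpers that fill the unused tail of a block with a guard pattern at allocation and verify it at release. The guard value 0xFD and the names `kGuardByte`, `arm_guard` and `guard_intact` are illustrative assumptions; for the check to catch anything, a debug build must make the pool's block size larger than the object so that a tail exists.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>   // std::memset

// Illustrative guard pattern written into the unused tail of each block.
constexpr unsigned char kGuardByte = 0xFD;

// Fill bytes [object_size, block_size) with the guard pattern at allocation time.
void arm_guard(void* block, std::size_t object_size, std::size_t block_size) {
    assert(object_size <= block_size);
    std::memset(static_cast<unsigned char*>(block) + object_size, kGuardByte,
                block_size - object_size);
}

// At deallocation, any modified guard byte means something wrote past the object.
bool guard_intact(const void* block, std::size_t object_size, std::size_t block_size) {
    const unsigned char* p = static_cast<const unsigned char*>(block);
    for (std::size_t i = object_size; i < block_size; ++i) {
        if (p[i] != kGuardByte) return false;
    }
    return true;
}
```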

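6.4 Performance considerations

Cache locality: blocks handed out by a pool come from one contiguous region, which tends to improve the cache hit rate.
No OS or heap calls: once the pool is initialized, every allocation and release is handled inside the pool without going through the operating system or the general-purpose heap, which greatly reduces overhead.
Constant-time operations: allocate() and deallocate() are O(1), independent of the pool size and the number of allocated blocks. If a mutex is used, time spent waiting for it under contention is not covered by this bound.
Alignment: correct alignment avoids extra processor cycles or hardware exceptions on misaligned accesses.

6.5 Drawbacks and limitations

Fixed-size limitation: memory pools suit fixed-size objects best. Variable-size allocation needs more complex strategies (multiple pools or a more general allocator).
Memory is not returned to the OS: once a pool has taken memory from the operating system or a reserved region, that memory is not automatically given back, even when every block has been freed. In on-chip environments this is usually not a problem, because the memory is reserved anyway.
Internal fragmentation: if an object is smaller than the block size, the difference is wasted. Choosing a suitable block size minimizes this.
Memory exhaustion: if a component needs more memory than the pool was sized for, allocations fail.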

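7. Summary

By customizing C++ operator new and combining it with a memory pool, we can build high-performance, predictable allocators for specific components on a system-on-chip. This approach greatly reduces allocation overhead, avoids external fragmentation, and lets us place data in specific on-chip memory regions. It demands more careful memory management, but for real-time, resource-constrained or performance-critical embedded applications the gains usually justify the effort. When designing and implementing such an allocator, always account for memory alignment, thread safety and proper error handling.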
