Impeller's Vulkan/Metal Backends: Descriptor Set Management and Uniform Buffer Uploads

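Hello everyone. Today we take a close look at two important aspects of the Vulkan/Metal backends of the Impeller rendering engine: how Descriptor Sets are managed and how Uniform buffers are uploaded. Both mechanisms directly affect rendering efficiency and memory use, and they are key to understanding how Impeller optimizes rendering.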

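1. Why Descriptor Sets matter, and what makes them hard

In modern graphics APIs, shader programs are connected to their resources (textures, buffers) through explicit binding objects. In Vulkan these are Descriptor Sets; Metal has no object of that name, and its closest equivalents are the per-encoder binding slots and argument buffers. A Descriptor Set is essentially a collection of references to the data a shader needs. Managing Descriptor Sets correctly and efficiently is critical for performance.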

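1.1 What Descriptor Sets do

Resource binding: Descriptor Sets let us bind textures, buffers, samplers and other resources to specific shader stages (for example the vertex shader or the fragment shader).
Shader input: Shaders access the bound resources through resource declarations (uniform blocks, samplers and so on) that are matched to a set and binding index.
State management: Descriptor Sets let us switch resources between draw calls without recompiling shaders.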

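1.2 Challenges

Count limits: Implementations limit how many Descriptor Sets can be bound to a pipeline at once (maxBoundDescriptorSets in Vulkan), and every descriptor pool has a fixed capacity.
Update cost: Updating Descriptor Sets frequently has a performance cost, because the driver has to rewrite the descriptor data that the GPU reads.
Memory management: Descriptor Sets need memory to store their resource references, so an efficient memory management strategy is required.
Dynamic updates: Descriptor Sets must be updatable at runtime so that resource bindings can change while the application is running.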

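2. Impeller's Descriptor Set management strategy

Impeller manages Descriptor Sets with a layered, cache-based strategy that aims to minimize update cost and optimize memory use.

2.1 DescriptorSetLayoutCache

Impeller uses a DescriptorSetLayoutCache to manage VkDescriptorSetLayout objects (Vulkan) or the corresponding Metal equivalent. A DescriptorSetLayout describes the layout of a Descriptor Set: which resource types it contains and how many of each.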

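// Hypothetical DescriptorSetLayoutCache implementation (Vulkan)
class DescriptorSetLayoutCache {
 public:
  explicit DescriptorSetLayoutCache(VkDevice device) : device_(device) {}

  VkDescriptorSetLayout GetDescriptorSetLayout(const DescriptorSetLayoutDescriptor& descriptor) {
    auto it = layout_cache_.find(descriptor);
    if (it != layout_cache_.end()) {
      return it->second;  // Cache hit: reuse the existing layout
    }

    VkDescriptorSetLayout layout = CreateDescriptorSetLayout(descriptor);  // Create a new layout
    if (layout != VK_NULL_HANDLE) {
      layout_cache_[descriptor] = layout;  // Only cache layouts that were created successfully
    }
    return layout;
  }

 private:
  VkDescriptorSetLayout CreateDescriptorSetLayout(const DescriptorSetLayoutDescriptor& descriptor) {
    // Build the VkDescriptorSetLayout from the VkDescriptorSetLayoutBinding entries in the descriptor
    VkDescriptorSetLayoutCreateInfo layout_info{};
    layout_info.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
    layout_info.bindingCount = static_cast<uint32_t>(descriptor.bindings.size());
    layout_info.pBindings = descriptor.bindings.data();

    VkDescriptorSetLayout layout = VK_NULL_HANDLE;
    if (vkCreateDescriptorSetLayout(device_, &layout_info, nullptr, &layout) != VK_SUCCESS) {
      return VK_NULL_HANDLE;
    }
    return layout;
  }

  VkDevice device_;
  std::map<DescriptorSetLayoutDescriptor, VkDescriptorSetLayout> layout_cache_;
};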

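DescriptorSetLayoutDescriptor is a struct that describes the layout of a Descriptor Set, for example:

struct DescriptorSetLayoutDescriptor {
  std::vector<VkDescriptorSetLayoutBinding> bindings;

  // std::map orders its keys with operator<, so the key needs a strict weak
  // ordering (operator== plus a hash is what std::unordered_map would need).
  // VkDescriptorSetLayoutBinding is a plain C struct without comparison
  // operators, so its fields are compared explicitly. pImmutableSamplers is
  // ignored here for brevity. Requires <algorithm> and <tuple>.
  bool operator<(const DescriptorSetLayoutDescriptor& other) const {
    return std::lexicographical_compare(
        bindings.begin(), bindings.end(), other.bindings.begin(), other.bindings.end(),
        [](const VkDescriptorSetLayoutBinding& a, const VkDescriptorSetLayoutBinding& b) {
          return std::tie(a.binding, a.descriptorType, a.descriptorCount, a.stageFlags) <
                 std::tie(b.binding, b.descriptorType, b.descriptorCount, b.stageFlags);
        });
  }
};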

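The core ideas behind DescriptorSetLayoutCache are:

Caching: layout_cache_ stores the VkDescriptorSetLayout objects that have already been created.
Reuse: When a VkDescriptorSetLayout with the same layout is needed, it is returned straight from the cache instead of being created again.
Key: DescriptorSetLayoutDescriptor serves as the cache key, which guarantees that each distinct layout is created only once.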
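
The caching idea can also be expressed with a hash map instead of an ordered map. The following is a minimal sketch that runs without Vulkan: Binding, LayoutKey, LayoutKeyHash, LayoutHandle and LayoutCache are all hypothetical stand-ins invented for illustration, not Impeller or Vulkan types, and "creating" a layout is simulated by handing out a counter value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for VkDescriptorSetLayoutBinding.
struct Binding {
  uint32_t binding;
  uint32_t type;
  uint32_t count;
  uint32_t stages;
  bool operator==(const Binding& o) const {
    return binding == o.binding && type == o.type && count == o.count && stages == o.stages;
  }
};

// Cache key: two keys are equal when all of their bindings are equal.
struct LayoutKey {
  std::vector<Binding> bindings;
  bool operator==(const LayoutKey& o) const { return bindings == o.bindings; }
};

// Hash that folds every field of every binding into one value.
struct LayoutKeyHash {
  static void Combine(std::size_t& seed, uint32_t v) {
    seed ^= std::hash<uint32_t>{}(v) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
  }
  std::size_t operator()(const LayoutKey& key) const {
    std::size_t seed = key.bindings.size();
    for (const Binding& b : key.bindings) {
      Combine(seed, b.binding);
      Combine(seed, b.type);
      Combine(seed, b.count);
      Combine(seed, b.stages);
    }
    return seed;
  }
};

using LayoutHandle = uint64_t;  // Stand-in for VkDescriptorSetLayout; 0 means "null".

class LayoutCache {
 public:
  LayoutHandle Get(const LayoutKey& key) {
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;  // Reuse the cached layout
    LayoutHandle handle = ++created_;           // Simulate creating a GPU object
    assert(handle != 0);
    cache_.emplace(key, handle);
    return handle;
  }
  uint64_t created() const { return created_; }

 private:
  std::unordered_map<LayoutKey, LayoutHandle, LayoutKeyHash> cache_;
  uint64_t created_ = 0;
};
```

Requesting the same key twice returns the same handle and increments created() only once, which is the property the cache exists to provide.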

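2.2 DescriptorPool and DescriptorSetAllocator

Impeller uses DescriptorPool and DescriptorSetAllocator to allocate and manage VkDescriptorSet objects (Vulkan) or the corresponding Metal equivalent.

DescriptorPool: DescriptorPool is responsible for allocating the memory behind Descriptor Sets. It typically reserves one large pool up front and hands out Descriptor Sets from it, which is more efficient than allocating memory every time a Descriptor Set is needed.
DescriptorSetAllocator: DescriptorSetAllocator allocates Descriptor Sets from the DescriptorPools and keeps track of the sets it has handed out.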

// Hypothetical DescriptorPool implementation (Vulkan)
class DescriptorPool {
 public:
  DescriptorPool(VkDevice device, const std::vector<VkDescriptorPoolSize>& pool_sizes, uint32_t max_sets)
      : device_(device) {
    VkDescriptorPoolCreateInfo pool_info{};
    pool_info.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
    pool_info.poolSizeCount = static_cast<uint32_t>(pool_sizes.size());
    pool_info.pPoolSizes = pool_sizes.data();
    pool_info.maxSets = max_sets;
    pool_info.flags = VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT; // Allow individual descriptor sets to be freed

    if (vkCreateDescriptorPool(device_, &pool_info, nullptr, &descriptor_pool_) != VK_SUCCESS) {
      // Handle error
    }
  }

  ~DescriptorPool() {
    vkDestroyDescriptorPool(device_, descriptor_pool_, nullptr);
  }

  VkDescriptorSet AllocateDescriptorSet(VkDescriptorSetLayout layout) {
    VkDescriptorSetAllocateInfo alloc_info{};
    alloc_info.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
    alloc_info.descriptorPool = descriptor_pool_;
    alloc_info.descriptorSetCount = 1;
    alloc_info.pSetLayouts = &layout;

    VkDescriptorSet descriptor_set;
    if (vkAllocateDescriptorSets(device_, &alloc_info, &descriptor_set) != VK_SUCCESS) {
      // Handle error, possibly need to resize the pool or allocate a new one
      return VK_NULL_HANDLE;
    }

    return descriptor_set;
  }

  void Reset() {
    vkResetDescriptorPool(device_, descriptor_pool_, 0); // Frees every descriptor set allocated from this pool
  }

 private:
  VkDevice device_;
  VkDescriptorPool descriptor_pool_ = VK_NULL_HANDLE;
};

// Hypothetical DescriptorSetAllocator implementation (Vulkan)
class DescriptorSetAllocator {
 public:
  DescriptorSetAllocator(VkDevice device) : device_(device) {}

  VkDescriptorSet AllocateDescriptorSet(VkDescriptorSetLayout layout) {
    // Try to get a DescriptorSet from one of the existing DescriptorPools.
    // If every pool is full, create a new pool.
    for (auto& pool : pools_) {
      VkDescriptorSet descriptor_set = pool->AllocateDescriptorSet(layout);
      if (descriptor_set != VK_NULL_HANDLE) {
        return descriptor_set;
      }
    }

    // Create a new DescriptorPool
    std::vector<VkDescriptorPoolSize> pool_sizes;
    // Fill pool_sizes with the commonly used descriptor types (it must not be left empty)
    // ...
    auto new_pool = std::make_unique<DescriptorPool>(device_, pool_sizes, kDefaultPoolSize);
    VkDescriptorSet descriptor_set = new_pool->AllocateDescriptorSet(layout);
    if (descriptor_set == VK_NULL_HANDLE) {
      // Fatal error: the DescriptorSet could not be allocated
      return VK_NULL_HANDLE;
    }
    pools_.push_back(std::move(new_pool));
    return descriptor_set;
  }

  void Reset() {
    for (auto& pool : pools_) {
      pool->Reset();
    }
  }

 private:
  VkDevice device_;
  std::vector<std::unique_ptr<DescriptorPool>> pools_;
  static constexpr uint32_t kDefaultPoolSize = 256; // Default number of sets per pool
};

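The core ideas behind DescriptorSetAllocator are:

Multiple pools: pools_ holds several DescriptorPools.
On-demand allocation: When a Descriptor Set is needed, the allocator first tries the existing pools, and creates a new pool only if all of them are full.
Reset: Reset() resets every pool and frees all Descriptor Sets. This is typically done at the start of a frame or of a render pass, once the GPU has finished with the sets allocated from those pools.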
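
The growth behaviour described above can be illustrated without Vulkan. In this sketch FakePool, SetRef and GrowingAllocator are hypothetical names made up for the example; a pool is just a counter with a fixed capacity. As one possible refinement (an assumption, not something the text attributes to Impeller), the allocator remembers the first pool that still has room, so that full pools are not retried on every allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical fixed-capacity pool standing in for a VkDescriptorPool.
class FakePool {
 public:
  explicit FakePool(uint32_t max_sets) : max_sets_(max_sets) {}

  // Writes the index of the new set to *out_index; returns false when full.
  bool Allocate(uint32_t* out_index) {
    if (used_ == max_sets_) return false;
    *out_index = used_++;
    return true;
  }
  void Reset() { used_ = 0; }

 private:
  uint32_t max_sets_;
  uint32_t used_ = 0;
};

// Identifies a set by the pool it came from and its index inside that pool.
struct SetRef {
  std::size_t pool;
  uint32_t index;
};

class GrowingAllocator {
 public:
  explicit GrowingAllocator(uint32_t sets_per_pool) : sets_per_pool_(sets_per_pool) {
    assert(sets_per_pool > 0);
  }

  SetRef Allocate() {
    uint32_t index = 0;
    // Resume at the pool that last had room; pools before it are known to be full.
    for (; current_ < pools_.size(); ++current_) {
      if (pools_[current_]->Allocate(&index)) return SetRef{current_, index};
    }
    // Every pool is full: grow by one pool and allocate from it.
    pools_.push_back(std::make_unique<FakePool>(sets_per_pool_));
    const bool ok = pools_.back()->Allocate(&index);
    assert(ok);
    (void)ok;
    return SetRef{current_, index};
  }

  // Empties every pool but keeps the pools themselves for reuse.
  void Reset() {
    for (auto& pool : pools_) pool->Reset();
    current_ = 0;
  }

  std::size_t pool_count() const { return pools_.size(); }

 private:
  uint32_t sets_per_pool_;
  std::size_t current_ = 0;
  std::vector<std::unique_ptr<FakePool>> pools_;
};
```

With two sets per pool, five allocations leave three pools behind; after Reset() the next allocation comes from the first pool again and no new pool is created.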

2.3 DescriptorSetUpdater

Impeller provides a DescriptorSetUpdater class to simplify updating Descriptor Sets. It lets us batch the updates to a Descriptor Set, which reduces the number of API calls.

// Hypothetical DescriptorSetUpdater implementation (Vulkan)
class DescriptorSetUpdater {
 public:
  DescriptorSetUpdater(VkDevice device) : device_(device) {}

  void Begin(VkDescriptorSet descriptor_set) {
    descriptor_set_ = descriptor_set;
    writes_.clear();
    image_infos_.clear();
    buffer_infos_.clear();
  }

  void WriteTexture(uint32_t binding, VkImageView image_view, VkSampler sampler, VkImageLayout image_layout) {
    VkDescriptorImageInfo image_info{};
    image_info.sampler = sampler;
    image_info.imageView = image_view;
    image_info.imageLayout = image_layout;
    // Keep the info alive until Commit(): a pointer to a local would dangle once this function returns
    image_infos_.push_back(image_info);

    VkWriteDescriptorSet write{};
    write.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
    write.dstSet = descriptor_set_;
    write.dstBinding = binding;
    write.dstArrayElement = 0;
    write.descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
    write.descriptorCount = 1;
    write.pImageInfo = &image_infos_.back();

    writes_.push_back(write);
  }

  void WriteBuffer(uint32_t binding, VkBuffer buffer, VkDeviceSize offset, VkDeviceSize range) {
    VkDescriptorBufferInfo buffer_info{};
    buffer_info.buffer = buffer;
    buffer_info.offset = offset;
    buffer_info.range = range;
    buffer_infos_.push_back(buffer_info);

    VkWriteDescriptorSet write{};
    write.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
    write.dstSet = descriptor_set_;
    write.dstBinding = binding;
    write.dstArrayElement = 0;
    write.descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER; // Or VK_DESCRIPTOR_TYPE_STORAGE_BUFFER
    write.descriptorCount = 1;
    write.pBufferInfo = &buffer_infos_.back();

    writes_.push_back(write);
  }

  void Commit() {
    vkUpdateDescriptorSets(device_, static_cast<uint32_t>(writes_.size()), writes_.data(), 0, nullptr);
  }

 private:
  VkDevice device_;
  VkDescriptorSet descriptor_set_ = VK_NULL_HANDLE;
  std::vector<VkWriteDescriptorSet> writes_;
  // std::deque keeps element addresses stable across push_back, unlike std::vector
  std::deque<VkDescriptorImageInfo> image_infos_;
  std::deque<VkDescriptorBufferInfo> buffer_infos_;
};

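The core ideas behind DescriptorSetUpdater are:

Batched updates: writes_ collects all pending update operations.
Fewer API calls: Commit() submits every pending update in one go, which reduces the number of vkUpdateDescriptorSets calls.
Simpler code: Helper methods such as WriteTexture and WriteBuffer simplify the code that updates Descriptor Sets.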
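
One detail matters whenever writes are batched like this: each VkWriteDescriptorSet only stores pointers (pImageInfo, pBufferInfo) to its payload, and those pointers have to stay valid until vkUpdateDescriptorSets runs. The sketch below isolates that lifetime issue with plain C++; BufferInfo, Write and WriteBatch are hypothetical stand-ins, not Vulkan or Impeller types. The payloads live in a std::deque, whose push_back never relocates existing elements, so pointers taken earlier remain valid as the batch grows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-ins for VkDescriptorBufferInfo and VkWriteDescriptorSet.
struct BufferInfo {
  uint64_t buffer;
  uint64_t offset;
  uint64_t range;
};
struct Write {
  uint32_t binding;
  const BufferInfo* info;  // Must stay valid until the batch is committed
};

class WriteBatch {
 public:
  void WriteBuffer(uint32_t binding, uint64_t buffer, uint64_t offset, uint64_t range) {
    assert(range > 0);
    // std::deque::push_back does not move existing elements, so the
    // addresses stored in earlier Write entries are not invalidated.
    infos_.push_back(BufferInfo{buffer, offset, range});
    writes_.push_back(Write{binding, &infos_.back()});
  }

  const std::vector<Write>& writes() const { return writes_; }

  void Clear() {
    writes_.clear();
    infos_.clear();
  }

 private:
  std::deque<BufferInfo> infos_;
  std::vector<Write> writes_;
};
```

Had infos_ been a std::vector, a reallocation during push_back would have left every earlier Write pointing at freed memory; storing indices and resolving them to pointers at commit time is another way to avoid that.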

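2.4 Summary

Impeller's Descriptor Set management strategy can be summarized as follows:

Component	Responsibility	Benefit
`DescriptorSetLayoutCache`	Manages `DescriptorSetLayout` objects and avoids creating the same layout twice.	Fewer GPU objects are created and destroyed, which improves performance.
`DescriptorPool`	Allocates Descriptor Sets from a pre-allocated memory pool.	Less allocation and deallocation overhead, which improves performance.
`DescriptorSetAllocator`	Manages multiple `DescriptorPool`s and allocates Descriptor Sets on demand.	The number of Descriptor Sets can grow dynamically, which avoids wasting memory.
`DescriptorSetUpdater`	Simplifies Descriptor Set updates and batches them.	Fewer API calls, which improves performance.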

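3. Uniform buffer upload strategy

Uniform buffers store a shader's uniform variables, such as the MVP matrix, colors and lighting parameters. Uploading Uniform buffers efficiently is critical for performance.

3.1 Ways to update Uniform buffers

Common ways to update a Uniform buffer include:

Update on every draw: Update the Uniform buffer before every draw call.
Per-frame buffer: Allocate one Uniform buffer per frame and update it at the start of each frame.
Double/triple buffering: Use several Uniform buffers and alternate between them from frame to frame, so that the CPU does not have to wait for the GPU.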

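3.2 Impeller's Uniform buffer upload strategy

Impeller uses a strategy similar to double/triple buffering and combines it with memory mapping to optimize Uniform buffer uploads.

3.2.1 BufferView and TransientBufferPool

Impeller uses BufferView to represent a Uniform buffer and TransientBufferPool to manage the memory behind Uniform buffers.

BufferView: A BufferView is a reference to a VkBuffer (Vulkan) or the corresponding Metal equivalent. It records the offset at which the data starts inside that buffer and its size.
TransientBufferPool: TransientBufferPool allocates and manages temporary Uniform buffers. It typically reserves one large block of memory up front and hands out BufferViews from it. These buffers are transient: they are normally released at the end of each frame or render pass.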

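// Hypothetical BufferView implementation
class BufferView {
 public:
  // A default-constructed view is empty and refers to no buffer
  BufferView() = default;
  BufferView(VkBuffer buffer, VkDeviceSize offset, VkDeviceSize size)
      : buffer_(buffer), offset_(offset), size_(size) {}

  bool IsValid() const { return buffer_ != VK_NULL_HANDLE; }
  VkBuffer GetBuffer() const { return buffer_; }
  VkDeviceSize GetOffset() const { return offset_; }
  VkDeviceSize GetSize() const { return size_; }

 private:
  VkBuffer buffer_ = VK_NULL_HANDLE;
  VkDeviceSize offset_ = 0;
  VkDeviceSize size_ = 0;
};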

// Hypothetical TransientBufferPool implementation (Vulkan)
// The pool owns Vulkan handles and releases them in its destructor, so instances must not be copied.
class TransientBufferPool {
 public:
  TransientBufferPool(VkDevice device, VkPhysicalDevice physical_device, VkQueue queue, uint32_t queue_family_index)
      : device_(device), queue_(queue), queue_family_index_(queue_family_index) {
    // Uniform buffer offsets must be multiples of this device limit (always a power of two)
    VkPhysicalDeviceProperties device_properties;
    vkGetPhysicalDeviceProperties(physical_device, &device_properties);
    alignment_ = device_properties.limits.minUniformBufferOffsetAlignment;

    VkBufferCreateInfo buffer_info{};
    buffer_info.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
    buffer_info.size = kPoolSize;
    buffer_info.usage = VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT; // Usable as a uniform buffer and as a transfer destination
    buffer_info.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
    buffer_info.queueFamilyIndexCount = 1;
    buffer_info.pQueueFamilyIndices = &queue_family_index_;

    if (vkCreateBuffer(device_, &buffer_info, nullptr, &buffer_) != VK_SUCCESS) {
      // Handle error
    }

    VkMemoryRequirements memory_requirements;
    vkGetBufferMemoryRequirements(device_, buffer_, &memory_requirements);

    // Pick a memory type that the buffer supports and that is host-visible and host-coherent
    VkPhysicalDeviceMemoryProperties memory_properties;
    vkGetPhysicalDeviceMemoryProperties(physical_device, &memory_properties);
    memory_index_ = FindMemoryType(memory_properties, memory_requirements.memoryTypeBits,
                                   VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);
    // memory_index_ == UINT32_MAX means no suitable memory type exists; handle error

    VkMemoryAllocateInfo allocate_info{};
    allocate_info.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    allocate_info.allocationSize = memory_requirements.size;
    allocate_info.memoryTypeIndex = memory_index_;

    if (vkAllocateMemory(device_, &allocate_info, nullptr, &memory_) != VK_SUCCESS) {
      // Handle error
    }

    if (vkBindBufferMemory(device_, buffer_, memory_, 0) != VK_SUCCESS) {
      // Handle error
    }

    // Keep the memory mapped for the lifetime of the pool
    if (vkMapMemory(device_, memory_, 0, kPoolSize, 0, &mapped_memory_) != VK_SUCCESS) {
      // Handle error
    }
  }

  ~TransientBufferPool() {
    vkUnmapMemory(device_, memory_);
    vkDestroyBuffer(device_, buffer_, nullptr);
    vkFreeMemory(device_, memory_, nullptr);
  }

  BufferView Allocate(VkDeviceSize size) {
    // Round the start of the allocation up to the required alignment
    const VkDeviceSize aligned_offset = (offset_ + alignment_ - 1) & ~(alignment_ - 1);
    if (aligned_offset + size > kPoolSize) {
      // The pool is full: the caller has to Reset() it or use another pool
      return BufferView(VK_NULL_HANDLE, 0, 0);
    }

    BufferView view(buffer_, aligned_offset, size);
    offset_ = aligned_offset + size;
    return view;
  }

  void* GetMappedPointer() { return mapped_memory_; }

  void Reset() { offset_ = 0; }

 private:
  uint32_t FindMemoryType(const VkPhysicalDeviceMemoryProperties& memory_properties, uint32_t type_bits,
                          VkMemoryPropertyFlags properties) {
    for (uint32_t i = 0; i < memory_properties.memoryTypeCount; ++i) {
      const bool supported = (type_bits & (1u << i)) != 0; // Bit i set: the buffer may use memory type i
      if (supported && (memory_properties.memoryTypes[i].propertyFlags & properties) == properties) {
        return i;
      }
    }
    return UINT32_MAX; // No suitable memory type found
  }

  VkDevice device_;
  VkQueue queue_;
  uint32_t queue_family_index_;
  VkBuffer buffer_ = VK_NULL_HANDLE;
  VkDeviceMemory memory_ = VK_NULL_HANDLE;
  uint32_t memory_index_ = UINT32_MAX;
  void* mapped_memory_ = nullptr;
  VkDeviceSize alignment_ = 1;
  VkDeviceSize offset_ = 0;
  static constexpr VkDeviceSize kPoolSize = 16 * 1024; // Default pool size: 16 KB
};

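The core ideas behind TransientBufferPool are:

Pre-allocation: One large block of memory is allocated up front.
Memory mapping: The block is mapped into the CPU-accessible address space.
Fast allocation: A BufferView is allocated by simply advancing an offset, which avoids expensive memory allocation calls.
Reset: Reset() rewinds the offset and thereby releases every BufferView at once.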
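
The "advance an offset" allocation has one subtlety on Vulkan: the offset of a uniform buffer binding has to be a multiple of the device limit minUniformBufferOffsetAlignment, so the offset cannot simply be incremented by the requested size. Below is a minimal, Vulkan-free sketch of an aligned bump allocator; AlignUp, Range and BumpAllocator are hypothetical names chosen for this example, and the 256-byte alignment used in the usage note is only an illustrative value.

```cpp
#include <cassert>
#include <cstdint>

// Rounds value up to the next multiple of alignment, which must be a power of two.
constexpr uint64_t AlignUp(uint64_t value, uint64_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Result of an allocation; valid is false when the pool had no room left.
struct Range {
  uint64_t offset;
  uint64_t size;
  bool valid;
};

class BumpAllocator {
 public:
  BumpAllocator(uint64_t capacity, uint64_t alignment)
      : capacity_(capacity), alignment_(alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  }

  Range Allocate(uint64_t size) {
    const uint64_t start = AlignUp(cursor_, alignment_);
    // Written so that the comparison cannot overflow.
    if (start > capacity_ || size > capacity_ - start) return Range{0, 0, false};
    cursor_ = start + size;
    return Range{start, size, true};
  }

  // Releases every allocation at once by rewinding the cursor.
  void Reset() { cursor_ = 0; }

 private:
  uint64_t capacity_;
  uint64_t alignment_;
  uint64_t cursor_ = 0;
};
```

With a capacity of 1024 bytes and an alignment of 256, two 64-byte allocations land at offsets 0 and 256 rather than 0 and 64; the padding in between is the price of keeping every offset bindable.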

3.2.2 Uploading data

Impeller writes data directly into the Uniform buffer through the mapped memory.

// Hypothetical Uniform buffer upload code
TransientBufferPool uniform_buffer_pool(device, physical_device, queue, queue_family_index);

// Reset the pool at the start of each frame
uniform_buffer_pool.Reset();

// Get a BufferView
BufferView mvp_buffer_view = uniform_buffer_pool.Allocate(sizeof(MVPMatrix));

// Get the pointer to the mapped memory
void* mapped_memory = uniform_buffer_pool.GetMappedPointer();

// Compute the address of the MVP matrix
MVPMatrix* mvp_matrix = reinterpret_cast<MVPMatrix*>(static_cast<char*>(mapped_memory) + mvp_buffer_view.GetOffset());

// Fill in the MVP matrix
*mvp_matrix = CalculateMVPMatrix();

//  ... upload the other uniform variables ...

// At draw time, bind mvp_buffer_view to the Descriptor Set
descriptor_set_updater.Begin(descriptor_set);
descriptor_set_updater.WriteBuffer(kMVPBinding, mvp_buffer_view.GetBuffer(), mvp_buffer_view.GetOffset(), mvp_buffer_view.GetSize());
descriptor_set_updater.Commit();

// Draw
vkCmdDraw(command_buffer, ...);
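
Writing through a reinterpret_cast pointer, as above, assumes that the destination address is suitably aligned for the struct. A more conservative variant copies the bytes with std::memcpy and checks the bounds first. The sketch below shows that pattern on an ordinary byte array standing in for the mapped memory; WriteUniform and this MVPMatrix layout are hypothetical and exist only for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical uniform payload: a 4x4 float matrix.
struct MVPMatrix {
  float m[16];
};

// Copies value into the mapped range at the given byte offset.
// mapped_base and mapped_size describe the whole mapped region.
template <typename T>
void WriteUniform(void* mapped_base, std::size_t mapped_size, std::size_t offset, const T& value) {
  static_assert(std::is_trivially_copyable<T>::value,
                "uniform data must be trivially copyable");
  // Refuse to write past the end of the mapped region.
  assert(offset <= mapped_size && sizeof(T) <= mapped_size - offset);
  std::memcpy(static_cast<unsigned char*>(mapped_base) + offset, &value, sizeof(T));
}
```

In the upload code this would replace the cast and the assignment with a single call such as WriteUniform(mapped_memory, pool_size, view_offset, matrix), where pool_size and view_offset are whatever the pool reports for its mapped size and the view's offset.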

3.2.3 Per-frame buffering

To keep the CPU from waiting on the GPU, Impeller uses several TransientBufferPools and alternates between them from frame to frame.

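// Hypothetical Uniform buffer upload code with per-frame buffering
// Each pool owns Vulkan handles and frees them in its destructor, so the pools are held through
// std::unique_ptr: growing the vector then never copies or destroys a live pool.
std::vector<std::unique_ptr<TransientBufferPool>> uniform_buffer_pools;
uniform_buffer_pools.push_back(std::make_unique<TransientBufferPool>(device, physical_device, queue, queue_family_index));
uniform_buffer_pools.push_back(std::make_unique<TransientBufferPool>(device, physical_device, queue, queue_family_index));
uniform_buffer_pools.push_back(std::make_unique<TransientBufferPool>(device, physical_device, queue, queue_family_index)); // Triple buffering

uint32_t frame_index = 0;

// Inside the render loop, once per frame
{
  // Select the TransientBufferPool for the current frame
  TransientBufferPool& current_pool = *uniform_buffer_pools[frame_index % uniform_buffer_pools.size()];

  // Reset the pool at the start of the frame. The GPU must have finished the frame that last used
  // this pool (for example by waiting on that frame's fence) before its memory is overwritten.
  current_pool.Reset();

  // Get a BufferView
  BufferView mvp_buffer_view = current_pool.Allocate(sizeof(MVPMatrix));

  // Get the pointer to the mapped memory
  void* mapped_memory = current_pool.GetMappedPointer();

  // Compute the address of the MVP matrix
  MVPMatrix* mvp_matrix = reinterpret_cast<MVPMatrix*>(static_cast<char*>(mapped_memory) + mvp_buffer_view.GetOffset());

  // Fill in the MVP matrix
  *mvp_matrix = CalculateMVPMatrix();

  //  ... upload the other uniform variables ...

  // At draw time, bind mvp_buffer_view to the Descriptor Set.
  // The descriptor set must likewise not be in use by a frame that is still in flight.
  descriptor_set_updater.Begin(descriptor_set);
  descriptor_set_updater.WriteBuffer(kMVPBinding, mvp_buffer_view.GetBuffer(), mvp_buffer_view.GetOffset(), mvp_buffer_view.GetSize());
  descriptor_set_updater.Commit();

  // Draw
  vkCmdDraw(command_buffer, ...);

  frame_index++;
}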
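
Alternating between pools only helps if a pool is never rewritten while the GPU is still reading from it. The bookkeeping behind that rule can be sketched without any graphics API. FrameRing and kNeverUsed are hypothetical names, and completed_frames stands for "how many frames the GPU has finished", a number a real renderer would derive from fences or timeline semaphores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marker for a slot that no frame has used yet.
constexpr uint64_t kNeverUsed = UINT64_MAX;

// Remembers which frame last used each slot and reports whether the slot
// may be reset, i.e. whether that frame has finished on the GPU.
class FrameRing {
 public:
  explicit FrameRing(std::size_t slots) : last_use_(slots, kNeverUsed) {
    assert(slots > 0);
  }

  // Frames are assigned to slots round-robin.
  std::size_t SlotFor(uint64_t frame) const { return frame % last_use_.size(); }

  // completed_frames is the count of finished frames: frames 0 .. completed_frames-1 are done.
  bool CanReuse(uint64_t frame, uint64_t completed_frames) const {
    const uint64_t last = last_use_[SlotFor(frame)];
    return last == kNeverUsed || last < completed_frames;
  }

  void MarkUsed(uint64_t frame) { last_use_[SlotFor(frame)] = frame; }

 private:
  std::vector<uint64_t> last_use_;
};
```

With three slots, frame 3 maps back to slot 0 and has to wait until frame 0 is reported complete; only then is it safe to call Reset() on that slot's pool.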

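3.3 Summary

Impeller's Uniform buffer upload strategy can be summarized as follows:

Component	Responsibility	Benefit
`BufferView`	Represents a reference to a `VkBuffer`.	Wraps the offset and size of a region inside the buffer, which makes it easy to manage.
`TransientBufferPool`	Manages temporary Uniform buffers.	Pre-allocated pool, memory mapping and fast allocation reduce allocation and deallocation overhead.
Per-frame buffering (multiple `TransientBufferPool`s)	Keeps the CPU from waiting on the GPU.	The CPU can prepare the next frame's data ahead of time, which improves parallelism.
Memory mapping	Writes data directly into the Uniform buffer.	Avoids an extra memory copy, which improves performance.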

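4. Key takeaways

We discussed how the Vulkan/Metal backends of Impeller manage Descriptor Sets and upload Uniform buffers. Descriptor Sets are managed through caching and layering, which reduces update cost and optimizes memory use. Uniform buffers are uploaded efficiently by combining double/triple buffering with memory mapping. Together these strategies improve the performance of the Impeller rendering engine.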