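JS `WebGPU`: How `Bind Group Layouts` and `Pipeline Layouts` Affect Efficiency

Ahem, ladies and gentlemen, good evening! It's your old friend here. Tonight let's have a chat about the "Layouts" in WebGPU, namely Bind Group Layouts and Pipeline Layouts, and see exactly how they affect performance.

Opening: Layouts, WebGPU's "battle formation"

In the WebGPU world, any data a shader uses has to be arranged ahead of time. Think of the shader as a general on the battlefield, with Bind Group Layouts and Pipeline Layouts as the troop deployment maps in the general's hands. They tell WebGPU which resources (textures, uniform buffers, and so on) are handed to the shader, in what form, and at which slot. Deploy well, and the general commands with ease and wins every battle. Deploy badly, and at best you lose efficiency; at worst nothing runs at all, because a layout that does not match the shader or the bound resources is a validation error.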

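Act One: Bind Group Layouts, the resources' "ID card"

A Bind Group Layout, as the name suggests, defines the "layout" of a Bind Group. A Bind Group is a set of resources the shader needs, and the Bind Group Layout is like an "ID card" issued to that set: it describes what the set contains and how the shader uses each entry.

Defining a Bind Group Layout: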

const bindGroupLayout = device.createBindGroupLayout({
  entries: [
    {
      binding: 0, // slot number; matches @binding(0) in WGSL
      visibility: GPUShaderStage.VERTEX | GPUShaderStage.FRAGMENT, // shader stages that can see it
      buffer: {
        type: 'uniform', // buffer binding type: a uniform buffer here
      },
    },
    {
      binding: 1,
      visibility: GPUShaderStage.FRAGMENT,
      sampler: {
        type: 'filtering', // sampler binding type
      },
    },
    {
      binding: 2,
      visibility: GPUShaderStage.FRAGMENT,
      texture: {
        sampleType: 'float', // texture sample type
        viewDimension: '2d', // texture view dimension
      },
    },
  ],
});

This code defines a Bind Group Layout with three entries:

A uniform buffer at binding 0, visible to both the vertex and fragment shaders.
A sampler at binding 1, visible only to the fragment shader.
A 2D texture at binding 2, also visible only to the fragment shader.
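On the shader side, each entry corresponds to a `@group(0) @binding(n)` declaration in WGSL. Below is a minimal sketch, assuming a hypothetical shader (the names `Uniforms`, `u`, `samp`, `tex`, `vs_main`, `fs_main` are made up for illustration), plus a hypothetical helper `listBindings` that pulls the `(group, binding)` pairs out of WGSL source with a regular expression so they can be compared against the layout's `binding` numbers. The regex is a rough check, not a WGSL parser.

```javascript
// Hypothetical WGSL matching the three-entry layout above (all in group 0).
const shaderCode = `
struct Uniforms {
  mvp: mat4x4f,
}
@group(0) @binding(0) var<uniform> u: Uniforms;  // buffer: { type: 'uniform' }
@group(0) @binding(1) var samp: sampler;         // sampler: { type: 'filtering' }
@group(0) @binding(2) var tex: texture_2d<f32>;  // texture: { sampleType: 'float', viewDimension: '2d' }

struct VSOut {
  @builtin(position) pos: vec4f,
  @location(0) uv: vec2f,
}

@vertex
fn vs_main(@builtin(vertex_index) i: u32) -> VSOut {
  var corners = array<vec2f, 3>(vec2f(-1.0, -1.0), vec2f(3.0, -1.0), vec2f(-1.0, 3.0));
  var output: VSOut;
  output.pos = u.mvp * vec4f(corners[i], 0.0, 1.0);
  output.uv = corners[i] * 0.5 + vec2f(0.5);
  return output;
}

@fragment
fn fs_main(v: VSOut) -> @location(0) vec4f {
  return textureSample(tex, samp, v.uv);
}
`;

// Rough check: collect (group, binding) pairs declared in WGSL source.
// Only matches the common "@group(g) @binding(b)" form; not a real parser.
function listBindings(wgsl) {
  const re = /@group\((\d+)\)\s*@binding\((\d+)\)/g;
  const found = [];
  for (const m of wgsl.matchAll(re)) {
    found.push({ group: Number(m[1]), binding: Number(m[2]) });
  }
  return found;
}

// True if every binding the shader declares in `group` exists in the layout descriptor.
function shaderUsesOnlyLayoutBindings(wgsl, layoutDescriptor, group = 0) {
  const declared = new Set(layoutDescriptor.entries.map((e) => e.binding));
  return listBindings(wgsl)
    .filter((b) => b.group === group)
    .every((b) => declared.has(b.binding));
}
```

A check like this only catches numbering mistakes; type mismatches are still reported by WebGPU's own validation when the pipeline is created.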

Creating a Bind Group:

With the Bind Group Layout in place, we can create a Bind Group from it:

const bindGroup = device.createBindGroup({
  layout: bindGroupLayout, // the layout defined above
  entries: [
    {
      binding: 0,
      resource: {
        buffer: uniformBuffer, // the buffer resource to bind
      },
    },
    {
      binding: 1,
      resource: sampler, // the GPUSampler to bind
    },
    {
      binding: 2,
      resource: textureView, // the GPUTextureView to bind (a view, not the texture itself)
    },
  ],
});

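Here we bind the actual buffer, sampler, and texture view to their matching binding numbers, producing a Bind Group. Every binding in the layout must receive exactly one entry, holding a resource of the kind the layout declares; otherwise createBindGroup produces a validation error.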
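Because a bind group must supply exactly one entry per layout binding, a small pre-flight check can catch numbering mistakes before WebGPU's own validation error (which is reported asynchronously) shows up. This is a hypothetical helper, `checkBindGroupEntries`, that works on plain descriptor objects; it only compares binding numbers, while WebGPU's validation checks much more (resource kinds, buffer sizes, and so on).

```javascript
// Hypothetical pre-flight check (not part of WebGPU): does a bind group descriptor
// provide exactly one entry for every binding declared in its layout descriptor?
function checkBindGroupEntries(layoutDesc, bindGroupEntries) {
  const problems = [];
  const expected = new Set(layoutDesc.entries.map((e) => e.binding));
  const seen = new Set();
  for (const entry of bindGroupEntries) {
    if (seen.has(entry.binding)) problems.push(`binding ${entry.binding} given twice`);
    else if (!expected.has(entry.binding)) problems.push(`binding ${entry.binding} not in layout`);
    seen.add(entry.binding);
  }
  for (const b of expected) {
    if (!seen.has(b)) problems.push(`binding ${b} missing`);
  }
  return problems; // an empty array means the binding numbers line up
}
```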

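Act Two: Pipeline Layouts, the shader's "battle plan"

A Pipeline Layout determines how the whole pipeline uses Bind Groups. It lists the Bind Group Layouts the pipeline needs, and each layout's index in that list is the group slot it occupies: `bindGroupLayouts[n]` corresponds to `@group(n)` in WGSL and to `setBindGroup(n, ...)` at draw time. The Pipeline Layout is the shader's "battle plan": it tells WebGPU which resources the shader needs and where to find them.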

Creating a Pipeline Layout:

const pipelineLayout = device.createPipelineLayout({
  bindGroupLayouts: [bindGroupLayout], // the Bind Group Layout defined above; index 0 = @group(0)
});

This creates a Pipeline Layout containing a single Bind Group Layout, which means the pipeline gets all its resources from one Bind Group (group 0).
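With more than one group, the array order decides the group numbers. Below is a sketch assuming a common, but not mandatory, convention of ordering groups by how often they change: group 0 per frame, group 1 per material, group 2 per object. The helper name `createLayouts`, the labels, and the specific entries are illustrative assumptions; the stage flags fall back to the numeric values from the WebGPU spec so the function also runs outside a browser.

```javascript
// Stage flags: the browser's GPUShaderStage when present, otherwise the spec values.
const Stage = globalThis.GPUShaderStage ?? { VERTEX: 0x1, FRAGMENT: 0x2, COMPUTE: 0x4 };

// Hypothetical helper: one bind group layout per update frequency.
function createLayouts(device) {
  const perFrame = device.createBindGroupLayout({
    label: 'per-frame', // group 0: e.g. camera matrices
    entries: [{ binding: 0, visibility: Stage.VERTEX | Stage.FRAGMENT, buffer: { type: 'uniform' } }],
  });
  const perMaterial = device.createBindGroupLayout({
    label: 'per-material', // group 1: sampler + texture
    entries: [
      { binding: 0, visibility: Stage.FRAGMENT, sampler: { type: 'filtering' } },
      { binding: 1, visibility: Stage.FRAGMENT, texture: { sampleType: 'float', viewDimension: '2d' } },
    ],
  });
  const perObject = device.createBindGroupLayout({
    label: 'per-object', // group 2: e.g. model matrix
    entries: [{ binding: 0, visibility: Stage.VERTEX, buffer: { type: 'uniform' } }],
  });
  // Array index == @group(n) in WGSL == first argument of setBindGroup(n, ...).
  const pipelineLayout = device.createPipelineLayout({
    bindGroupLayouts: [perFrame, perMaterial, perObject],
  });
  return { perFrame, perMaterial, perObject, pipelineLayout };
}
```

With this ordering, a renderer can set group 0 once per frame and only re-bind the higher-numbered groups as materials and objects change.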

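Creating a Render Pipeline:

With the Pipeline Layout in hand, we can create a Render Pipeline:

const renderPipeline = device.createRenderPipeline({
  layout: pipelineLayout, // the Pipeline Layout defined above
  vertex: {
    module: vertexShaderModule, // vertex shader module
    entryPoint: 'main', // vertex shader entry point
  },
  fragment: {
    module: fragmentShaderModule, // fragment shader module
    entryPoint: 'main', // fragment shader entry point
    targets: [{ format: presentationFormat }], // render target format
  },
  primitive: {
    topology: 'triangle-list', // primitive topology
  },
});

Here we pass the Pipeline Layout to createRenderPipeline, telling WebGPU which resources this pipeline needs. You can also pass `layout: 'auto'` and let WebGPU derive the layout from the shader code. That is convenient, but the bind group layouts it produces (obtained with `pipeline.getBindGroupLayout(n)`) are tied to that one pipeline, so bind groups created from them cannot be shared with other pipelines. Explicit layouts are the way to share.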

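Act Three: Performance impact, the art of "battle formation"

Alright, after all that setup we finally get to the main topic: how exactly do Bind Group Layouts and Pipeline Layouts affect performance?

Caching and reusing Bind Group Layouts:

Every Render Pipeline needs a Pipeline Layout, and a Pipeline Layout references Bind Group Layouts. Recreating the Bind Group Layout every time you create a pipeline adds avoidable work: each create call is validated separately (and in browsers is usually forwarded to a separate GPU process), and you end up holding many objects that all describe the same thing.

The right approach: reuse Bind Group Layouts as much as possible. If several pipelines need the same resource structure, they can share one Bind Group Layout, and if all their groups match, one Pipeline Layout as well.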

// Wrong: recreate the Bind Group Layout for every pipeline
for (let i = 0; i < 100; i++) {
  const bindGroupLayout = device.createBindGroupLayout({ /* ... */ });
  const pipelineLayout = device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] });
  const renderPipeline = device.createRenderPipeline({ layout: pipelineLayout, /* ... */ });
}

// Right: create the Bind Group Layout (and the Pipeline Layout) once and reuse them
const bindGroupLayout = device.createBindGroupLayout({ /* ... */ });
const pipelineLayout = device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] });
for (let i = 0; i < 100; i++) {
  const renderPipeline = device.createRenderPipeline({ layout: pipelineLayout, /* ... */ });
}

Takeaway: reusing Bind Group Layouts cuts object-creation overhead on the CPU side, and any bind group created against the shared layout works with every pipeline that uses it.
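When layouts are created in many places in a codebase, a small cache makes reuse automatic. A minimal sketch, assuming descriptors are plain JSON-serializable objects (bind group layout descriptors only hold numbers, strings, and nested objects); `LayoutCache` is a hypothetical helper, not a WebGPU API. `JSON.stringify` is sensitive to key order, so descriptors written with keys in a different order simply get separate cache entries.

```javascript
// Hypothetical helper: returns the same GPUBindGroupLayout for identical descriptors.
class LayoutCache {
  constructor(device) {
    this.device = device;
    this.layouts = new Map(); // descriptor JSON -> GPUBindGroupLayout
  }

  getBindGroupLayout(descriptor) {
    // Key on the serialized descriptor (key order matters for JSON.stringify).
    const key = JSON.stringify(descriptor);
    let layout = this.layouts.get(key);
    if (layout === undefined) {
      layout = this.device.createBindGroupLayout(descriptor);
      this.layouts.set(key, layout);
    }
    return layout;
  }
}
```

The same idea extends to pipeline layouts, keyed by the identities of the bind group layouts they contain.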

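Pipeline Layout complexity:

A Pipeline Layout's complexity grows with the number of Bind Group Layouts it contains. The more bind groups a pipeline needs, the more layouts there are to create and manage, and the more setBindGroup calls each draw may need. There is also a hard ceiling: `device.limits.maxBindGroups`, which is 4 unless you request a higher limit when creating the device.

How do you keep Pipeline Layouts simple?

Merge Bind Groups: put related resources in the same Bind Group to reduce the number of groups.
Push Constants, and what WebGPU offers instead: in Vulkan (and in wgpu as a native-only feature), push constants send a few bytes straight to the shader without going through a descriptor set or bind group. Standard WebGPU in the browser does not currently provide push constants. For small, frequently changing per-draw data, the usual substitutes are a uniform buffer updated with `device.queue.writeBuffer`, or one shared uniform buffer bound with a dynamic offset (`hasDynamicOffset: true` in the layout entry, then `setBindGroup(index, bindGroup, [offset])` per draw), so no new Bind Group is needed per draw.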
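Dynamic offsets must be multiples of `device.limits.minUniformBufferOffsetAlignment` (256 bytes by default). A minimal sketch of the CPU side, assuming each draw needs one `vec4f` color; `packPerDrawColors` is a hypothetical helper, and the alignment is a parameter so real code can pass in the value from `device.limits`.

```javascript
// Hypothetical helper: pack one vec4f per draw into a single buffer, with each slot
// aligned so it can be selected via setBindGroup(index, bindGroup, [offset]).
function packPerDrawColors(colors, alignment = 256) {
  const slotBytes = 16; // one vec4f = 4 x f32
  // Round the per-draw stride up to the required offset alignment.
  const stride = Math.ceil(slotBytes / alignment) * alignment;
  const data = new ArrayBuffer(stride * colors.length);
  const offsets = [];
  colors.forEach((rgba, i) => {
    new Float32Array(data, i * stride, 4).set(rgba);
    offsets.push(i * stride);
  });
  return { data, stride, offsets };
}
```

In a browser you would upload `data` once with `device.queue.writeBuffer(buffer, 0, data)`, declare the layout entry as `buffer: { type: 'uniform', hasDynamicOffset: true }`, bind the buffer with `size: 16` in the bind group entry (so that offset plus size stays inside the buffer), and call `pass.setBindGroup(0, bindGroup, [offsets[i]])` before each draw.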

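Bind Group update frequency:

How often bind groups change also matters. A Bind Group is immutable once created: to point a binding at a different buffer or texture you have to create a new Bind Group, and switching bind groups between draws costs another setBindGroup call. Creating bind groups every frame adds CPU work and garbage.

How do you reduce Bind Group churn?

Use uniform buffers (what OpenGL calls a UBO): keep frequently changing data in a uniform buffer and update its contents with `device.queue.writeBuffer`. The Bind Group keeps pointing at the same buffer, so it does not need to be recreated.
Use storage buffers (OpenGL's SSBO): for large data sets or data that needs random access, bind a buffer with `type: 'storage'` (read-write) or `'read-only-storage'`. Storage buffers allow much larger bindings than uniform buffers, and with `'storage'` a compute or fragment shader can write them directly, which avoids round trips through the CPU.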
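A minimal sketch of the "create once, update contents" pattern. `createFrameResources` and `updateFrame` are hypothetical names, `device` is any GPUDevice, and the uniform data is assumed to be a single `mat4x4f` (64 bytes). The usage flags fall back to the spec constants so the code also runs outside a browser.

```javascript
// Usage flags: browser constants when available, otherwise the values from the spec.
const Usage = globalThis.GPUBufferUsage ?? { COPY_DST: 0x8, UNIFORM: 0x40 };

// Created once: the uniform buffer and a bind group that points at it.
function createFrameResources(device, bindGroupLayout) {
  const uniformBuffer = device.createBuffer({
    size: 64, // one mat4x4f = 16 floats x 4 bytes
    usage: Usage.UNIFORM | Usage.COPY_DST,
  });
  const bindGroup = device.createBindGroup({
    layout: bindGroupLayout,
    entries: [{ binding: 0, resource: { buffer: uniformBuffer } }],
  });
  return { uniformBuffer, bindGroup };
}

// Called every frame: only the buffer contents change; the bind group is reused as-is.
function updateFrame(device, resources, matrix /* Float32Array of 16 floats */) {
  device.queue.writeBuffer(resources.uniformBuffer, 0, matrix);
  return resources.bindGroup;
}
```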

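Bind Group compatibility:

Different pipelines may seem to need different Bind Groups, but if the relevant Bind Group Layouts are compatible, one Bind Group can serve all of them. In WebGPU, two layouts are compatible ("group-equivalent") when their entries are exactly the same: the same binding numbers, and for each binding the same visibility and the same resource type and parameters. Matching counts and types alone are not enough. Layouts generated with `layout: 'auto'` are an exception: they are only compatible with the pipeline that produced them.

How do you improve Bind Group compatibility?

Standardize resource types: use the same resource types wherever possible, e.g. always `float` textures or always `uniform` buffers, so that layouts come out identical.
Share a superset layout: a pipeline's shaders may use only some of the entries in an explicit layout, so several pipelines can share one layout that declares everything any of them needs. Binding numbers also do not have to be contiguous, so you can leave gaps for future resources; keep in mind, though, that adding an entry later produces a different layout, and existing bind groups will not match it.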
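To make the compatibility rule concrete, here is a hypothetical `sameLayoutEntries` helper that compares two bind group layout descriptors in the spirit of WebGPU's group-equivalence: the same set of binding numbers, and for each binding the same visibility and resource declaration. It is an illustrative approximation on plain objects and does not fill in defaults, so an entry that omits `type: 'uniform'` counts as different from one that spells it out, even though WebGPU treats `'uniform'` as the default.

```javascript
// Hypothetical helper: approximate WebGPU's "group-equivalent" check on plain descriptors.
// Defaults are NOT filled in, so { buffer: {} } and { buffer: { type: 'uniform' } } differ here.
function sameLayoutEntries(descA, descB) {
  const byBinding = (desc) => new Map(desc.entries.map((e) => [e.binding, e]));
  const a = byBinding(descA);
  const b = byBinding(descB);
  if (a.size !== b.size) return false;
  for (const [binding, entryA] of a) {
    const entryB = b.get(binding);
    if (entryB === undefined) return false;
    // Compare visibility and the resource declaration (buffer / sampler / texture / ...).
    if (canonical(entryA) !== canonical(entryB)) return false;
  }
  return true;
}

// Serialize a value with sorted object keys so property order does not matter.
function canonical(value) {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  const keys = Object.keys(value).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(value[k])}`).join(',')}}`;
}
```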

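Act Four: Hands-on practice, let the code do the talking

To get a more concrete feel for the performance impact, let's run a couple of small experiments:

Experiment 1: Reusing Bind Group Layouts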

// Scenario: create 100 Render Pipelines, first with a fresh Bind Group Layout each time, then with one shared layout
async function testBindGroupLayoutReuse() {
  if (!navigator.gpu) throw new Error('WebGPU is not available in this browser');
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) throw new Error('No suitable GPU adapter found');
  const device = await adapter.requestDevice();

  // The triangle is generated from the vertex index, so the pipeline needs no vertex buffers
  // (a @location(0) input would require a matching `buffers` entry in the vertex state).
  const vertexShaderModule = device.createShaderModule({
    code: `
      @vertex
      fn main(@builtin(vertex_index) i: u32) -> @builtin(position) vec4f {
        var pos = array<vec2f, 3>(vec2f(0.0, 0.5), vec2f(-0.5, -0.5), vec2f(0.5, -0.5));
        return vec4f(pos[i], 0.0, 1.0);
      }
    `,
  });

  const fragmentShaderModule = device.createShaderModule({
    code: `
      @fragment
      fn main() -> @location(0) vec4f {
        return vec4f(1.0, 0.0, 0.0, 1.0);
      }
    `,
  });

  const presentationFormat = navigator.gpu.getPreferredCanvasFormat();

  // Identical pipeline descriptor in both runs; only the layout objects differ
  const createPipeline = (pipelineLayout) =>
    device.createRenderPipeline({
      layout: pipelineLayout,
      vertex: { module: vertexShaderModule, entryPoint: 'main' },
      fragment: {
        module: fragmentShaderModule,
        entryPoint: 'main',
        targets: [{ format: presentationFormat }],
      },
      primitive: { topology: 'triangle-list' },
    });

  // 1. A new Bind Group Layout (and Pipeline Layout) for every pipeline
  const startTime1 = performance.now();
  for (let i = 0; i < 100; i++) {
    const bindGroupLayout = device.createBindGroupLayout({
      entries: [], // empty Bind Group Layout
    });
    createPipeline(device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] }));
  }
  const endTime1 = performance.now();
  console.log(`Fresh Bind Group Layout per pipeline: ${endTime1 - startTime1} ms`);

  // 2. One shared Bind Group Layout and Pipeline Layout
  const bindGroupLayout = device.createBindGroupLayout({
    entries: [], // empty Bind Group Layout
  });
  const pipelineLayout = device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] });
  const startTime2 = performance.now();
  for (let i = 0; i < 100; i++) {
    createPipeline(pipelineLayout);
  }
  const endTime2 = performance.now();
  console.log(`Shared Bind Group Layout: ${endTime2 - startTime2} ms`);
}

testBindGroupLayoutReuse().catch(console.error);

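The shared-layout loop may well come out faster, but don't expect a dramatic gap. Creating an empty Bind Group Layout is cheap next to creating a pipeline, the browser may defer shader compilation or cache identical pipelines (which favors whichever loop runs second), and the numbers vary across browsers and GPUs. Run each variant several times, in both orders, before drawing conclusions.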
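A sketch of a somewhat fairer timing harness, assuming each variant is wrapped in a synchronous function (for example, the two loops above). `compareMedian` is a hypothetical helper: it does one warm-up round, reverses the run order on alternate rounds so neither variant always benefits from running second, and reports the median time per variant. It measures only the CPU time spent in the calls; deferred or GPU-side compilation work is not captured.

```javascript
// Hypothetical timing helper: median time per variant, alternating run order each round.
function compareMedian(variants, rounds = 7, now = () => performance.now()) {
  const names = Object.keys(variants);
  const samples = Object.fromEntries(names.map((n) => [n, []]));

  // Warm-up round: results are discarded.
  for (const n of names) variants[n]();

  for (let r = 0; r < rounds; r++) {
    // Reverse the order on odd rounds so neither variant always runs first.
    const order = r % 2 === 0 ? names : [...names].reverse();
    for (const n of order) {
      const t0 = now();
      variants[n]();
      samples[n].push(now() - t0);
    }
  }

  const median = (xs) => {
    const s = [...xs].sort((a, b) => a - b);
    const mid = Math.floor(s.length / 2);
    return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
  };
  return Object.fromEntries(names.map((n) => [n, median(samples[n])]));
}
```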

Experiment 2: Push Constants, and what WebGPU offers instead

// Scenario: pass a small amount of per-draw data (a color plus a position offset) to the shader in two ways:
//   1. one Bind Group (with its own small uniform buffer) per draw
//   2. one Bind Group over a shared uniform buffer, picking each draw's slice with a dynamic offset
// Browser WebGPU has no push constants (no pushConstantRanges, no pass.pushConstants),
// so option 2 is the usual stand-in for cheap per-draw data.
async function testPerDrawData(canvas) {
  if (!navigator.gpu) throw new Error('WebGPU is not available in this browser');
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) throw new Error('No suitable GPU adapter found');
  const device = await adapter.requestDevice();

  // Canvas setup: a configured context is needed before getCurrentTexture()
  const context = canvas.getContext('webgpu');
  const presentationFormat = navigator.gpu.getPreferredCanvasFormat();
  context.configure({ device, format: presentationFormat });

  // One shader module for both variants; the two pipelines differ only in their layouts
  const shaderModule = device.createShaderModule({
    code: `
      struct PerDraw {
        color: vec4f,
        offset: vec4f, // only xy is used; vec4f keeps the struct at 32 bytes
      }
      @group(0) @binding(0) var<uniform> transformMatrix: mat4x4f;
      @group(0) @binding(1) var<uniform> perDraw: PerDraw;

      @vertex
      fn vs_main(@builtin(vertex_index) i: u32) -> @builtin(position) vec4f {
        var pos = array<vec2f, 3>(vec2f(0.0, 0.25), vec2f(-0.25, -0.25), vec2f(0.25, -0.25));
        return transformMatrix * vec4f(pos[i] + perDraw.offset.xy, 0.0, 1.0);
      }

      @fragment
      fn fs_main() -> @location(0) vec4f {
        return perDraw.color;
      }
    `,
  });

  const makeLayout = (hasDynamicOffset) =>
    device.createBindGroupLayout({
      entries: [
        { binding: 0, visibility: GPUShaderStage.VERTEX, buffer: { type: 'uniform' } },
        {
          binding: 1,
          visibility: GPUShaderStage.VERTEX | GPUShaderStage.FRAGMENT,
          buffer: { type: 'uniform', hasDynamicOffset },
        },
      ],
    });
  const makePipeline = (bindGroupLayout) =>
    device.createRenderPipeline({
      layout: device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] }),
      vertex: { module: shaderModule, entryPoint: 'vs_main' },
      fragment: { module: shaderModule, entryPoint: 'fs_main', targets: [{ format: presentationFormat }] },
      primitive: { topology: 'triangle-list' },
    });
  const makeUniformBuffer = (data) => {
    const buffer = device.createBuffer({
      size: data.byteLength,
      usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
    });
    device.queue.writeBuffer(buffer, 0, data);
    return buffer;
  };

  // Shared identity transform, plus per-draw data: color rgba, then offset xy and 2 padding floats
  const transformBuffer = makeUniformBuffer(new Float32Array([
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 1, 0,
    0, 0, 0, 1,
  ]));
  const perDrawData = [
    new Float32Array([1, 0, 0, 1, -0.5, 0, 0, 0]), // red, shifted left
    new Float32Array([0, 1, 0, 1, 0.5, 0, 0, 0]), // green, shifted right
  ];
  const PER_DRAW_BYTES = 32;

  // 1. One Bind Group per draw, created once up front (not every frame)
  const layout1 = makeLayout(false);
  const renderPipeline1 = makePipeline(layout1);
  const bindGroups1 = perDrawData.map((data) =>
    device.createBindGroup({
      layout: layout1,
      entries: [
        { binding: 0, resource: { buffer: transformBuffer } },
        { binding: 1, resource: { buffer: makeUniformBuffer(data) } },
      ],
    }),
  );

  // 2. One Bind Group over a shared buffer; each draw's slot starts at an aligned dynamic offset
  const alignment = device.limits.minUniformBufferOffsetAlignment; // 256 by default
  const stride = Math.ceil(PER_DRAW_BYTES / alignment) * alignment;
  const packed = new Float32Array((stride / 4) * perDrawData.length);
  perDrawData.forEach((data, i) => packed.set(data, (i * stride) / 4));
  const layout2 = makeLayout(true);
  const renderPipeline2 = makePipeline(layout2);
  const bindGroup2 = device.createBindGroup({
    layout: layout2,
    entries: [
      { binding: 0, resource: { buffer: transformBuffer } },
      // size is one per-draw slice, so offset + size stays inside the buffer
      { binding: 1, resource: { buffer: makeUniformBuffer(packed), size: PER_DRAW_BYTES } },
    ],
  });

  // Record both variants in one pass; all buffer writes were issued before encoding
  const commandEncoder = device.createCommandEncoder();
  const pass = commandEncoder.beginRenderPass({
    colorAttachments: [
      {
        view: context.getCurrentTexture().createView(),
        clearValue: { r: 0.0, g: 0.0, b: 0.0, a: 1.0 },
        loadOp: 'clear',
        storeOp: 'store',
      },
    ],
  });

  // Variant 1: switch Bind Groups between draws
  pass.setPipeline(renderPipeline1);
  bindGroups1.forEach((bindGroup) => {
    pass.setBindGroup(0, bindGroup);
    pass.draw(3);
  });

  // Variant 2: same Bind Group, different dynamic offset per draw.
  // Both variants draw the same two triangles in the same place, so the image is identical;
  // only the way the per-draw data reaches the shader differs.
  pass.setPipeline(renderPipeline2);
  perDrawData.forEach((_, i) => {
    pass.setBindGroup(0, bindGroup2, [i * stride]);
    pass.draw(3);
  });

  pass.end();
  device.queue.submit([commandEncoder.finish()]);
}

testPerDrawData(document.querySelector('canvas')).catch(console.error);

This experiment contrasts two ways of getting small per-draw data to the shader. Variant 2 keeps a single Bind Group and changes only an offset between draws, which is the closest standard WebGPU gets to the role push constants play in Vulkan or in native wgpu. Whether it is measurably faster depends on the browser, the driver, and the number of draws; the code above renders but does not time anything, so profile your own workload before choosing.

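Act Five: Closing remarks, the wisdom of "battle formation"

Alright, after all that, let's sum up:

Bind Group Layouts and Pipeline Layouts are central concepts in WebGPU and directly affect performance.
Reusing Bind Group Layouts (and Pipeline Layouts) reduces object-creation overhead.
Keeping Pipeline Layouts simple, with fewer, well-organized bind groups, reduces per-draw binding work.
Updating buffer contents instead of recreating Bind Groups avoids unnecessary overhead.
Pick the data path that fits the situation: separate Bind Groups, or a shared buffer with dynamic offsets for small per-draw data, since browser WebGPU has no push constants.

Remember, WebGPU performance tuning is an art of "battle formation": apply these techniques flexibly to your specific scenario and requirements. I hope tonight's talk helps you understand Bind Group Layouts and Pipeline Layouts better, and that you win every battle in the WebGPU world!

Thanks for watching, see you next time!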
