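A deep dive into core `WebGPU` concepts such as `Pipeline State Objects` (`PSO`), `Bind Groups`, and `Render Passes`, and how to achieve high-performance 2D/3D rendering.

Hi everyone: graphics gurus, future gurus, and everyone working hard to become one! Welcome to today's WebGPU session. Today we'll talk about the key players in WebGPU: Pipeline State Objects (PSO), Bind Groups, and Render Passes, and how to use them to build high-performance 2D/3D rendering.

Ready? Buckle up, here we go!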

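Stop 1: Pipeline State Objects (PSO), the soul of rendering

Imagine cooking a dish: you first have to get the ingredients, the cookware, and the heat right. WebGPU rendering works the same way. You have to tell the GPU how to draw, which shaders to run, how to blend, and so on. A PSO is what packages up these rendering settings.

Simply put, a PSO (WebGPU itself calls this object a GPURenderPipeline; "PSO" is the name familiar from D3D12) defines the state of the render pipeline, including:

Vertex Shader: processes vertex data, transforms vertex positions, computes normals, and so on.
Fragment Shader: computes the color of each fragment, including lighting.
Primitive Topology: defines how vertices are assembled into primitives (triangles, lines, and so on).
Rasterization State: defines how primitives are rasterized into fragments, including cull mode and clipping behavior (in WebGPU these settings live in the primitive descriptor).
Depth/Stencil State: defines how the depth and stencil tests behave.
Blend State: defines how a new fragment color is blended with the color already in the target.
Multisample State: defines the multisampling configuration.

Creating a PSO is like writing a detailed "rendering manual" for the GPU. Once created, it can be reused, which avoids the cost of setting up all of this state again for every draw.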
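
To make the list above concrete, here is a minimal sketch of a fuller pipeline descriptor that fills in the primitive, depth/stencil, blend, and multisample states. The helper name buildPipelineDescriptor, its parameters, and the specific choices (back-face culling, depth24plus, standard alpha blending, 4x MSAA) are illustrative assumptions, not something this lecture prescribes.

```javascript
// Hypothetical helper: returns a plain descriptor object that could be passed
// to device.createRenderPipeline(). Nothing here touches the GPU.
function buildPipelineDescriptor({ module, colorFormat, depthFormat = 'depth24plus', sampleCount = 4 }) {
  return {
    layout: 'auto',
    vertex: { module, entryPoint: 'vs_main' },
    fragment: {
      module,
      entryPoint: 'fs_main',
      targets: [{
        format: colorFormat,
        // Blend State: classic "source over" alpha blending
        blend: {
          color: { srcFactor: 'src-alpha', dstFactor: 'one-minus-src-alpha', operation: 'add' },
          alpha: { srcFactor: 'one', dstFactor: 'one-minus-src-alpha', operation: 'add' },
        },
      }],
    },
    // Primitive Topology and rasterization settings
    primitive: { topology: 'triangle-list', cullMode: 'back', frontFace: 'ccw' },
    // Depth/Stencil State: keep the nearest fragment
    depthStencil: { format: depthFormat, depthWriteEnabled: true, depthCompare: 'less' },
    // Multisample State
    multisample: { count: sampleCount },
  };
}
```

A pipeline built from a descriptor like this expects the render pass to supply a depth attachment in the same format and render targets with the same sample count; otherwise validation fails.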

Code example: creating a simple PSO

async function createRenderPipeline(device, presentationFormat) {
  const shaderModule = device.createShaderModule({
    code: `
      @vertex
      fn vs_main(@builtin(vertex_index) in_vertex_index: u32) -> @builtin(position) vec4<f32> {
        // var, not let: older WGSL compilers reject indexing a let-declared array with a runtime index
        var positions = array<vec2<f32>, 3>(
          vec2<f32>( 0.0,  0.5),
          vec2<f32>(-0.5, -0.5),
          vec2<f32>( 0.5, -0.5)
        );
        return vec4<f32>(positions[in_vertex_index], 0.0, 1.0);
      }

      @fragment
      fn fs_main() -> @location(0) vec4<f32> {
        return vec4<f32>(1.0, 0.0, 0.0, 1.0); // Red color
      }
    `,
  });

  const renderPipeline = device.createRenderPipeline({
    layout: 'auto', // We'll talk about layouts later!
    vertex: {
      module: shaderModule,
      entryPoint: 'vs_main',
    },
    fragment: {
      module: shaderModule,
      entryPoint: 'fs_main',
      targets: [{ format: presentationFormat }],
    },
    primitive: {
      topology: 'triangle-list',
    },
  });

  return renderPipeline;
}

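This code creates a simple PSO that renders a red triangle. Note layout: 'auto': this is the lazy option, where WebGPU infers the layout from the shaders. In real projects, defining the layout explicitly is strongly recommended. It gives you better control over data binding, and bind groups created from an auto-generated layout can only be used with the pipeline that generated it.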

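Stop 2: Bind Groups, the bridge for passing data

With a PSO we know how to draw, but what do we draw? This is where Bind Groups come in.

You can think of Bind Groups as the PSO's "data warehouse". They hold the resources that rendering needs, such as:

Uniform Buffers: store read-only data the shaders need, such as the MVP matrix or colors.
Textures: store image data, such as color maps and normal maps.
Samplers: define how textures are sampled, such as filtering and addressing modes.
Storage Buffers: store mutable data that shaders can read and write, such as particle positions or skeletal animation data.
Storage Textures: store texture data that shaders can read and write.

Bind Groups are tied to a PSO through a Bind Group Layout. The Bind Group Layout declares which resources a Bind Group contains, along with each resource's type, binding slot, and the shader stages that can see it.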
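
As a rough illustration of how layout entries line up with shader declarations, the sketch below describes a bind group with a uniform buffer, a texture, and a sampler, together with the WGSL declarations that would match it. The helper buildMaterialLayoutEntries, the Material struct, and the binding numbers are assumptions made up for this example.

```javascript
// Hypothetical helper: layout entries for a "material" bind group.
// shaderStage is expected to be the GPUShaderStage constants object.
function buildMaterialLayoutEntries(shaderStage) {
  return [
    // binding 0: uniform buffer, visible to both stages
    { binding: 0, visibility: shaderStage.VERTEX | shaderStage.FRAGMENT, buffer: { type: 'uniform' } },
    // binding 1: a regular 2D float texture, sampled in the fragment stage
    { binding: 1, visibility: shaderStage.FRAGMENT, texture: { sampleType: 'float', viewDimension: '2d' } },
    // binding 2: a filtering sampler for that texture
    { binding: 2, visibility: shaderStage.FRAGMENT, sampler: { type: 'filtering' } },
  ];
}

// WGSL declarations that match the entries above, slot by slot.
const materialBindingsWGSL = `
  struct Material { tint: vec4<f32> }
  @group(0) @binding(0) var<uniform> material: Material;
  @group(0) @binding(1) var baseColor: texture_2d<f32>;
  @group(0) @binding(2) var baseSampler: sampler;
`;
```

In a browser this would be used as device.createBindGroupLayout({ entries: buildMaterialLayoutEntries(GPUShaderStage) }); the group index in WGSL corresponds to the position of the layout in the pipeline layout's bindGroupLayouts array.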

Code example: creating a Bind Group and Bind Group Layout

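async function createBindGroup(device, renderPipeline) {
  const uniformBufferSize = 16 * Float32Array.BYTES_PER_ELEMENT; // 4x4 f32 matrix = 64 bytes
  const uniformBuffer = device.createBuffer({
    size: uniformBufferSize,
    usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
  });

  // With layout: 'auto', group 0 only contains the bindings the pipeline's shaders actually use,
  // so the shader must declare and use a uniform at @group(0) @binding(0) for this to validate.
  const bindGroupLayout = renderPipeline.getBindGroupLayout(0); // Get the layout from the pipeline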
  const bindGroup = device.createBindGroup({
    layout: bindGroupLayout,
    entries: [
      {
        binding: 0,
        resource: {
          buffer: uniformBuffer,
        },
      },
    ],
  });

  return { uniformBuffer, bindGroup };
}

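This code creates a Bind Group containing one Uniform Buffer that stores the MVP matrix. Note renderPipeline.getBindGroupLayout(0): here we fetch the Bind Group Layout from the PSO. That works because the PSO was created with layout 'auto', so the layout can be queried after the PSO exists. Keep in mind that the auto-generated layout only contains bindings the shaders actually use; the red-triangle shader shown earlier declares no uniform, so the pipeline's shader has to declare one at @group(0) @binding(0) for binding 0 to exist. If you create the layout by hand, you instead pass it in when creating the PSO.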
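
For reference, here is one way the triangle shader could be extended so that an 'auto' layout really does contain a uniform at binding 0. This is a sketch under assumptions: the Uniforms struct, the mvp field, and the function name createUniformTrianglePipeline are invented for illustration.

```javascript
// WGSL for the triangle, extended with a uniform matrix at group 0, binding 0.
const uniformTriangleWGSL = `
  struct Uniforms { mvp: mat4x4<f32> }
  @group(0) @binding(0) var<uniform> uniforms: Uniforms;

  @vertex
  fn vs_main(@builtin(vertex_index) i: u32) -> @builtin(position) vec4<f32> {
    var positions = array<vec2<f32>, 3>(
      vec2<f32>( 0.0,  0.5),
      vec2<f32>(-0.5, -0.5),
      vec2<f32>( 0.5, -0.5)
    );
    // Using the uniform here is what makes it part of the auto-generated layout.
    return uniforms.mvp * vec4<f32>(positions[i], 0.0, 1.0);
  }

  @fragment
  fn fs_main() -> @location(0) vec4<f32> {
    return vec4<f32>(1.0, 0.0, 0.0, 1.0);
  }
`;

// Hypothetical helper: builds the pipeline whose getBindGroupLayout(0) exposes binding 0.
function createUniformTrianglePipeline(device, presentationFormat) {
  const module = device.createShaderModule({ code: uniformTriangleWGSL });
  return device.createRenderPipeline({
    layout: 'auto',
    vertex: { module, entryPoint: 'vs_main' },
    fragment: { module, entryPoint: 'fs_main', targets: [{ format: presentationFormat }] },
    primitive: { topology: 'triangle-list' },
  });
}
```

Because only the vertex stage reads the uniform, the inferred layout makes binding 0 visible to the vertex stage only.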

Example: creating the Bind Group Layout by hand

// shaderModule is assumed to declare a uniform at @group(0) @binding(0)
// and to provide the vs_main / fs_main entry points.
async function createBindGroupWithManualLayout(device, presentationFormat, shaderModule) {
  const uniformBufferSize = 16 * Float32Array.BYTES_PER_ELEMENT; // 4x4 f32 matrix = 64 bytes
  const uniformBuffer = device.createBuffer({
    size: uniformBufferSize,
    usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
  });

  const bindGroupLayout = device.createBindGroupLayout({
    entries: [
      {
        binding: 0,
        visibility: GPUShaderStage.VERTEX | GPUShaderStage.FRAGMENT,
        buffer: {
          type: 'uniform',
        },
      },
    ],
  });

  const renderPipeline = device.createRenderPipeline({
    layout: device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] }),
    vertex: {
      module: shaderModule,
      entryPoint: 'vs_main',
    },
    fragment: {
      module: shaderModule,
      entryPoint: 'fs_main',
      targets: [{ format: presentationFormat }],
    },
    primitive: {
      topology: 'triangle-list',
    },
  });

  const bindGroup = device.createBindGroup({
    layout: bindGroupLayout,
    entries: [
      {
        binding: 0,
        resource: {
          buffer: uniformBuffer,
        },
      },
    ],
  });

  return { uniformBuffer, bindGroup, renderPipeline };
}

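Note that a Bind Group is immutable in the sense that, once created, the set of resources it points to cannot be changed; to bind a different buffer or texture you need to create a new Bind Group. The contents of those resources are another matter: the data inside a bound buffer can be updated at any time, for example with device.queue.writeBuffer, without recreating the Bind Group.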
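
A small sketch of that distinction: the helper below uploads a new matrix into an existing uniform buffer and leaves the bind group alone. The name updateUniformMatrix and the length check are assumptions for illustration; device.queue.writeBuffer is the actual WebGPU call.

```javascript
// Hypothetical helper: upload a new 4x4 matrix into an existing uniform buffer.
// The bind group that references uniformBuffer is not touched or recreated.
function updateUniformMatrix(device, uniformBuffer, matrix) {
  const data = matrix instanceof Float32Array ? matrix : new Float32Array(matrix);
  if (data.length !== 16) {
    throw new RangeError('expected 16 floats for a 4x4 matrix');
  }
  // Copies the bytes into the buffer starting at offset 0.
  device.queue.writeBuffer(uniformBuffer, 0, data);
  return data.byteLength; // 64 bytes for a 4x4 f32 matrix
}
```

Called once per frame before submitting the command buffer, this keeps per-frame work down to a buffer upload rather than object creation.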

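Stop 3: Render Passes, the stage for rendering

With a PSO and Bind Groups in hand, we can start rendering. A Render Pass is like a "stage": it defines the render targets (color attachments, depth attachment) and how they are loaded and stored, and it groups the drawing commands that write to them.

A Render Pass can contain many Draw Calls, and each Draw Call uses whichever PSO and Bind Groups are currently set.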
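
The relationship between one pass and many draw calls can be sketched as a small loop that only issues state changes when something actually differs from the previous draw. The helper encodeDraws and the shape of its items are assumptions for illustration; setPipeline, setBindGroup, and draw are the real render pass encoder methods.

```javascript
// Hypothetical helper: encode many draws into one render pass, skipping redundant state changes.
// Each item is assumed to look like { pipeline, bindGroup, vertexCount, instanceCount }.
function encodeDraws(passEncoder, items) {
  let currentPipeline = null;
  let currentBindGroup = null;
  for (const item of items) {
    if (item.pipeline !== currentPipeline) {
      passEncoder.setPipeline(item.pipeline);
      currentPipeline = item.pipeline;
      currentBindGroup = null; // conservatively re-bind after a pipeline switch
    }
    if (item.bindGroup !== currentBindGroup) {
      passEncoder.setBindGroup(0, item.bindGroup);
      currentBindGroup = item.bindGroup;
    }
    passEncoder.draw(item.vertexCount, item.instanceCount ?? 1, 0, 0);
  }
}
```

Sorting the items by pipeline before calling a helper like this tends to reduce the number of setPipeline calls further.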

Code example: creating a simple Render Pass

async function render(device, context, renderPipeline, bindGroup, uniformBuffer) {
  const commandEncoder = device.createCommandEncoder();
  const textureView = context.getCurrentTexture().createView();

  const renderPassDescriptor = {
    colorAttachments: [
      {
        view: textureView,
        clearValue: { r: 0.0, g: 0.0, b: 0.0, a: 1.0 }, // Clear to black
        loadOp: 'clear',
        storeOp: 'store',
      },
    ],
  };

  // Update uniform buffer data (e.g., MVP matrix). mat4 is assumed to come from the gl-matrix library.
  // writeBuffer is a queue operation rather than a pass command, so issue it before encoding the pass.
  const modelMatrix = mat4.create();
  mat4.rotateY(modelMatrix, modelMatrix, performance.now() / 1000); // Animate the rotation
  device.queue.writeBuffer(uniformBuffer, 0, modelMatrix); // Accepts an ArrayBuffer or a typed array

  const passEncoder = commandEncoder.beginRenderPass(renderPassDescriptor);
  passEncoder.setPipeline(renderPipeline);
  passEncoder.setBindGroup(0, bindGroup); // bindGroup is the GPUBindGroup itself

  passEncoder.draw(3, 1, 0, 0); // Draw 3 vertices, 1 instance
  passEncoder.end();

  const commandBuffer = commandEncoder.finish();
  device.queue.submit([commandBuffer]);
}

This code creates a simple Render Pass that writes its output to the canvas's current texture, used as the color attachment. Note the passEncoder.setPipeline and passEncoder.setBindGroup calls: this is where the PSO and the Bind Group are set on the Render Pass. passEncoder.draw(3, 1, 0, 0) tells the GPU to draw three vertices as a single instance.
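
For 3D scenes the pass usually needs a depth attachment as well. The sketch below extends the descriptor from the example with one; the helper name buildRenderPassDescriptor is an assumption, and depthView is assumed to be a view of a depth texture (for example depth24plus) created with the RENDER_ATTACHMENT usage at the same size as the color target.

```javascript
// Hypothetical helper: a render pass descriptor with one color and one depth attachment.
function buildRenderPassDescriptor(colorView, depthView) {
  return {
    colorAttachments: [{
      view: colorView,
      clearValue: { r: 0, g: 0, b: 0, a: 1 }, // clear to opaque black
      loadOp: 'clear',
      storeOp: 'store',
    }],
    depthStencilAttachment: {
      view: depthView,
      depthClearValue: 1.0, // farthest depth, so nearer fragments pass a 'less' test
      depthLoadOp: 'clear',
      depthStoreOp: 'store',
    },
  };
}
```

A pipeline used inside such a pass needs a depthStencil state whose format matches the depth texture.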

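Stop 4: Optimization tips, or how to squeeze every drop of performance out of the GPU

With PSOs, Bind Groups, and Render Passes, we can start building complex scenes. But how do we squeeze every drop of performance out of the GPU? Here are some commonly used techniques:

Reduce Draw Calls: every Draw Call carries CPU-side encoding and validation overhead. Merging many objects into a single Draw Call where possible can improve performance noticeably. Instance Rendering is the usual tool; meshlet-style rendering is another option, though WebGPU has no mesh shaders, so it has to be built on compute shaders and indirect draws.
Cache pipelines: creating a PSO is relatively expensive. WebGPU does not expose an explicit pipeline cache object, so cache at the application level: create each PSO once (ideally at load time, for instance with createRenderPipelineAsync), keep it, and reuse it instead of recreating it every frame.
Use Bind Groups sensibly: put resources that rarely change in one Bind Group and resources that change often in another. This reduces how often Bind Groups have to be switched.
Use Texture Compression: textures are among the biggest consumers of GPU memory. Compressed texture formats reduce both memory footprint and bandwidth; in WebGPU they are optional features (for example texture-compression-bc) that the device has to support.
Optimize shader code: shader efficiency directly affects rendering performance. Use shader profiling tools to find the bottlenecks in your shaders, then optimize them.
Use WebGPU's optional features: WebGPU defines optional features that provide extra functionality and optimization opportunities, such as timestamp-query for GPU timing. They must be requested through requiredFeatures when calling requestDevice. Note that Dawn is the WebGPU implementation used by Chromium rather than an extension; for debugging, object labels and error scopes are the portable tools.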
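
As an example of the pipeline-caching tip, here is a minimal application-level cache. It is only a sketch: the PipelineCache class and the idea of a caller-supplied string key are assumptions for illustration, not a WebGPU API.

```javascript
// Hypothetical application-level cache: WebGPU has no pipeline cache object,
// so created pipelines are kept in a Map keyed by a caller-supplied string.
class PipelineCache {
  constructor(device) {
    this.device = device;
    this.pipelines = new Map();
  }

  // key identifies the descriptor (for example "opaque|bgra8unorm|msaa4");
  // makeDescriptor is only invoked on a cache miss.
  get(key, makeDescriptor) {
    let pipeline = this.pipelines.get(key);
    if (pipeline === undefined) {
      pipeline = this.device.createRenderPipeline(makeDescriptor());
      this.pipelines.set(key, pipeline);
    }
    return pipeline;
  }
}
```

The key has to capture everything that distinguishes one descriptor from another (shader, target format, blend mode, sample count); otherwise two different configurations would silently share a pipeline.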

Code example: using Instance Rendering

// Assume instanceModelMatrices is an array of column-major 4x4 matrices (16 floats each), one per instance

const instanceCount = instanceModelMatrices.length;

// Create a buffer to hold all instance model matrices
const instanceBuffer = device.createBuffer({
    size: instanceCount * 16 * Float32Array.BYTES_PER_ELEMENT, // 16 floats per matrix
    usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
});

// Write the instance model matrices to the buffer
const instanceData = new Float32Array(instanceCount * 16);
for (let i = 0; i < instanceCount; ++i) {
  instanceData.set(instanceModelMatrices[i], i * 16); //copy matrix data to the typed array
}
device.queue.writeBuffer(instanceBuffer, 0, instanceData);

// Update the vertex shader to read instance model matrices
const shaderModule = device.createShaderModule({
    code: `
        struct VertexInput {
            @location(0) position: vec3<f32>,
            // Every input declared here must be supplied by the vertex buffer layouts below;
            // add normal/uv inputs only together with matching buffer attributes.
            @location(3) modelMatrix0: vec4<f32>, // Instance model matrix column 0
            @location(4) modelMatrix1: vec4<f32>, // Instance model matrix column 1
            @location(5) modelMatrix2: vec4<f32>, // Instance model matrix column 2
            @location(6) modelMatrix3: vec4<f32>, // Instance model matrix column 3
        };

        @vertex
        fn vs_main(input: VertexInput) -> @builtin(position) vec4<f32> {
            let modelMatrix = mat4x4<f32>(
                input.modelMatrix0,
                input.modelMatrix1,
                input.modelMatrix2,
                input.modelMatrix3,
            );
            return modelMatrix * vec4<f32>(input.position, 1.0);
        }

        @fragment
        fn fs_main() -> @location(0) vec4<f32> {
            return vec4<f32>(1.0, 0.0, 0.0, 1.0); // Red color
        }
    `,
});

const vertexBufferLayout = [
  {
    arrayStride: 3 * Float32Array.BYTES_PER_ELEMENT, // Size of each vertex in the buffer
    attributes: [
      {
        shaderLocation: 0, // @location(0)
        offset: 0,
        format: 'float32x3',
      },
    ],
  },
  {
    arrayStride: 16 * Float32Array.BYTES_PER_ELEMENT, // Size of each instance data
    stepMode: 'instance',
    attributes: [
      {
        shaderLocation: 3, // @location(3)
        offset: 0,
        format: 'float32x4',
      },
      {
        shaderLocation: 4, // @location(4)
        offset: 4 * Float32Array.BYTES_PER_ELEMENT,
        format: 'float32x4',
      },
      {
        shaderLocation: 5, // @location(5)
        offset: 8 * Float32Array.BYTES_PER_ELEMENT,
        format: 'float32x4',
      },
      {
        shaderLocation: 6, // @location(6)
        offset: 12 * Float32Array.BYTES_PER_ELEMENT,
        format: 'float32x4',
      },
    ],
  },
];

const renderPipeline = device.createRenderPipeline({
    layout: 'auto',
    vertex: {
        module: shaderModule,
        entryPoint: 'vs_main',
        buffers: vertexBufferLayout, // Add the instance buffer layout
    },
    fragment: {
        module: shaderModule,
        entryPoint: 'fs_main',
        targets: [{ format: presentationFormat }],
    },
    primitive: {
        topology: 'triangle-list',
    },
});

// In the render pass:
passEncoder.setVertexBuffer(0, vertexBuffer); // Your vertex buffer
passEncoder.setVertexBuffer(1, instanceBuffer); // The instance buffer
passEncoder.draw(3, instanceCount, 0, 0); // Draw 3 vertices, instanceCount instances
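
The example above assumes an instanceModelMatrices array already exists. One possible way to produce it, purely as an illustration, is shown below: makeInstanceModelMatrices is a hypothetical helper that spreads the instances along the x axis using column-major translation matrices, which is the layout the four vec4 instance attributes expect.

```javascript
// Hypothetical helper: one column-major translation matrix per instance,
// spaced evenly along the x axis and centered on the origin.
function makeInstanceModelMatrices(count, spacing = 0.5) {
  const matrices = [];
  for (let i = 0; i < count; ++i) {
    const m = new Float32Array(16);
    m[0] = m[5] = m[10] = m[15] = 1; // identity diagonal
    // In column-major order the translation lives in the last column (elements 12..14).
    m[12] = (i - (count - 1) / 2) * spacing;
    matrices.push(m);
  }
  return matrices;
}

const instanceModelMatrices = makeInstanceModelMatrices(3);
```

Each Float32Array can be copied straight into the instance buffer with instanceData.set(matrix, i * 16), as the loop in the example does.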

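Summary:

Concept	Role	Analogy
PSO	Defines the state of the render pipeline	The rendering "manual" that tells the GPU how to draw
Bind Groups	Hold the resources that rendering needs	The PSO's "data warehouse" that supplies the data for rendering
Render Passes	Define the render targets and the flow of rendering	The "stage" on which rendering is organized
Instance Rendering	Reduces draw calls by drawing many instances at once	Copy and paste: quickly render many copies of the same model

WebGPU's PSOs, Bind Groups, and Render Passes are the foundation for building high-performance rendering applications. Understand these concepts, master the optimization techniques, and you can create stunning 2D/3D experiences on the web platform.

That's it for today's session. I hope you found it useful, and I hope you'll practice and apply this in real projects. If you have any questions, feel free to ask.

See you next time!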
