[Unity] Rendering Large Numbers of Objects: Study Notes (Part 2)

2021-12-03 pamisu

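The previous post used Graphics.DrawMeshInstancedIndirect to get basic object rendering working, but without any culling: objects outside the camera's view are still rendered, which wastes performance. This post implements frustum culling, one of the most common culling schemes, and adds object rotation and scaling along the way.

Put simply, frustum culling checks whether each object lies inside the current camera's view frustum, discards objects that are entirely outside it, and renders only what remains, cutting unnecessary work. Note that you only need to do the culling yourself when rendering through an API such as DrawMeshInstancedIndirect; when objects are rendered with the built-in Renderer components, Unity handles this for us.

For how frustum culling is implemented and why a ComputeShader is used for it, I recommend this article:

Unity中使用ComputeShader做视锥剔除（View Frustum Culling） (Using a ComputeShader for View Frustum Culling in Unity)

That article is very thorough, covering the principle of frustum culling, how to use a ComputeShader, how to cull based on an object's bounding box, and more. It is about as hand-holding as a tutorial gets.

So this post only records my own implementation and the pitfalls I ran into. Because rotation and scaling are implemented here as well, a few things differ. The final result: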

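The overall idea is not complicated. Every frame we need to:

Get the definitions of the six planes of the current camera's frustum, i.e. the plane equations Ax+By+Cz+D=0. They can be computed by hand or obtained through an API.
Get each object's bounding box. The box size can usually be a fixed value; at test time, the actual coordinates of its eight corners are computed from the object's current transform (translation, rotation, scale).
Feed all of the above to a ComputeShader, which determines which objects are inside the frustum and returns their instance IDs.
Render using the returned instance IDs instead of rendering every object, so what gets drawn is the culled result.

Besides the method described in the article above, there is another way to cull: transform the bounding box corners to clip space and test whether each point's homogeneous x, y and z lie in [-w, w] (under the DirectX convention, z is tested against [0, w] instead, while x and y still use [-w, w]). If none of the eight points is inside the frustum, the object is culled. This takes somewhat less computation than the method above, but the drawback is obvious: if the camera is pressed right up against a huge wall, say because the protagonist is facing the wall to reflect on their mistakes, all eight corners of the wall's bounding box are outside the frustum and the wall gets culled. So this method suits relatively small objects. This post implements it as well.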

ComputeShader

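Start with the first culling method. The whole culling process revolves around the ComputeShader, so once the inputs and outputs are clear, the ComputeShader can be written: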

FrustumCulling.compute

#pragma kernel CSMain

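float4 _FrustumPlanes[6];   // The six frustum planes (normals facing outward)
float3 _BoundMin;   // Minimum point of the object's bounding box
float3 _BoundMax;   // Maximum point of the object's bounding box
StructuredBuffer<float4x4> _AllMatricesBuffer;   // Composite transformation matrices of all objects
AppendStructuredBuffer<uint> _VisibleIDsBuffer;  // Instance IDs of visible objects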

bool IsOutsideThePlane(float4 plane, float3 position)
{
    return dot(plane.xyz, position) + plane.w > 0;
}

[numthreads(640, 1, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    float4x4 m = _AllMatricesBuffer[id.x];
    float4 boundPoints[8];
    boundPoints[0] = mul(m, float4(_BoundMin, 1));
    boundPoints[1] = mul(m, float4(_BoundMax, 1));
    boundPoints[2] = mul(m, float4(_BoundMax.x, _BoundMax.y, _BoundMin.z, 1));
    boundPoints[3] = mul(m, float4(_BoundMax.x, _BoundMin.y, _BoundMax.z, 1));
    boundPoints[4] = mul(m, float4(_BoundMax.x, _BoundMin.y, _BoundMin.z, 1));
    boundPoints[5] = mul(m, float4(_BoundMin.x, _BoundMax.y, _BoundMax.z, 1));
    boundPoints[6] = mul(m, float4(_BoundMin.x, _BoundMax.y, _BoundMin.z, 1));
    boundPoints[7] = mul(m, float4(_BoundMin.x, _BoundMin.y, _BoundMax.z, 1));
    
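    // Cull the object if all eight corners are outside any single plane
    for (int i = 0; i < 6; i++)
    {
        for (int j = 0; j < 8; j++)
        {
            float3 p = boundPoints[j].xyz;
            // One corner on the inner side is enough to move on to the next plane
            if (!IsOutsideThePlane(_FrustumPlanes[i], p))
                break;
            if (j == 7)
                return;
        }
    }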
    
    _VisibleIDsBuffer.Append(id.x);
}

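This is mostly copied from the ComputeShader in the article above. The difference is the _VisibleIDsBuffer: if an object is judged visible, its instance ID is appended to it, and the rendering Shader later reads the same buffer to get the instance IDs of the visible objects.

The ComputeShader's variables are set from the C# side, and CSMain is dispatched from the C# side as well.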
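
To sanity-check the kernel's logic, the same test can be mirrored on the CPU. The following is only an illustrative sketch in plain C#: it uses System.Numerics rather than Unity types, the names FrustumTestSketch, Corners, IsCulled and UnitVolume are made up for this example, and a cube built from six outward-facing planes stands in for a real camera frustum.

```csharp
using System;
using System.Numerics;

// Hypothetical CPU mirror of the CSMain plane test (not part of the demo project).
public static class FrustumTestSketch
{
    // Same convention as the kernel: planes face outward, so a positive
    // signed distance means the point is outside that plane.
    public static bool IsOutsideThePlane(Vector4 plane, Vector3 p) =>
        Vector3.Dot(new Vector3(plane.X, plane.Y, plane.Z), p) + plane.W > 0f;

    // Eight corners of an axis-aligned box given its min and max points.
    public static Vector3[] Corners(Vector3 min, Vector3 max)
    {
        var corners = new Vector3[8];
        for (int i = 0; i < 8; i++)
            corners[i] = new Vector3(
                (i & 1) == 0 ? min.X : max.X,
                (i & 2) == 0 ? min.Y : max.Y,
                (i & 4) == 0 ? min.Z : max.Z);
        return corners;
    }

    // Culled only if some plane has all eight corners on its outer side.
    public static bool IsCulled(Vector4[] planes, Vector3[] corners)
    {
        foreach (var plane in planes)
        {
            bool allOutside = true;
            foreach (var c in corners)
            {
                if (!IsOutsideThePlane(plane, c)) { allOutside = false; break; }
            }
            if (allOutside) return true;
        }
        return false;
    }

    // Stand-in "frustum": the cube [-1, 1]^3 described by six outward-facing planes.
    public static readonly Vector4[] UnitVolume =
    {
        new Vector4( 1, 0, 0, -1), new Vector4(-1, 0, 0, -1),
        new Vector4( 0, 1, 0, -1), new Vector4( 0, -1, 0, -1),
        new Vector4( 0, 0, 1, -1), new Vector4( 0, 0, -1, -1),
    };
}
```

With this stand-in volume, a box from (0.5, 0.5, 0.5) to (1.5, 1.5, 1.5) straddles the boundary and is kept, a box from (2, 2, 2) to (3, 3, 3) is culled, and a box from (-5, -5, -5) to (5, 5, 5) that encloses the whole volume is also kept, because no single plane has all eight corners on its outer side.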

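Input and Dispatch

Next, prepare the data the ComputeShader needs on the C# side and dispatch it. Copy ExampleClass.cs from the previous post, rename it to FrustumCullingRenderer.cs, and add the required fields: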

FrustumCullingRenderer.cs

public class FrustumCullingRenderer : MonoBehaviour
{
    public int instanceCount = 100000;
    public Mesh instanceMesh;
    public Material instanceMaterial;
    public int subMeshIndex = 0;
    // New: minimum point of the object's bounding box
    public Vector3 objectBoundMin;
    // New: maximum point of the object's bounding box
    public Vector3 objectBoundMax;
    // New: ComputeShader
    public ComputeShader cullingComputeShader;
    
    int cachedInstanceCount = -1;
    int cachedSubMeshIndex = -1;
    // New: index of the kernel function in the ComputeShader
    int kernel = 0;
    // Changed: the former positionBuffer is now a buffer of composite transformation matrices
    ComputeBuffer allMatricesBuffer;
    // New: buffer of instance IDs of the currently visible objects
    ComputeBuffer visibleIDsBuffer;
    ComputeBuffer argsBuffer;
    uint[] args = new uint[5] { 0, 0, 0, 0, 0 };
    // New: the camera's frustum planes
    Plane[] cameraFrustumPlanes = new Plane[6];
    // New: frustum planes passed to the ComputeShader
    Vector4[] frustumPlanes = new Vector4[6];
    ...

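The object bounds and the ComputeShader are assigned in the editor. For example, the default Cube is 1 unit in size and centered on its origin, so the bounding box can be set to a minimum of (-0.5, -0.5, -0.5) and a maximum of (0.5, 0.5, 0.5).

Computing the bounding box's world-space coordinates also requires the object's composite transformation matrix (translation, rotation and scale combined). The positionBuffer from the previous post is upgraded to allMatricesBuffer here; in the ComputeShader, this matrix transforms the bounding box's corner coordinates from model space to world space.

After the ComputeShader finishes culling, the instance IDs of all visible objects have to be returned; visibleIDsBuffer receives them.

Modify the UpdateBuffers method to set up allMatricesBuffer and visibleIDsBuffer: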

void UpdateBuffers()
{
    // Return early when nothing needs updating
    ...
    // Clamp subMeshIndex to a valid range
    ...
    // Changed: object positions are now composite transformation matrices
    allMatricesBuffer?.Release();
    allMatricesBuffer = new ComputeBuffer(instanceCount, sizeof(float) * 16);   // float4x4
    Matrix4x4[] trs = new Matrix4x4[instanceCount];
    for (int i = 0; i < instanceCount; i++)
    {
        // Random position, rotation and uniform scale
        float angle = Random.Range(0.0f, Mathf.PI * 2.0f);
        float distance = Random.Range(8.0f, 90.0f);
        float height = Random.Range(-5.0f, 5.0f);
        float size = Random.Range(0.05f, 1f);
        var position = new Vector3(Mathf.Sin(angle) * distance, height, Mathf.Cos(angle) * distance);
        trs[i] = Matrix4x4.TRS(position, Random.rotationUniform, new Vector3(size, size, size));
    }
    allMatricesBuffer.SetData(trs);
    // The property name must match the buffer declared in the rendering Shader
    instanceMaterial.SetBuffer("_AllMatricesBuffer", allMatricesBuffer);
    ...

    // New: buffer of visible instances
    visibleIDsBuffer?.Release();
    visibleIDsBuffer = new ComputeBuffer(instanceCount, sizeof(uint), ComputeBufferType.Append);
    instanceMaterial.SetBuffer("_VisibleIDsBuffer", visibleIDsBuffer);

    // Indirect args
    ...

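Note that visibleIDsBuffer must be created with ComputeBufferType.Append, which means values can be appended to it from a shader; it corresponds to the AppendStructuredBuffer type in the ComputeShader. The instance IDs of visible objects will be appended to it.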
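
The counter behind an append buffer can be pictured with a toy CPU model. This is a sketch for intuition only, not how the GPU implements it: AppendBufferModel and its members are hypothetical names, and the rule that writes past the capacity are dropped while the counter keeps growing is a simplifying assumption.

```csharp
using System;

// Toy CPU model of an append buffer: a fixed-size array plus a hidden counter.
public sealed class AppendBufferModel
{
    private readonly uint[] data;

    // Number of elements appended since the counter was last set.
    public uint Counter { get; private set; }

    public AppendBufferModel(int capacity)
    {
        data = new uint[capacity];
    }

    // Counterpart of ComputeBuffer.SetCounterValue in this model.
    public void SetCounterValue(uint value)
    {
        Counter = value;
    }

    // Counterpart of AppendStructuredBuffer.Append: write at the counter, then advance it.
    public void Append(uint id)
    {
        if (Counter < data.Length)
            data[Counter] = id; // assumption: writes past the capacity are dropped
        Counter++;
    }

    public uint this[int index] => data[index];
}
```

In this model, appending three IDs in each of two consecutive "frames" leaves the counter at 6 unless SetCounterValue(0) is called in between, which is why the counter has to be managed from the C# side.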

Then hand everything mentioned above to the ComputeShader:

    ...
    // ComputeShader
    cullingComputeShader.SetVector("_BoundMin", objectBoundMin);
    cullingComputeShader.SetVector("_BoundMax", objectBoundMax);
    cullingComputeShader.SetBuffer(kernel, "_AllMatricesBuffer", allMatricesBuffer);
    cullingComputeShader.SetBuffer(kernel, "_VisibleIDsBuffer", visibleIDsBuffer);
    ...

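That completes the changes to UpdateBuffers. The ComputeShader is still missing one last input: the six frustum planes. The camera's position usually changes frequently, so they are fetched in Update.

Here they are obtained directly through the API Unity provides: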

void Update()
{
    // Update buffers
    UpdateBuffers();
    // Arrow keys change the number of instances drawn
    ...
    // Frustum culling
    GeometryUtility.CalculateFrustumPlanes(Camera.main, cameraFrustumPlanes);
    for (int i = 0; i < cameraFrustumPlanes.Length; i++)
    {
        // Flip each plane so that it faces outward, as the ComputeShader expects
        var normal = -cameraFrustumPlanes[i].normal;
        frustumPlanes[i] = new Vector4(normal.x, normal.y, normal.z, -cameraFrustumPlanes[i].distance);
    }
    ...

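GeometryUtility.CalculateFrustumPlanes computes the frustum planes of the given camera. Note that the six planes obtained this way face inward, whereas the ComputeShader's test assumes they face outward. Either the ComputeShader has to change or the six planes have to be flipped; here they are flipped, i.e. the signs of both the normal and the distance are negated. What is finally passed to the ComputeShader is of type Vector4, stored in frustumPlanes.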
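
A quick numeric check of the sign flip, as a sketch: System.Numerics types stand in for Unity's, SignedDistance and Flip are hypothetical helper names, and the plane used below is a hand-made near plane at z = 1 for a camera looking down +z rather than one returned by Unity.

```csharp
using System;
using System.Numerics;

public static class PlaneFlipSketch
{
    // A plane stored as (normal.xyz, distance), satisfying dot(normal, p) + distance = 0.
    // The result is positive on the side the normal points to.
    public static float SignedDistance(Vector4 plane, Vector3 p) =>
        Vector3.Dot(new Vector3(plane.X, plane.Y, plane.Z), p) + plane.W;

    // Negating all four components keeps the same plane but reverses its facing.
    public static Vector4 Flip(Vector4 plane) => -plane;
}
```

For the inward-facing plane (0, 0, 1, -1), the point (0, 0, 5) in front of the camera has a signed distance of +4. After Flip the plane is (0, 0, -1, 1) and the same point has a signed distance of -4, so a "distance > 0 means outside" test no longer rejects it.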

Pass the frustum planes to the ComputeShader, and the kernel can be dispatched:

void Update()
{
    // Update buffers
    ...
    // Arrow keys change the number of instances drawn
    ...
    // Frustum culling
    ...
    
    visibleIDsBuffer.SetCounterValue(0);
    cullingComputeShader.SetVectorArray("_FrustumPlanes", frustumPlanes);
    cullingComputeShader.Dispatch(kernel, Mathf.CeilToInt(instanceCount / 640f), 1, 1);
    ComputeBuffer.CopyCount(visibleIDsBuffer, argsBuffer, sizeof(uint));
    // Render
    ...
}

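Three things to note:

Before dispatching the ComputeShader, visibleIDsBuffer.SetCounterValue(0) must be called to reset the counter to 0. The ComputeShader keeps appending visible instance IDs to visibleIDsBuffer, so if the counter is never reset it just keeps growing, and your computer may well blow up (a painful lesson).
ComputeBuffer.CopyCount must be used to write the element count of visibleIDsBuffer into argsBuffer, because the final draw call still reads argsBuffer.
If the ComputeShader is used in several places at once, for example to render both flowers and grass, it needs to be instantiated separately with Instantiate, so that each kind of object renders with its own instance.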
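
The arithmetic behind the Dispatch and CopyCount calls can be spelled out in a few lines. This is a sketch in plain C#; DispatchMathSketch and its members are hypothetical names introduced only for illustration.

```csharp
using System;

public static class DispatchMathSketch
{
    // Must match [numthreads(640, 1, 1)] in the kernel.
    public const int ThreadsPerGroup = 640;

    // Integer ceiling division: the smallest group count that covers every instance.
    public static int GroupCount(int instanceCount) =>
        (instanceCount + ThreadsPerGroup - 1) / ThreadsPerGroup;

    // Threads launched beyond the last instance; their id.x is past the end of the buffer.
    public static int SurplusThreads(int instanceCount) =>
        GroupCount(instanceCount) * ThreadsPerGroup - instanceCount;

    // The indirect args are five uints: index count per instance, instance count,
    // start index location, base vertex location, start instance location.
    // A byte offset of sizeof(uint) therefore lands on the instance count slot.
    public static int ArgsSlotForByteOffset(int byteOffset) => byteOffset / sizeof(uint);
}
```

With the default 100000 instances this gives 157 groups, which launches 480 more threads than there are instances. A common precaution, not part of the original code, is to pass the instance count to the kernel and return early when id.x is out of range.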

That completes the C# side. Last comes the Shader used for rendering.

Shader

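The culling work is already done, so little needs to change in the Shader: just declare _AllMatricesBuffer and _VisibleIDsBuffer and use them. Copy the Shader from the previous post, rename it to InstancedCulling.shader, and modify the buffer variables in HLSLINCLUDE: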

HLSLINCLUDE
...
CBUFFER_START(UnityPerMaterial)
...
// Changed: composite transformation matrices of all objects
StructuredBuffer<float4x4> _AllMatricesBuffer;
// New: instance IDs of visible objects
StructuredBuffer<uint> _VisibleIDsBuffer;
CBUFFER_END
...
ENDHLSL

Use them in the vertex function:

Varyings Vertex(Attributes IN, uint instanceID : SV_InstanceID)
{
    Varyings OUT;
    // Changed: transform the vertex position to world space
    #if SHADER_TARGET >= 45
    // float4 data = positionBuffer[instanceID];
    float4x4 data = _AllMatricesBuffer[_VisibleIDsBuffer[instanceID]];
    #else
    float4x4 data = 0;
    #endif
    // float3 positionWS = mul(mul(unity_ObjectToWorld, data), IN.positionOS).xyz;
    float3 positionWS = mul(data, IN.positionOS).xyz;
    OUT.positionWS = positionWS;
    OUT.positionCS = mul(unity_MatrixVP, float4(positionWS, 1.0));
    OUT.uv = TRANSFORM_TEX(IN.texcoord, _BaseMap);

    // Changed: transform the normal to world space (valid for uniform scale only)
    // float3 normalWS = TransformObjectToWorldNormal(normalize(mul(data, IN.normalOS)));
    float3 normalWS = normalize(mul(data, float4(IN.normalOS, 0))).xyz;
    float fogFactor = ComputeFogFactor(OUT.positionCS.z);
    OUT.normalWSAndFogFactor = float4(normalWS, fogFactor);
    return OUT;
}

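Compare with the commented-out lines: _VisibleIDsBuffer[instanceID] now yields the post-culling instance ID, which is then used to look up the object's composite transformation matrix in _AllMatricesBuffer, and that matrix transforms the vertex position from model space to world space.

Because objects are now rotated, the world-space normal also has to be transformed with this matrix. Note that the code here is only valid for uniform scaling, where the x, y and z scale factors are equal. Under non-uniform scaling, multiplying the transformation matrix by the normal gives a wrong result:

Image source: 《Unity Shader入门精要》

In that case the inverse of the transformation matrix is needed: multiplying the normal, as a row vector, by the inverse matrix (equivalently, multiplying the inverse transpose by the normal) gives the correct result. Code for this is easy to find online, so it is not included here.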
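
A small numeric illustration of why the inverse transpose is needed, written as a plain C# sketch with System.Numerics. It covers only the special case of a pure scale matrix, where the inverse transpose is simply the reciprocal scale; NormalTransformSketch and its methods are hypothetical names, and a general matrix would need the inverse transpose of its upper-left 3x3 block.

```csharp
using System;
using System.Numerics;

public static class NormalTransformSketch
{
    // For a pure scale matrix M = diag(s), transforming a direction is a component-wise multiply.
    public static Vector3 TransformDirection(Vector3 scale, Vector3 v) => v * scale;

    // The inverse transpose of diag(s) is diag(1 / s), so the normal is divided by the scale.
    public static Vector3 TransformNormal(Vector3 scale, Vector3 n) =>
        Vector3.Normalize(n / scale);
}
```

Take a surface with tangent (1, 1, 0) and normal (-1, 1, 0), scaled by (2, 1, 1). The tangent becomes (2, 1, 0). Transforming the normal like a direction gives (-2, 1, 0), whose dot product with the new tangent is -3, so it is no longer perpendicular. TransformNormal gives the normalized form of (-0.5, 1, 0), whose dot product with the new tangent is 0.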

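Apply the same changes to both Passes. The result when running:

Another Culling Method

Now to implement the other culling method mentioned at the start, which culls by testing the homogeneous coordinates of the bounding box corners in clip space. In the ComputeShader, the eight corner coordinates just need to be transformed to clip space and then tested. To transform coordinates to clip space, the current VP (view-projection) matrix has to be passed to the ComputeShader.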

FrustumCulling.compute

#pragma kernel CSMain

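// float4 _FrustumPlanes[6];
float4x4 _MatrixVP; // Changed: view-projection matrix
float3 _BoundMin;   // Minimum point of the object's bounding box
float3 _BoundMax;   // Maximum point of the object's bounding box
StructuredBuffer<float4x4> _AllMatricesBuffer;   // Composite transformation matrices of all objects
AppendStructuredBuffer<uint> _VisibleIDsBuffer;  // Instance IDs of visible objects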

bool IsInClipSpace(float4 coord)
{
    return -coord.w <= coord.x && coord.x <= coord.w
        && -coord.w <= coord.y && coord.y <= coord.w
        && -coord.w <= coord.z && coord.z <= coord.w;
}

[numthreads(640, 1, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    float4x4 mvp = mul(_MatrixVP, _AllMatricesBuffer[id.x]);
    float4 boundPoints[8];
    boundPoints[0] = mul(mvp, float4(_BoundMin, 1));
    boundPoints[1] = mul(mvp, float4(_BoundMax, 1));
    boundPoints[2] = mul(mvp, float4(_BoundMax.x, _BoundMax.y, _BoundMin.z, 1));
    boundPoints[3] = mul(mvp, float4(_BoundMax.x, _BoundMin.y, _BoundMax.z, 1));
    boundPoints[4] = mul(mvp, float4(_BoundMax.x, _BoundMin.y, _BoundMin.z, 1));
    boundPoints[5] = mul(mvp, float4(_BoundMin.x, _BoundMax.y, _BoundMax.z, 1));
    boundPoints[6] = mul(mvp, float4(_BoundMin.x, _BoundMax.y, _BoundMin.z, 1));
    boundPoints[7] = mul(mvp, float4(_BoundMin.x, _BoundMin.y, _BoundMax.z, 1));
    
    bool isIn = false;
    for (int i = 0; i < 8; i++)
    {
        if (IsInClipSpace(boundPoints[i]))
        {
            isIn = true;
            break;
        }
    }

    if (isIn)
        _VisibleIDsBuffer.Append(id.x);
}

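After building the MVP matrix, the eight corners are transformed to clip space; if any one corner is inside the clip volume, the object is treated as visible. IsInClipSpace uses the OpenGL convention, where x, y and z are all tested against -w to w; under the DirectX convention, z is tested against 0 to w instead, while x and y are still tested against -w to w.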
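
The huge-wall caveat mentioned at the start can be reproduced on the CPU. The sketch below is plain C# with System.Numerics and makes several simplifying assumptions: points are given directly in view space with the camera looking down -z, the projection is a hand-written OpenGL-style perspective with a 90 degree field of view and aspect ratio 1, and ClipSpaceSketch, Project and AnyCornerVisible are hypothetical names.

```csharp
using System;
using System.Numerics;

public static class ClipSpaceSketch
{
    // OpenGL-style perspective projection of a view-space point (camera looks down -z).
    // Assumed: 90 degree vertical FOV and aspect ratio 1, so x and y pass through unchanged.
    public static Vector4 Project(Vector3 v, float near = 0.1f, float far = 100f)
    {
        float a = -(far + near) / (far - near);
        float b = -2f * far * near / (far - near);
        return new Vector4(v.X, v.Y, a * v.Z + b, -v.Z);
    }

    // Same predicate as IsInClipSpace in the kernel.
    public static bool IsInClipSpace(Vector4 c) =>
        -c.W <= c.X && c.X <= c.W &&
        -c.W <= c.Y && c.Y <= c.W &&
        -c.W <= c.Z && c.Z <= c.W;

    // Visible if at least one of the eight box corners lands inside the clip volume.
    public static bool AnyCornerVisible(Vector3 min, Vector3 max)
    {
        for (int i = 0; i < 8; i++)
        {
            var corner = new Vector3(
                (i & 1) == 0 ? min.X : max.X,
                (i & 2) == 0 ? min.Y : max.Y,
                (i & 4) == 0 ? min.Z : max.Z);
            if (IsInClipSpace(Project(corner))) return true;
        }
        return false;
    }
}
```

A unit cube centered five units in front of the camera, from (-0.5, -0.5, -5.5) to (0.5, 0.5, -4.5), passes the test. A wall from (-50, -50, -1.5) to (50, 50, -0.5) fills the entire view, yet all eight of its corners have |x| = 50 against a w of at most 1.5, so the test reports it as invisible.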

On the C# side, just get the VP matrix and pass it to the ComputeShader:

FrustumCullingRenderer.cs

void Update()
{
    // Update buffers
    ...
    // Arrow keys change the number of instances drawn
    ...
    // Frustum culling
    // Changed: compute the view-projection matrix
    var matrixVP = Camera.main.projectionMatrix * Camera.main.worldToCameraMatrix;
    visibleIDsBuffer.SetCounterValue(0);
    // cullingComputeShader.SetVectorArray("_FrustumPlanes", frustumPlanes);
    cullingComputeShader.SetMatrix("_MatrixVP", matrixVP);
    cullingComputeShader.Dispatch(kernel, Mathf.CeilToInt(instanceCount / 640f), 1, 1);
    ComputeBuffer.CopyCount(visibleIDsBuffer, argsBuffer, sizeof(uint));
    // Render
    Bounds renderBounds = new Bounds(Vector3.zero, new Vector3(200.0f, 200.0f, 200.0f));
    Graphics.DrawMeshInstancedIndirect(instanceMesh, subMeshIndex, instanceMaterial, renderBounds, argsBuffer);
}

Here the camera's two matrices are multiplied to get the VP matrix. The more canonical approach would be:

var matrixVP = GL.GetGPUProjectionMatrix(Camera.main.projectionMatrix, false) * Camera.main.worldToCameraMatrix;

The reason is given in the official documentation:

Note that projection matrix passed to shaders can be modified depending on platform and other state. If you need to calculate projection matrix for shader use from camera's projection, use GL.GetGPUProjectionMatrix.

In Unity, projection matrices follow OpenGL convention. However on some platforms they have to be transformed a bit to match the native API requirements. Use this function to calculate how the final projection matrix will be like. The value will match what comes as UNITY_MATRIX_P matrix in a shader.

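Roughly: projection matrices in Unity follow the OpenGL convention, but the actual runtime platform is not necessarily OpenGL, so to get a projection matrix that matches the platform you need GL.GetGPUProjectionMatrix.
I chose not to use it here, because with it the ComputeShader would have to handle each graphics API separately. Since Camera.projectionMatrix follows the OpenGL convention, the -w to w test in IsInClipSpace matches the matrix being passed in, so I don't think this step is really necessary. I haven't actually tested this, though, so corrections are welcome.

The result is the same as with the first method.

Demo code repository

That basically completes frustum culling. Some details are still not fully optimized, of course; the next post will tackle occlusion culling first.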
