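Characteristics of Tile Rendering in Oxygen Not Included

Unfortunately I could not capture a frame of Oxygen Not Included in RenderDoc, but I can still analyze its tile rendering from how it looks on screen. From playing for a while, and as the image below shows, the game logic divides the whole 2D map into a grid of cells, and each cell records gas, liquid, solid and building information. Gas is just a distortion shader. Liquid rendering and simulation are fairly complex, so I set them aside for now. Walls and pipes among the buildings are also procedurally generated before being rendered, but they join the resource-type solid cells of the scene with hard edges, so they are left out as well. This article focuses on rendering the resource-type solid cells (not on procedurally generating them).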

Oxygen Not Included

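The resource-type solid cells (simply called tiles from here on) have these characteristics:

  1. There are multiple types of tiles
  2. A black outline appears only where tiles of different types meet
  3. Tiles are ordered by priority, and higher-priority tiles expand further
  4. The black outlines between tiles follow a periodic pattern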

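The idea behind imitating this rendering

The simplest approach is certainly to compute each tile's shape on the CPU, look up the matching texture, and draw the tile on the GPU. But that would defeat the purpose of this article, and it is too boring anyway. My idea is to compute each tile's shape on the GPU as much as possible, and to draw every tile with instancing.

The first question is how to draw irregular tiles. A square tile can easily be drawn with a quad and a texture, but an irregular tile has to be drawn with alpha blending, and its geometry then extends beyond the tile's logical position in the game. So my idea is to draw twice as many quads plus one along each axis as there are actual tiles, that is (2*width+1)*(2*height+1) quads in total, as shown in the figure below: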

Tile Rendering Diagram
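The expanded layout can be sketched on the CPU as follows. This is a minimal C# sketch, not code from the project: the class `ExpandedGrid` and its method names are hypothetical, and the corner arithmetic assumes the same integer-division scheme that the compute shader later in this article uses (y grows upward, out-of-range neighbours are clamped to the map border).

```csharp
using System;

// Hypothetical helper (not part of the original project): maps a cell of the
// expanded (2w+1) x (2h+1) render grid back to the four logical tiles around it.
public static class ExpandedGrid
{
    // Size of the render grid for a logical map of width x height tiles.
    public static (int X, int Y) Size(int width, int height)
        => (2 * width + 1, 2 * height + 1);

    // Logical tile coordinates at the four corners of render cell (x, y),
    // in clockwise order: top-left, top-right, bottom-right, bottom-left.
    // Coordinates that fall outside the map are clamped to the border.
    public static (int X, int Y)[] Corners(int x, int y, int width, int height)
    {
        int hiX = x / 2, loX = (x + 1) / 2 - 1;
        int hiY = y / 2, loY = (y + 1) / 2 - 1;

        (int X, int Y) Clamped(int cx, int cy) =>
            (Math.Clamp(cx, 0, width - 1), Math.Clamp(cy, 0, height - 1));

        return new[]
        {
            Clamped(loX, hiY), // top-left
            Clamped(hiX, hiY), // top-right
            Clamped(hiX, loY), // bottom-right
            Clamped(loX, loY), // bottom-left
        };
    }
}
```

With this numbering, render cells whose x and y are both odd lie inside a single logical tile (all four corners coincide), while a cell with an even coordinate straddles a seam along that axis. For example, `ExpandedGrid.Corners(2, 1, 16, 16)` returns the two horizontally adjacent tiles (0,0) and (1,0).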

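In this figure, A, B and C stand for different tile types. The left side shows the logical tile layout during play, where A, B and C are adjacent. The right side shows the layout at render time, where new tiles are inserted between the original ones specifically to render the seams. For tile 2, the top-left, top-right, bottom-right and bottom-left corners (clockwise order) are ABCC, which makes it a tile where three types meet. For tile 1 the corresponding code is AACC (B can be ruled out with some integer division and modulo by 2), which makes it a tile where two types meet. Tiles 3 and 4 both have the code CCCC, so they are tiles without a seam.

Next we take the priority between tiles into account. Assuming C>B>A, AACC and AABB should have the same seam, and ABCC and BCCA differ only by a 90-degree rotation. Since one tile always has the lowest priority, we only need to fix the lowest-priority tile at the top-left corner and discuss the priorities and order of the remaining three. Following this idea, all possible seams can be drawn in a single image whose RGBA channels record the tile priority (R is the lowest priority and A the highest; I drew the seams in one uniform gray to help with later rendering). The image is shown below; to make it easier to read, I inverted the A channel and labeled the priority order underneath. We also write a function that maps the priority order to the seam type, so that at render time we can find the seam's position in the image (see GetMode(uint a, uint b, uint c) in ONITileRender.hlsl).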
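To make the "same seam up to priority and rotation" idea concrete, here is a small C# sketch. It is illustrative only and is not the GPU code used later: `SeamKey`, `DenseRanks` and `Canonicalize` are hypothetical names, priorities are plain integers (A=0, B=1, C=2), and picking the first lowest-priority corner when several corners tie is an assumption of this sketch.

```csharp
using System;
using System.Linq;

// Illustrative sketch: reduces four corner priorities (clockwise from
// top-left) to a rotation plus dense ranks.
public static class SeamKey
{
    // Dense rank: the number of distinct values strictly smaller than each
    // entry, so AACC and AABB both become 0,0,1,1.
    public static int[] DenseRanks(int[] priorities) =>
        priorities
            .Select(p => priorities.Where(q => q < p).Distinct().Count())
            .ToArray();

    // Rotates the clockwise corner list so that the first lowest-priority
    // corner comes first. Returns the rotation (in quarter turns) and the
    // rotated ranks.
    public static (int Rotation, int[] Ranks) Canonicalize(int[] priorities)
    {
        int[] ranks = DenseRanks(priorities);
        int rotation = Array.IndexOf(ranks, 0);
        int[] rotated = Enumerable.Range(0, 4)
            .Select(i => ranks[(i + rotation) % 4])
            .ToArray();
        return (rotation, rotated);
    }
}
```

With A=0, B=1, C=2, both ABCC `{0, 1, 2, 2}` and BCCA `{1, 2, 2, 0}` canonicalize to the ranks 0,1,2,2, with rotations 0 and 3 respectively, and AACC and AABB share the ranks 0,0,1,1. When the lowest priority occupies several corners, the same visual pattern can still reach more than one key depending on where the corner list starts, so the seam image has to cover those cases too.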

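Because priorities have to be compared, sorting on the GPU is unavoidable. With merge sort, 4 elements take 5 comparisons. Since we also need the rank of each tile among the four, I simply hand-wrote the comparisons; 6 comparisons is not much worse than merge sort's 5. The image only contains the cases where the lowest-priority tile is at the top-left, so we also need to find the original index of the lowest-priority tile and rotate the seam image accordingly at render time (this is where the clockwise numbering pays off, because it makes rotation easy; with a top-left, top-right, bottom-left, bottom-right order, rotation would be awkward).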
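The six-comparison ranking can be checked on the CPU with a direct C# port. This is a sketch for experimentation, assuming the same arithmetic as the SortAndReturnIndex function in ONITileRender.hlsl; the class name `Sort4` is hypothetical.

```csharp
// CPU-side sketch of ranking four values with exactly six comparisons.
// Each flag is 1 when the first element of the pair is strictly greater.
public static class Sort4
{
    // Returns rank[i] = position of element i after sorting (0 = smallest)
    // and fills order[r] = index of the element that ends up at rank r.
    // Equal elements keep their original relative order.
    public static int[] Rank(float[] v, out int[] order)
    {
        int ab = v[0] <= v[1] ? 0 : 1;
        int ac = v[0] <= v[2] ? 0 : 1;
        int ad = v[0] <= v[3] ? 0 : 1;
        int bc = v[1] <= v[2] ? 0 : 1;
        int bd = v[1] <= v[3] ? 0 : 1;
        int cd = v[2] <= v[3] ? 0 : 1;

        int[] rank =
        {
            ab + ac + ad,     // how many of B, C, D sort before A
            1 - ab + bc + bd, // how many of A, C, D sort before B
            2 - ac - bc + cd, // how many of A, B, D sort before C
            3 - ad - bd - cd, // how many of A, B, C sort before D
        };

        order = new int[4];
        for (int i = 0; i < 4; i++) order[rank[i]] = i;
        return rank;
    }
}
```

For the input `{3, 1, 2, 0}` the ranks are 3,1,2,0 and `order[0]` is 3, the index of the smallest element, which is exactly the rotation needed to bring the lowest-priority corner to the top-left.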

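Knowing the rotation of each seam image, we still need to render a different texture for each of its parts (channels). DrawProceduralIndirect is used here for instanced rendering, so the number of draw calls equals the number of tile types. For one tile type, the tiles to render are the shape formed by that type expanded outward by one tile; by checking the tile types at the top-left, top-right, bottom-right and bottom-left we can easily decide whether the current tile should be rendered together with the target type. We use a StructuredBuffer with (tile type count)*(2*map width+1)*(2*map height+1) elements to collect all tiles to draw (the size actually used never exceeds 4*(2*map width+1)*(2*map height+1), because each cell of the expanded grid touches at most four tile types). We also use a ByteAddressBuffer with (tile type count)*5 elements to collect the instancing arguments of each tile type.

The seamless 2D rock textures in this article come from OpenGameArt.org

The code and explanations

Because a CommandBuffer is used to draw the tiles, I put the code inside the Universal RP package. CPU code: ONITileRenderManager.cs goes under Packages/com.unity.render-pipelines.universal/Runtime/Overrides/, ONITileRendererFeature.cs under Packages/com.unity.render-pipelines.universal/Runtime/RendererFeature/, and ONITileRenderPass.cs under Packages/com.unity.render-pipelines.universal/Runtime/Passes/. GPU code: ONITileRender.hlsl, ONITileComputeShader.compute and ONITileRenderShader.shader go under Packages/com.unity.render-pipelines.universal/Shaders/ONITile/.

ONITileRenderManager sets up the map, runs the computations and exposes the buffers. ONITileRendererFeature and ONITileRenderPass render the tiles in Unity URP. ONITileComputeShader performs the tile-related computation, and ONITileRenderShader renders the tiles.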
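The buffer sizes described above can be worked out with a few lines of C#. This is a back-of-the-envelope sketch: `TileBufferSizes` and its members are hypothetical names, and the 20-byte stride assumes a per-tile record of five 32-bit values (a 2D coordinate, mode, rotation and channel).

```csharp
// Sizing helper for the two GPU buffers (names are hypothetical).
public static class TileBufferSizes
{
    // Assumed per-tile record: coord.x, coord.y, mode, rotation, channel.
    public const int TilePropertyStride = 5 * sizeof(uint);

    // Five uints of draw arguments are kept per tile type.
    public const int ArgsPerType = 5;

    // Number of cells in the expanded (2w+1) x (2h+1) render grid.
    public static int ExpandedCellCount(int width, int height)
        => (2 * width + 1) * (2 * height + 1);

    // Elements allocated: one full expanded grid per tile type.
    public static int TileDataElementCount(int typeCount, int width, int height)
        => typeCount * ExpandedCellCount(width, height);

    // Upper bound on elements actually written: each expanded cell
    // touches at most four tile types.
    public static int MaxUsedElements(int width, int height)
        => 4 * ExpandedCellCount(width, height);

    // Elements in the counter / argument buffer.
    public static int ArgumentCount(int typeCount) => typeCount * ArgsPerType;
}
```

For a 16x16 map the expanded grid has 33*33 = 1089 cells. With 8 tile types, 8712 elements (174240 bytes at a 20-byte stride) are allocated, of which at most 4356 can be in use, and the argument buffer holds 40 uints.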

ONITileRenderManager.cs

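Pay special attention to the size of each buffer here. This script uses the compute shader for three things: 1. generate a random number for every point of the map as its tile type; 2. from the map, compute for every tile type the number of tiles to draw and their position, seam type, rotation and the channel to sample; 3. copy the data in the ByteAddressBuffer to the IndirectArgumentBuffer. I actually feel that ComputeShader.Dispatch ought to be an asynchronous method, but it is not called often here, so this will do.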

using UnityEngine;

[ExecuteInEditMode]
public class ONITileRenderManager : MonoBehaviour
{
    [HideInInspector]
    public static ONITileRenderManager Instance { get; private set; }

    public ComputeShader oniTileComputeShader;
    public int tileTypeCount = 4;
    public Vector2Int tileCount = new Vector2Int(16, 16);
    public Vector2 tileSize = Vector2.one;
    public Vector3 tileStartPos;
    public Vector2 randomSeed;
    public Texture[] mainTextures = new Texture[] {};
    public Vector4 mainTextureST = new Vector4(1.0f, 1.0f, 0.0f, 0.0f);

    private Vector2Int tileCountExt;
    public Vector2Int TileCountExt { get { return tileCountExt; } }
    private Vector4 textureSize;
    public Vector4 TextureSize { get { return textureSize; } }
    private Vector4 textureSizeExt;
    public Vector4 TextureSizeExt { get { return textureSizeExt; } }

    private ComputeBuffer computeBuffer;
    public ComputeBuffer ComputeBuffer { get { return computeBuffer; } }
    private ComputeBuffer argBuffer;
    public ComputeBuffer ArgBuffer { get { return argBuffer; } }
    private ComputeBuffer counterBuffer;
    public ComputeBuffer CounterBuffer { get { return counterBuffer; } }
    private RenderTexture tileRenderTexture;
    private RenderTexture tileRenderTextureExt;

    private bool hasValidBuffer = false;
    public bool HasValidBuffer { get { return hasValidBuffer; } }

    struct PerTileProperty
    {
        public Vector2Int coord;
        public uint mode;
        public uint rotation;
        public uint channel;
    }

    private void EnsureRenderTexture(ref RenderTexture rt, int width, int height)
    {
        if (rt == null || rt.width != width || rt.height != height)
        {
            if (rt != null) RenderTexture.ReleaseTemporary(rt);

            RenderTextureDescriptor desc = new RenderTextureDescriptor(width, height, RenderTextureFormat.ARGBInt);
            desc.enableRandomWrite = true;
            desc.msaaSamples = 1;
            desc.depthBufferBits = 0;
            rt = RenderTexture.GetTemporary(desc);
            if (!rt.IsCreated()) rt.Create();
        }
    }

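    // The tile data buffer is written at explicit indices (not through Append), so a plain structured buffer is enough.
    private void EnsureComputeBuffer(ref ComputeBuffer cb, int count, int stride, ComputeBufferType cbt = ComputeBufferType.Default)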
    {
        if (cb == null || cb.count != count || cb.stride != stride)
        {
            if (cb != null) cb.Release();

            cb = new ComputeBuffer(count, stride, cbt);
        }
    }


    private void OnEnable()
    {
        if (Instance != null)
        {
            enabled = false;
            Debug.LogError("An instance of ONITileRenderManager already exists.");
        }
        else
        {
            Instance = this;
        }
    }

    private void OnDisable()
    {
        Instance = null;
    }

    private void OnValidate()
    {
        tileTypeCount = Mathf.Max(1, tileTypeCount);
        tileCount.x = Mathf.Max(1, tileCount.x);
        tileCount.y = Mathf.Max(1, tileCount.y);
        tileCountExt = new Vector2Int(tileCount.x * 2 + 1, tileCount.y * 2 + 1);
        textureSize = new Vector4(tileCount.x, tileCount.y, 1.0f / tileCount.x, 1.0f / tileCount.y);
        textureSizeExt = new Vector4(tileCountExt.x, tileCountExt.y, 1.0f / tileCountExt.x, 1.0f / tileCountExt.y);

        EnsureComputeBuffer(ref computeBuffer, tileTypeCount * tileCountExt.x * tileCountExt.y, System.Runtime.InteropServices.Marshal.SizeOf<PerTileProperty>());
        EnsureComputeBuffer(ref argBuffer, tileTypeCount * 5, 4, ComputeBufferType.IndirectArguments);
        EnsureComputeBuffer(ref counterBuffer, tileTypeCount * 5, 4, ComputeBufferType.Raw);
        EnsureRenderTexture(ref tileRenderTexture, tileCount.x, tileCount.y);
        EnsureRenderTexture(ref tileRenderTextureExt, tileCountExt.x, tileCountExt.y);

        GenerateRandomTiles();
        hasValidBuffer = false;
        ExpandTileTexture();
        CopyToArgBuffer();
        hasValidBuffer = true;
    }

    private void GenerateRandomTiles()
    {
        if (!oniTileComputeShader) return;

        int kernelID = oniTileComputeShader.FindKernel(ONITileShaderConstants.S_RANDOM_KERNEL_NAME);
        oniTileComputeShader.GetKernelThreadGroupSizes(kernelID, out uint x, out uint y, out uint z);
        oniTileComputeShader.SetVector(ONITileShaderConstants.I_RandomSeed, randomSeed);
        oniTileComputeShader.SetInt(ONITileShaderConstants.I_TileTypeCount, tileTypeCount);
        oniTileComputeShader.SetVector(ONITileShaderConstants.I_TextureSize, textureSize);
        oniTileComputeShader.SetTexture(kernelID, ONITileShaderConstants.I_RW_RandomTiles, tileRenderTexture);
        oniTileComputeShader.Dispatch(kernelID,
                Mathf.CeilToInt((float)tileCount.x / x),
                Mathf.CeilToInt((float)tileCount.y / y),
                1);
    }

    private void ExpandTileTexture()
    {
        if (!oniTileComputeShader) return;

        int[] data = new int[counterBuffer.count];
        for (int i = 0; i < data.Length; i++)
        {
            if (i % 5 == 0)
            {
                data[i] = 6;
            }
            else
            {
                data[i] = 0;
            }
        }
        counterBuffer.SetData(data);

        int kernelID = oniTileComputeShader.FindKernel(ONITileShaderConstants.S_EXPAND_KERNEL_NAME);
        oniTileComputeShader.GetKernelThreadGroupSizes(kernelID, out uint x, out uint y, out uint z);
        oniTileComputeShader.SetVector(ONITileShaderConstants.I_TextureSize, textureSize);
        oniTileComputeShader.SetVector(ONITileShaderConstants.I_TextureSizeExt, textureSizeExt);
        oniTileComputeShader.SetTexture(kernelID, ONITileShaderConstants.I_RandomTiles, tileRenderTexture);
        oniTileComputeShader.SetTexture(kernelID, ONITileShaderConstants.I_RW_RandomTilesExt, tileRenderTextureExt);
        oniTileComputeShader.SetBuffer(kernelID, ONITileShaderConstants.I_RW_TileData, computeBuffer);
        oniTileComputeShader.SetBuffer(kernelID, ONITileShaderConstants.I_RW_CounterBuffer, counterBuffer);
        oniTileComputeShader.Dispatch(kernelID,
                Mathf.CeilToInt((float)tileCountExt.x / x),
                Mathf.CeilToInt((float)tileCountExt.y / y),
                1);
    }

    private void CopyToArgBuffer()
    {
        if (!oniTileComputeShader) return;

        int kernelID = oniTileComputeShader.FindKernel(ONITileShaderConstants.S_COPY_KERNEL_NAME);
        oniTileComputeShader.GetKernelThreadGroupSizes(kernelID, out uint x, out uint y, out uint z);
        oniTileComputeShader.SetInt(ONITileShaderConstants.I_TileTypeCount, tileTypeCount);
        oniTileComputeShader.SetBuffer(kernelID, ONITileShaderConstants.I_CounterBuffer, counterBuffer);
        oniTileComputeShader.SetBuffer(kernelID, ONITileShaderConstants.I_RW_ArgBuffer, ArgBuffer);
        oniTileComputeShader.Dispatch(kernelID,
                Mathf.CeilToInt((float)(5*tileTypeCount) / x),
                1,
                1);
    }

    public Bounds GetBounds()
    {
        Vector3 start = tileStartPos;
        Vector2 totalSize = tileSize * tileCountExt;
        Vector3 size3D = new Vector3(totalSize.x, 0.0f, totalSize.y) + Vector3.one;
        Vector3 center = start + 0.5f * size3D;
        return new Bounds(center, size3D);
    }

    private void OnDestroy()
    {
        if (computeBuffer != null)
        {
            computeBuffer.Release();
            computeBuffer = null;
        }

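        if (argBuffer != null)
        {
            argBuffer.Release();
            argBuffer = null;
        }

        // The counter buffer is allocated in OnValidate as well and must be released too.
        if (counterBuffer != null)
        {
            counterBuffer.Release();
            counterBuffer = null;
        }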

        if (tileRenderTexture != null)
        {
            RenderTexture.ReleaseTemporary(tileRenderTexture);
            tileRenderTexture = null;
        }

        if (tileRenderTextureExt != null)
        {
            RenderTexture.ReleaseTemporary(tileRenderTextureExt);
            tileRenderTextureExt = null;
        }
    }

    public class ONITileShaderConstants
    {
        public static readonly string S_RANDOM_KERNEL_NAME = "RandomMain";
        public static readonly string S_EXPAND_KERNEL_NAME = "ExpandMain";
        public static readonly string S_COPY_KERNEL_NAME = "CopyMain";

        public static readonly int I_TextureSize = Shader.PropertyToID("_TextureSize");
        public static readonly int I_TextureSizeExt = Shader.PropertyToID("_TextureSizeExt");
        public static readonly int I_TileTypeCount = Shader.PropertyToID("_TileTypeCount");

        public static readonly int I_RandomSeed = Shader.PropertyToID("_RandomSeed");
        public static readonly int I_RandomTiles = Shader.PropertyToID("_RandomTiles");
        public static readonly int I_RW_RandomTiles = Shader.PropertyToID("_RW_RandomTiles");
        public static readonly int I_RW_RandomTilesExt = Shader.PropertyToID("_RW_RandomTilesExt");
        public static readonly int I_RW_TileData = Shader.PropertyToID("_RW_TileData");
        public static readonly int I_RW_ArgBuffer = Shader.PropertyToID("_RW_ArgBuffer");
        public static readonly int I_CounterBuffer = Shader.PropertyToID("_CounterBuffer");
        public static readonly int I_RW_CounterBuffer = Shader.PropertyToID("_RW_CounterBuffer");

        public static readonly int I_TileTexture = Shader.PropertyToID("_TileTexture");
        public static readonly int I_TileData = Shader.PropertyToID("_TileData");
        public static readonly int I_TileSize = Shader.PropertyToID("_TileSize");
        public static readonly int I_TileStartPos = Shader.PropertyToID("_TileStartPos");

        public static readonly int I_TileType = Shader.PropertyToID("_TileType");
        public static readonly int I_MainTexture = Shader.PropertyToID("_MainTexture");
        public static readonly int I_MainTexture_ST = Shader.PropertyToID("_MainTexture_ST");
    }
}
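The three Dispatch calls above all round the item count up to a whole number of thread groups. A float-free way to write that rounding is sketched below; `DispatchMath.GroupCount` is a hypothetical helper, not a Unity API.

```csharp
using System;

// Integer ceiling division for compute dispatch sizes (hypothetical helper).
public static class DispatchMath
{
    // Smallest number of groups of groupSize threads that covers itemCount items.
    public static int GroupCount(int itemCount, int groupSize)
    {
        if (itemCount < 0) throw new ArgumentOutOfRangeException(nameof(itemCount));
        if (groupSize <= 0) throw new ArgumentOutOfRangeException(nameof(groupSize));
        return (itemCount + groupSize - 1) / groupSize;
    }
}
```

For a 33-cell-wide expanded grid and 8 threads per group this gives 5 groups, that is 40 threads, so the last 7 threads fall outside the grid. That overshoot is why each kernel needs a bounds check on its thread ID before doing any work.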

ONITileRendererFeature.cs

Nothing much to say here; it simply follows the usual pattern.

namespace UnityEngine.Rendering.Universal
{
    public class ONITileRendererFeature : ScriptableRendererFeature
    {
        [System.Serializable]
        public class ONITileSettings
        {
            public RenderPassEvent renderPassEvent = RenderPassEvent.BeforeRenderingTransparents;
            public Shader drawShader;
            public Texture tileTexture;

            public bool IsValid()
            {
                return drawShader != null && tileTexture != null;
            }
        }

        private ONITileRenderPass oniTileRenderPass;
        public ONITileSettings oniTileSettings = new ONITileSettings();

        public override void Create()
        {
            oniTileRenderPass = new ONITileRenderPass(oniTileSettings);
        }

        public override void AddRenderPasses(ScriptableRenderer renderer, ref RenderingData renderingData)
        {
            if (ONITileRenderManager.Instance != null
                && oniTileSettings.IsValid())
            {
                oniTileRenderPass.Setup(ONITileRenderManager.Instance);
                renderer.EnqueuePass(oniTileRenderPass);
            }
        }
    }
}

ONITileRenderPass.cs

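The argument buffer here stores five uints per tile type: the vertex count per instance, the instance count, the start vertex location, the start instance location, and one spare slot that I reserved for OpenGL. (Unity's documentation for DrawProceduralIndirect lists only the first four values.) To draw thirteen quads we just pass {6, 13, 0, 0, 0}. The offset i * 5 * 4 means that for the i-th tile type we read the five uints at i*5+0 through i*5+4 as the indirect draw arguments, and a uint is 4 bytes.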

namespace UnityEngine.Rendering.Universal
{
    public class ONITileRenderPass : ScriptableRenderPass
    {
        private const string profilerTag = "ONI Tile Render Pass";
        private ProfilingSampler oniTileRenderSampler = new ProfilingSampler(profilerTag);
        private ONITileRendererFeature.ONITileSettings settings;
        private ONITileRenderManager oniTileRenderManager;
        private Material drawMaterial;

        public ONITileRenderPass(ONITileRendererFeature.ONITileSettings settings)
        {
            this.settings = settings;
            renderPassEvent = settings.renderPassEvent;
            if (settings.drawShader != null)
            {
                drawMaterial = new Material(settings.drawShader);
            }
        }

        public void Setup(ONITileRenderManager oniTileRenderManager)
        {
            this.oniTileRenderManager = oniTileRenderManager;
        }

        private void DoONITileRendering(CommandBuffer cmd, Material material)
        {
            if(oniTileRenderManager.HasValidBuffer)
            {
                MaterialPropertyBlock mpb = new MaterialPropertyBlock();
                mpb.SetBuffer(ONITileRenderManager.ONITileShaderConstants.I_TileData, oniTileRenderManager.ComputeBuffer);
                mpb.SetVector(ONITileRenderManager.ONITileShaderConstants.I_TextureSizeExt, oniTileRenderManager.TextureSizeExt);
                mpb.SetVector(ONITileRenderManager.ONITileShaderConstants.I_TileStartPos, oniTileRenderManager.tileStartPos);
                mpb.SetVector(ONITileRenderManager.ONITileShaderConstants.I_TileSize, oniTileRenderManager.tileSize);
                mpb.SetTexture(ONITileRenderManager.ONITileShaderConstants.I_TileTexture, settings.tileTexture);
                mpb.SetVector(ONITileRenderManager.ONITileShaderConstants.I_MainTexture_ST, oniTileRenderManager.mainTextureST);

                int mainTextureLength = oniTileRenderManager.mainTextures.Length;
                for (int i = 0; i < oniTileRenderManager.tileTypeCount; i++)
                {
                    if(mainTextureLength > 0)
                    {
                        int mainTextureIndex = i % mainTextureLength;
                        mpb.SetTexture(ONITileRenderManager.ONITileShaderConstants.I_MainTexture, oniTileRenderManager.mainTextures[mainTextureIndex]);
                    }

                    mpb.SetInt(ONITileRenderManager.ONITileShaderConstants.I_TileType, i);
                    cmd.DrawProceduralIndirect(Matrix4x4.identity, material, 0, MeshTopology.Triangles,
                                                oniTileRenderManager.ArgBuffer, i * 5 * 4, properties:mpb);
                }   
            }
        }

        public override void Execute(ScriptableRenderContext context, ref RenderingData renderingData)
        {
            CommandBuffer cmd = CommandBufferPool.Get(profilerTag);
            context.ExecuteCommandBuffer(cmd);
            cmd.Clear();

            using (new ProfilingScope(cmd, oniTileRenderSampler))
            {
                DoONITileRendering(cmd, drawMaterial);
            }

            context.ExecuteCommandBuffer(cmd);
            cmd.Clear();
            CommandBufferPool.Release(cmd);
        }
    }
}
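The argument layout used by the pass can be summarized in a few lines of C#. This is a sketch with hypothetical names (`IndirectArgs` and its members); it only reproduces the arithmetic of the five-uint records described above.

```csharp
// Layout of the per-type indirect draw records (hypothetical helper).
public static class IndirectArgs
{
    public const int UintsPerRecord = 5;  // vertex count, instance count, 3 more slots
    public const int VerticesPerQuad = 6; // two triangles

    // Initial buffer contents: every record starts as {6, 0, 0, 0, 0}.
    public static uint[] InitialCounters(int tileTypeCount)
    {
        var data = new uint[tileTypeCount * UintsPerRecord];
        for (int type = 0; type < tileTypeCount; type++)
        {
            data[type * UintsPerRecord] = VerticesPerQuad;
        }
        return data;
    }

    // Byte offset of a tile type's record, as passed to the indirect draw.
    public static int ByteOffset(int tileType)
        => tileType * UintsPerRecord * sizeof(uint);

    // Byte offset of the instance count inside that record, which is the
    // address the compute shader increments atomically.
    public static int InstanceCountByteOffset(int tileType)
        => ByteOffset(tileType) + sizeof(uint);
}
```

For tile type 2 the record starts at byte 40 and its instance count lives at byte 44.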

ONITileRender.hlsl

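A PerTileProperty struct is defined here, and its data layout must match the struct in the CPU code. In fact, channel and rotation are both integers in the range 0-3 that fit in two bits each, so they could be packed together, which would make a PerTileProperty exactly four uints (16 bytes); I did not do that here. As for the functions in the middle, I am really not good at naming. SortAndReturnIndex sorts and also returns the rank of each element after sorting, which helps the later steps. ProcessSortedArray post-processes the ranks so that equal elements share a rank (tied second place, tied third place). RotateAccordingToMinimum rotates the processed ranks so that GetMode can return the seam type. I reduced the number of branches in GetMode as much as I could (it is really a mix of binary, ternary and special cases).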

#ifndef ONI_TILE_RENDER_HLSL
#define ONI_TILE_RENDER_HLSL

#define ONI_TILE_TEXTURE_SIZE 8u

struct PerTileProperty
{
    uint2 coord;
    uint mode;
    uint rotation;
    uint channel;
};

SamplerState sampler_LinearClamp;
SamplerState sampler_PointClamp;
uint _TileTypeCount;
float4 _TextureSize;
float4 _TextureSizeExt;
float3 _TileStartPos;
float2 _TileSize;

// https://www.shadertoy.com/view/4djSRW
float hash12(float2 p)
{
    float3 p3  = frac(float3(p.xyx) * .1031);
    p3 += dot(p3, p3.yzx + 33.33);
    return frac((p3.x + p3.y) * p3.z);
}

// A sort4 function with 6 compares.
// output: sorted array of input;
// return: sorted index of elements from input;
uint4 SortAndReturnIndex(float4 input, out uint output[4])
{  
    uint3 ab_ac_ad = input.xxx <= input.yzw ? 0 : 1;
    uint3 bc_bd_cd = input.yyz <= input.zww ? 0 : 1;
    uint indexA = ab_ac_ad.x + ab_ac_ad.y + ab_ac_ad.z;
    uint indexB = 1 - ab_ac_ad.x + bc_bd_cd.x + bc_bd_cd.y;
    uint indexC = 2 - ab_ac_ad.y - bc_bd_cd.x + bc_bd_cd.z;
    uint indexD = 3 - ab_ac_ad.z - bc_bd_cd.y - bc_bd_cd.z;
    
    output[indexA] = 0;
    output[indexB] = 1;
    output[indexC] = 2;
    output[indexD] = 3;

    return uint4(indexA, indexB, indexC, indexD);
}

// Input numbers might have equal elements, adjust sorted index based on that.
void ProcessSortedArray(float4 input, uint4 output, inout uint processedArray[4])
{
    if(input[output.x] == input[output.y])
    {
        processedArray[output.y] -= 1;
        processedArray[output.z] -= 1;
        processedArray[output.w] -= 1;
    }
    if(input[output.y] == input[output.z])
    {
        processedArray[output.z] -= 1;
        processedArray[output.w] -= 1;
    }
    if(input[output.z] == input[output.w])
    {
        processedArray[output.w] -= 1;
    }
}

// Rotate processed array according to the index of minimum element.
uint4 RotateAccordingToMinimum(uint processedArray[4], uint sortedArray[4])
{
    uint minIndex = sortedArray[0];
    uint rotateX = processedArray[(0+minIndex)%4];
    uint rotateY = processedArray[(1+minIndex)%4];
    uint rotateZ = processedArray[(2+minIndex)%4];
    uint rotateW = processedArray[(3+minIndex)%4];
    return uint4(rotateX, rotateY, rotateZ, rotateW);
}

// Get mode based on "sorted-processed-rotated" result, mode is used to sample tile texture later.
// 0 a
// c b
uint GetMode(uint a, uint b, uint c)
{
    if(a==0)
    {
        if(b==0) return c==0 ? 0 : 1;
        if(b==1) return c+2;
        if(b==2) return 5;
    }

    if(a==1)
    {
        return b==3 ? 16 : c+b*3+6;
    }

    if(a==2)
    {
        if(b==0) return 17;
        if(b==1) return c+18;
        if(b==2) return 22;
        if(b==3) return 23;
    }

    if(a==3)
    {
        return b==1 ? 24 : 25;
    }

    return 0;
}

#endif
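The packing of rotation and channel mentioned above is not implemented in this article. One possible layout, sketched in C# with hypothetical names, stores rotation in bits 0-1 and channel in bits 2-3 of a single uint:

```csharp
// Possible bit layout for a packed rotation/channel field (an assumption,
// not part of the original code): bits 0-1 rotation, bits 2-3 channel.
public static class TilePacking
{
    public static uint Pack(uint rotation, uint channel)
        => (rotation & 3u) | ((channel & 3u) << 2);

    public static (uint Rotation, uint Channel) Unpack(uint packed)
        => (packed & 3u, (packed >> 2) & 3u);
}
```

For example, `TilePacking.Pack(3, 2)` yields 11 and `TilePacking.Unpack(11)` returns rotation 3 and channel 2. The same two expressions would work unchanged in HLSL, with the CPU-side struct shrunk to match.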

ONITileComputeShader.compute

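Three kernels are defined here: one generates the random numbers, one finds every tile to render together with its properties, and one copies the contents of the ByteAddressBuffer to the IndirectArgumentBuffer. In real use we would not generate tiles on the GPU, and in theory the buffer copy is unnecessary too. ByteAddressBuffer.InterlockedAdd is used to obtain the number of instances of the current tile type (and that count is used as the index at which the corresponding PerTileProperty is stored).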

#pragma kernel RandomMain
#pragma kernel ExpandMain
#pragma kernel CopyMain

#include "Packages/com.unity.render-pipelines.core/ShaderLibrary/Common.hlsl"
#include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Core.hlsl"
#include "ONITileRender.hlsl"

float2 _RandomSeed;

Texture2D<float4> _RandomTiles;
RWTexture2D<float4> _RW_RandomTiles;
RWTexture2D<float4> _RW_RandomTilesExt;

ByteAddressBuffer _CounterBuffer;
RWStructuredBuffer<PerTileProperty> _RW_TileData;
RWStructuredBuffer<uint> _RW_ArgBuffer;
RWByteAddressBuffer _RW_CounterBuffer;

[numthreads(8,8,1)]
void RandomMain (uint3 dispatchThreadID : SV_DispatchThreadID)
{
    if(any((float2)dispatchThreadID.xy >= _TextureSize.xy)) return;

    float randomVal = hash12(dispatchThreadID.xy + _RandomSeed);
    float val = floor(randomVal * _TileTypeCount);

    float4 returnColor = float4(val, val, val, val);
    _RW_RandomTiles[dispatchThreadID.xy] = returnColor;
}

float LoadFromRandomTiles(Texture2D<float4> tex, int2 coord, float2 textureSize)
{
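    // clamp(x, min, max): keep the coordinate inside the texture so that border cells reuse the edge tile.
    float2 tempCoord = clamp((float2)coord, float2(0.0f, 0.0f), textureSize - 1.0f);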
    return tex.Load(uint3(tempCoord, 0)).r;
}

[numthreads(8, 8, 1)]
void ExpandMain (uint3 dispatchThreadID : SV_DispatchThreadID)
{
    if(any((float2)dispatchThreadID.xy >= _TextureSizeExt.xy)) return;

    float tr = LoadFromRandomTiles(_RandomTiles, dispatchThreadID.xy / 2, _TextureSize.xy);
    float tl = LoadFromRandomTiles(_RandomTiles, (int2)((dispatchThreadID.xy+uint2(1, 0)) / 2) - int2(1, 0), _TextureSize.xy);
    float br = LoadFromRandomTiles(_RandomTiles, (int2)((dispatchThreadID.xy+uint2(0, 1)) / 2) - int2(0, 1), _TextureSize.xy);
    float bl = LoadFromRandomTiles(_RandomTiles, (int2)((dispatchThreadID.xy+uint2(1, 1)) / 2) - int2(1, 1), _TextureSize.xy);
    
    // x y
    // w z
    float4 packedColor = float4(tl, tr, br, bl);
    uint output[4];
    uint4 sortedIndex = SortAndReturnIndex(packedColor, output);
    uint4 tempOutput = int4(output[0], output[1], output[2], output[3]);

    uint processedArray[4] = {sortedIndex.x, sortedIndex.y, sortedIndex.z, sortedIndex.w};
    ProcessSortedArray(packedColor, tempOutput, processedArray);
    uint4 rotatedIndex = RotateAccordingToMinimum(processedArray, output);
    uint mode = GetMode(rotatedIndex.y, rotatedIndex.z, rotatedIndex.w);

    // Integer loop counter: i is also used to build byte offsets and buffer indices below.
    for (uint i = 0; i < _TileTypeCount; i++)
    {
        bool shouldRender = false;
        int channel = 3;
        if(bl == i)
        {
            shouldRender = true;
            channel = processedArray[3];
        }

        if(br == i)
        {
            shouldRender = true;
            channel = processedArray[2];
        }

        if(tr == i)
        {
            shouldRender = true;
            channel = processedArray[1];
        }

        if(tl == i)
        {
            shouldRender = true;
            channel = processedArray[0];
        }

        if(shouldRender)
        {
            uint totalCount;
            _RW_CounterBuffer.InterlockedAdd(4 + i * 5 * 4, 1, totalCount);
            PerTileProperty prop = (PerTileProperty)0;
            prop.coord = dispatchThreadID.xy;
            prop.mode = mode;
            prop.rotation = tempOutput.x;
            prop.channel = channel;
            _RW_TileData[totalCount + i * _TextureSizeExt.x * _TextureSizeExt.y] = prop;
        }
    }

    _RW_RandomTilesExt[dispatchThreadID.xy] = float4(mode, tempOutput.x, 0.0f, 1.0f);
}

[numthreads(16, 1, 1)]
void CopyMain (uint3 dispatchThreadID : SV_DispatchThreadID)
{
    if(dispatchThreadID.x >= 5 * _TileTypeCount) return;
    uint status;
    _RW_ArgBuffer[dispatchThreadID.x] = _CounterBuffer.Load(4 * dispatchThreadID.x, status);
}
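The counting trick in ExpandMain, where an atomic add both counts the instances of a type and hands each thread a unique slot, has a close CPU analogue. The sketch below uses `Interlocked.Increment` and `Parallel.For` from the .NET standard library; the class `AtomicAppend` and the worst-case capacity per type are assumptions made for illustration.

```csharp
using System.Threading;
using System.Threading.Tasks;

// CPU analogue of reserving output slots with an atomic counter.
public static class AtomicAppend
{
    // Groups item indices by type. Region t of the result starts at
    // t * itemTypes.Length, and counts[t] of its entries are valid.
    public static int[] Bucket(int[] itemTypes, int typeCount, out int[] counts)
    {
        int capacity = itemTypes.Length; // worst case: every item has the same type
        var slots = new int[typeCount * capacity];
        var counters = new int[typeCount];

        Parallel.For(0, itemTypes.Length, i =>
        {
            int type = itemTypes[i];
            // Interlocked.Increment returns the new value, whereas HLSL's
            // InterlockedAdd returns the old one, hence the "- 1".
            int slot = Interlocked.Increment(ref counters[type]) - 1;
            slots[type * capacity + slot] = i;
        });

        counts = counters;
        return slots;
    }
}
```

As on the GPU, the order of entries inside one region depends on scheduling and is not deterministic; only the counts and the set of entries per region are.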

ONITileRenderShader.shader

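Because the TileTexture I drew starts from the top-left corner, sampling involves a few places where the y value of the uv is subtracted from 1, and the rotation does not need to be multiplied by -1 either. The benefit of painting the gaps gray earlier also shows up here: two smoothstep calls give us the render region and the gap region with a degree of anti-aliasing. In addition, with MSAA enabled you may need to discard the current pixel depending on whether the uv lies between 0 and 1, otherwise odd jagged edges can appear.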

Shader "zznewclear13/ONITileRenderShader"
{
    HLSLINCLUDE

    #pragma enable_d3d11_debug_symbols
    #include "Packages/com.unity.render-pipelines.core/ShaderLibrary/Common.hlsl"
    #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Core.hlsl"
    #include "ONITileRender.hlsl"

    StructuredBuffer<PerTileProperty> _TileData;
    sampler2D _TileTexture;
    sampler2D _MainTexture;
    float4 _MainTexture_ST;
    uint _TileType;

    struct Attributes
    {
        uint vertexID       : SV_VERTEXID;
        uint instanceID     : SV_INSTANCEID;
    };

    struct Varyings
    {
        float4 positionCS   : SV_POSITION;
        float2 uv           : TEXCOORD0;
        float3 positionWS   : TEXCOORD1;
        uint mode           : TEXCOORD2;
        uint rotation       : TEXCOORD3;
        uint channel        : TEXCOORD4;
    };

    Varyings vert(Attributes input)
    {
        Varyings output = (Varyings)0;

        PerTileProperty prop = _TileData[input.instanceID + _TileType * _TextureSizeExt.x * _TextureSizeExt.y];

        uint vertexIndex = input.vertexID % 3;
        uint triangleID = input.vertexID / 3;
        uint uvX = ((vertexIndex&2)>>1)^triangleID;
        // '+' binds tighter than '>>', so (vertexIndex+vertexIndex>>1)&1 is simply vertexIndex & 1.
        uint uvY = (vertexIndex & 1) ^ triangleID;
        float2 uv = float2(uvX, uvY);

        float2 tileCoord = (prop.coord + 0.5f) * _TileSize;
        float3 center = float3(tileCoord.x, 0.0f, tileCoord.y) + _TileStartPos;
        float2 positionWSOffset = (uv - 0.5f) * _TileSize;
        float3 positionWS = center + float3(positionWSOffset.x, 0.0f, positionWSOffset.y);

        output.positionCS = mul(UNITY_MATRIX_VP, float4(positionWS, 1.0f));
        output.uv = uv;
        output.positionWS = positionWS;
        output.mode = prop.mode;
        output.rotation = prop.rotation;
        output.channel = prop.channel;
        return output;
    }

    float4 frag(Varyings input) : SV_TARGET
    {
        float2 uv = input.uv;
        uv.y = 1.0f - uv.y;

        int2 startCoord = int2(input.mode % ONI_TILE_TEXTURE_SIZE, input.mode / ONI_TILE_TEXTURE_SIZE);
        float2 startUV = (float2)(startCoord) / ONI_TILE_TEXTURE_SIZE;

        float rotationVal = (float)(input.rotation) * PI * 0.5f;
        float sinVal, cosVal;
        sincos(rotationVal, sinVal, cosVal);
        float2x2 rotationMat = float2x2(cosVal, sinVal, -sinVal, cosVal);
        float2 rotatedCoord = saturate(mul(rotationMat, uv - 0.5f) + 0.5f);
        float2 sampleCoord = startUV + rotatedCoord / ONI_TILE_TEXTURE_SIZE;
        sampleCoord.y = 1.0f - sampleCoord.y;

        float4 tileTexture = tex2D(_TileTexture, sampleCoord);
        float visColor = tileTexture[input.channel];

        float textureMask = smoothstep(0.25f, 1.0f, visColor);
        float gapMask = smoothstep(0.0f, 0.25f, visColor);

        float3 mainTex = tex2D(_MainTexture, input.positionWS.xz * _MainTexture_ST.xy + _MainTexture_ST.zw).rgb;
        mainTex = lerp(0.0f, mainTex, textureMask);
        float4 returnColor = float4(mainTex, gapMask);

        return returnColor;
    }

    ENDHLSL

    SubShader
    {
        pass
        {
            Tags {"Queue"="Transparent" "RenderType"="Transparent"}
            Blend SrcAlpha OneMinusSrcAlpha
            ZTest LEqual
            ZWrite Off
            HLSLPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            ENDHLSL       
        }
    }
}
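The quad corners produced by the vertex shader's bit tricks can be listed on the CPU. The C# sketch below reproduces what the two UV expressions evaluate to under C-style operator precedence; `ProceduralQuad` is a hypothetical name.

```csharp
// Corner of a unit quad for each of the six procedural vertex IDs.
public static class ProceduralQuad
{
    public static (uint U, uint V) Uv(uint vertexId)
    {
        uint vertexIndex = vertexId % 3; // 0, 1, 2 within a triangle
        uint triangleId = vertexId / 3;  // 0 for the first triangle, 1 for the second

        // First triangle: (0,0), (0,1), (1,0). XOR with triangleId mirrors
        // both axes, giving (1,1), (1,0), (0,1) for the second triangle.
        uint u = ((vertexIndex & 2u) >> 1) ^ triangleId;
        uint v = (vertexIndex & 1u) ^ triangleId;
        return (u, v);
    }
}
```

Vertex IDs 0 to 5 map to (0,0), (0,1), (1,0), (1,1), (1,0), (0,1): two triangles that share the diagonal from (0,1) to (1,0) and together cover the unit quad, with no lookup array needed.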

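Summary

All in all, this was great fun and very rewarding. My drawing skills are lacking, so the seams are slightly off in a few places, but never mind. Having been bitten by arrays in HLSL many times, I deliberately limited the number of arrays here. On top of that, after playing Turing Complete for a while, computing the UVs and vertex positions of a procedural quad was a piece of cake, whereas before I would have just written an array of length 6 and indexed into it. I also gained a new understanding of ByteAddressBuffer: it can actually be used for counting, which then feeds DrawIndirect or DispatchIndirect; previously I only knew CopyCounterValue. Finally, a word of praise for Oxygen Not Included: its visuals really leave a deep impression and its liquid rendering is one of a kind; it just gets tedious to play alone.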