Before You Read

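The textures used in this post all come from the LearnOpenGL website, and the Y component of their normal maps is flipped relative to the normal maps typically used in Unity or Unreal. For that reason the world-space bitangent is multiplied by an extra sign when it is computed; under normal circumstances this is not needed.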

Parallax Effects

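When the triangle budget is tight, a common choice is to take a height map and compute parallax to render a 3D-looking surface (although tessellation now seems to be the more common and more effective approach). There are two ways to compute the parallax. The first, Parallax Occlusion Mapping, assumes a fixed number of height layers and finds the appropriate position and color layer by layer to produce the 3D effect. The second, Cone Step Mapping, precomputes from the height map, for every texel, the widest cone angle with respect to all other texels (somewhat like AO), marches quickly using that cone angle, and finally uses a binary search to find the color at the intersection.

The first method has a significant drawback: at grazing view angles, unless the sample count is high, the individual layers become visible. Linearly interpolating the depth from the last step reduces the layering to some extent. The drawback of the second method is that a low sample count distorts the image somewhat, though without any layering. It also has one advantage over the first: thin features are not skipped.

GPU Gems 3 describes an optimization of Cone Step Mapping called Relaxed Cone Step Mapping. Instead of computing the maximum cone angle as before, it only guarantees that a ray passing through the cone intersects the height field inside the cone at most once, which reduces the number of initial cone steps. This post mainly uses that method. Placing the cone apex deeper than the current height map value might reduce the step count even further, but my brief attempt did not give particularly good results.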

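An introduction to Parallax Occlusion Mapping and ways to optimize it can be found on Learn OpenGL, and there is open-source code on Shadertoy for reference. UE5 has a plugin called Get Relief! that quickly generates the precomputed texture for Relaxed Cone Step Mapping and also provides the rendering shader. The plugin's author, Daniel Elliott, also presented the ideas behind it at GDC 2023; if that link does not open, there is also a GDC Vault link.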

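The textures used in this post can be found through the download links given on Learn OpenGL. To make it easier on the eyes, the colors of the displacement texture are inverted here.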

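The image below compares the two approaches, with Parallax Occlusion Mapping on the left and Relaxed Cone Step Mapping on the right, both using the same sample count. In the more extreme cases POM shows layering while RCSM shows distortion. The texture used by RCSM is included below as well: the R channel holds the height map and the G channel holds the cone angle. This post uses Unity 2021.3.19f1c1.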

POM vs RCSM

rcsm.png

Generating the Precomputed Texture

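Unlike Parallax Occlusion Mapping, which uses the depth map directly, Cone Step Mapping needs a precomputed texture of cone angles. The cone angle can be expressed as the radius of the cone's base divided by the cone's height, denoted coneRatio. This post uses a height map, but the computation actually uses 1 minus the height value, which is the depth from the model surface down to the actual height. Because the depth only lies between 0 and 1, and uv also only lies between 0 and 1, the maximum cone ratio of the deepest point cannot exceed 1.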

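To "guarantee that a ray passing through the cone intersects the height field inside the cone at most once", take currentPos at the apex of the cone and rayStartPos at the base of the cone (the cone is inverted, with its base flush with the model surface), and pick a target point cachedPos. While the depth of cachedPos is less than the depth of currentPos, move cachedPos along the direction cachedPos - rayStartPos and keep sampling the texels samplePos along the way, until the depth of samplePos is less than that of cachedPos (that is, the ray has entered the height field and come back out). A cone ratio coneRatio can then be computed from samplePos and currentPos. Looping over all texels yields the minimum cone ratio.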
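As a minimal sketch of that last step, the ratio between two positions could be written as a small helper. ConeRatioThrough, apexPos and hitPos are hypothetical names used only for illustration; the compute shader in this post inlines the same arithmetic instead of calling a function like this.

```hlsl
// Hypothetical helper for illustration only; not part of the original compute shader.
// apexPos: the texel being processed, xy = uv, z = depth (1 - height).
// hitPos:  a point sampled on the height field, in the same space.
// Returns radius / height of the inverted cone whose tip sits at apexPos
// and whose side passes through hitPos, capped at 1.
float ConeRatioThrough(float3 apexPos, float3 hitPos)
{
    // Vertical distance from the cone tip up to the sampled point.
    float coneHeight = apexPos.z - hitPos.z;

    // A point at or below the tip does not constrain the cone.
    if (coneHeight <= 0.0f) return 1.0f;

    // Horizontal distance in uv space is the cone radius at that height.
    float coneRadius = length(hitPos.xy - apexPos.xy);
    return min(coneRadius / coneHeight, 1.0f);
}
```

The value stored for a texel would then be the running minimum over every sampled point, for example coneRatio = min(coneRatio, ConeRatioThrough(currentPos, samplePos)).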

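To reduce the cost of a single pass, this post first splits the whole image into NxN regions. One pass computes the cone ratio of every texel against one NxN region, and looping over all regions gives the final cone ratio. With N equal to THREAD_GROUP_SIZE, group shared memory can cache the depth values of the region with a single load per thread. Then there is the Early Exit optimization: the outward marching loop can stop immediately when cachedPos is outside the texture, when the depth of cachedPos is greater than the depth of currentPos, or when the cone ratio of cachedPos is greater than the current minimum cone ratio. More optimizations can be found in the Get Relief! talk.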
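To get a feel for the amount of work, here is the pass count for an assumed 1024x1024 texture (the size is only an example, not something stated in this post), with N = 16 as defined by THREAD_GROUP_SIZE in the compute shader. W and H are the texture width and height.

```latex
\text{passes} = \left\lceil \frac{W}{N} \right\rceil \left\lceil \frac{H}{N} \right\rceil
             = \left\lceil \frac{1024}{16} \right\rceil^{2} = 64^{2} = 4096,
\qquad
\text{cached texels per pass} = N^{2} = 16^{2} = 256
```

Every thread therefore visits 256 cached texels per pass, each with up to 128 marching steps, which is why skipping texels and leaving the marching loop early matters so much.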

The Code

RCSMComputeShader.compute

Generates the Relaxed Cone Step Mapping texture. PreProcessMain processes the initial depth map and presets the maximum coneRatio to 1. Early Exit is the key to reducing the compute time.

#pragma kernel PreProcessMain
#pragma kernel RCSMMain

#include "Packages/com.unity.render-pipelines.core/ShaderLibrary/Common.hlsl"
#include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Core.hlsl"

Texture2D<float4> _SourceTex;
RWTexture2D<float4> _RW_TargetTex;
SamplerState sampler_LinearClamp;
float4 _TextureSize;
float2 _CacheOffset;

#define THREAD_GROUP_SIZE 16u

[numthreads(8, 8, 1)]
void PreProcessMain(uint3 id : SV_DispatchThreadID)
{
    uint2 tempID = uint2(id.x, _TextureSize.y - 1.0f - id.y);
    float sourceTex = _SourceTex.Load(uint3(tempID, 0)).r;
    _RW_TargetTex[id.xy] = float4(sourceTex, 1.0f, 0.0f, 0.0f);
}

float3 LoadPos(uint2 coord)
{
    return float3((coord + 0.5f) * _TextureSize.zw, 1.0f - _SourceTex.Load(uint3(coord, 0)).r);
}

float3 LoadPos(uint2 coord, out float coneRatio)
{
    float2 sourceTex = _SourceTex.Load(uint3(coord, 0)).rg;
    coneRatio = sourceTex.y;
    return float3((coord + 0.5f) * _TextureSize.zw, 1.0f - sourceTex.x);
}

float3 SamplePos(float2 uv)
{
    return float3(uv, 1.0f - _SourceTex.SampleLevel(sampler_LinearClamp, uv, 0.0f).r);
}

const static uint CACHED_POS_SIZE = THREAD_GROUP_SIZE * THREAD_GROUP_SIZE;
groupshared float3 cachedPos[CACHED_POS_SIZE];
void SetCachedPos(float3 pos, uint index) { cachedPos[index] = pos; }
float3 GetCachedPos(uint index) { return cachedPos[index]; }
void CachePos(uint2 cacheStartPos, uint cacheIndex)
{
    uint2 offset = uint2(cacheIndex % THREAD_GROUP_SIZE, cacheIndex / THREAD_GROUP_SIZE);
    uint2 sampleCoord = cacheStartPos + offset;
    float3 pos = LoadPos(sampleCoord);
    SetCachedPos(pos, cacheIndex);
}

[numthreads(THREAD_GROUP_SIZE, THREAD_GROUP_SIZE, 1)]
void RCSMMain(uint3 groupID : SV_GroupID,
                uint3 groupThreadID : SV_GroupThreadID,
                uint groupIndex : SV_GroupIndex,
                uint3 dispatchThreadID : SV_DispatchThreadID)
{
    uint2 cacheStartPos = uint2(_CacheOffset)*THREAD_GROUP_SIZE;
    CachePos(cacheStartPos, groupIndex);
    GroupMemoryBarrierWithGroupSync();

    float coneRatio;
    float3 currentPos = LoadPos(dispatchThreadID.xy, coneRatio);
    float3 rayStartPos = float3(currentPos.xy, 0.0f);
    const int steps = 128;

    for (uint cacheIndex = 0; cacheIndex < CACHED_POS_SIZE; cacheIndex++)
    {
        uint2 offset = uint2(cacheIndex % THREAD_GROUP_SIZE, cacheIndex / THREAD_GROUP_SIZE);
        uint2 sampleCoord = cacheStartPos + offset;
        if (any(float2(sampleCoord) >= _TextureSize.xy)) continue;
        if (length((int2(sampleCoord.xy) - int2(dispatchThreadID.xy)) * _TextureSize.zw) > coneRatio * currentPos.z) continue;
        if (all(sampleCoord == dispatchThreadID.xy)) continue;

        float3 cachedPos = GetCachedPos(cacheIndex);
        float3 dir = cachedPos - rayStartPos;
        float dirXYLength = length(dir.xy);
        float3 normalizedDir = dir / dirXYLength;
        float stepLength = 1.414 * _TextureSize.z;

        for (int j = 0; j < steps; j++)
        {
            cachedPos += stepLength * normalizedDir;
            if (any(cachedPos.xy >= 1.0f) || any(cachedPos.xy <= 0.0f)) break;
            if (cachedPos.z > currentPos.z) break;
            if (length(cachedPos.xy - currentPos.xy) / (currentPos.z - cachedPos.z) > coneRatio) break;

            float3 samplePos = SamplePos(cachedPos.xy);
            if (samplePos.z > currentPos.z) continue;
            float tempConeRatio = length(samplePos.xy - currentPos.xy) / (currentPos.z - samplePos.z);
            if (tempConeRatio < coneRatio)
            {
                coneRatio = tempConeRatio;
            }
        }
    }

    _RW_TargetTex[dispatchThreadID.xy] = float4(1.0f - currentPos.z, coneRatio, 0.0f, 1.0f);
}

RelaxedConeStepMappingGenerator.cs

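Note that the result is saved as tga; saving it as jpg would introduce compression artifacts. Also note that neither the depth map nor the precomputed texture stores color values, so sRGB must not be checked, and coneRatio is not well suited to mipmaps either.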

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEditor;
using Unity.EditorCoroutines.Editor;
using System.IO;

public class RelaxedConeStepMappingGenerator : EditorWindow
{
    private ComputeShader computeShader;
    private Texture2D texture;
    private string savePath = "Assets/ParallaxMapping/rcsm";
    private static readonly string suffix = ".tga";

    private Vector2Int textureSize;
    private RenderTexture[] rts = new RenderTexture[2];
    private EditorCoroutine editorCoroutine;
    
    Rect rect
    {
        get { return new Rect(20.0f, 20.0f, position.width - 40.0f, position.height - 10.0f); }
    }

    private void EnsureRTs()
    {
        foreach (var rt in rts)
        {
            if (rt != null) rt.Release();
        }
        rts = new RenderTexture[2];
    }

    private void EnsureRT(ref RenderTexture rt, int width, int height)
    {
        if(rt == null || rt.width != width || rt.height != height)
        {
            if(rt != null) rt.Release();
            RenderTextureDescriptor desc = new RenderTextureDescriptor
            {
                width = width,
                height = height,
                volumeDepth = 1,
                dimension = UnityEngine.Rendering.TextureDimension.Tex2D,
                depthBufferBits = 0,
                msaaSamples = 1,
                graphicsFormat = UnityEngine.Experimental.Rendering.GraphicsFormat.R8G8B8A8_UNorm,
                enableRandomWrite = true
            };
            rt = new RenderTexture(desc);
            if (!rt.IsCreated()) rt.Create();
        }
    }

    [MenuItem("zznewclear13/Relaxed Cone Step Mapping Generator")]
    public static void Init()
    {
        RelaxedConeStepMappingGenerator window = GetWindow<RelaxedConeStepMappingGenerator>("Relaxed Cone Step Mapping Generator");

        window.Show();
        window.Repaint();
        window.Focus();
    }

    private void OnGUI()
    {
        using (new GUILayout.AreaScope(rect))
        {
            computeShader = (ComputeShader)EditorGUILayout.ObjectField("Compute Shader", computeShader, typeof(ComputeShader), false);
            texture = (Texture2D)EditorGUILayout.ObjectField("Texture", texture, typeof(Texture2D), false);
            savePath = EditorGUILayout.TextField("Save Path", savePath);

            using (new EditorGUI.DisabledGroupScope(!computeShader || !texture))
            {
                if(GUILayout.Button("Generate!", new GUILayoutOption[] { GUILayout.Height(30.0f) }))
                {
                    GenerateRCSM();
                }
            }
        }
    }

    private static Vector4 GetTextureSize(Vector2Int textureSize)
    {
        return new Vector4(textureSize.x, textureSize.y, 1.0f / textureSize.x, 1.0f / textureSize.y);
    }

    private void PreProcess(RenderTexture target)
    {
        int kernelID = computeShader.FindKernel("PreProcessMain");
        computeShader.GetKernelThreadGroupSizes(kernelID, out uint x, out uint y, out uint z);
        computeShader.SetVector("_TextureSize", GetTextureSize(textureSize));
        computeShader.SetTexture(kernelID, "_SourceTex", texture);
        computeShader.SetTexture(kernelID, "_RW_TargetTex", target);
        computeShader.Dispatch(kernelID,
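            // Divide as float so CeilToInt rounds up when the size is not a multiple of the group size.
            Mathf.CeilToInt(textureSize.x / (float)x),
            Mathf.CeilToInt(textureSize.y / (float)y),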
            1);
    }

    private void ComputeRCSM(Vector2Int offset, RenderTexture source, RenderTexture target)
    {
        int kernelID = computeShader.FindKernel("RCSMMain");
        computeShader.GetKernelThreadGroupSizes(kernelID, out uint x, out uint y, out uint z);
        computeShader.SetVector("_TextureSize", GetTextureSize(textureSize));
        computeShader.SetVector("_CacheOffset", new Vector2(offset.x, offset.y));
        computeShader.SetTexture(kernelID, "_SourceTex", source);
        computeShader.SetTexture(kernelID, "_RW_TargetTex", target);
        computeShader.Dispatch(kernelID,
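            // Divide as float so CeilToInt rounds up when the size is not a multiple of the group size.
            Mathf.CeilToInt(textureSize.x / (float)x),
            Mathf.CeilToInt(textureSize.y / (float)y),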
            1);
    }

    private IEnumerator DispatchCompute()
    {
        PreProcess(rts[0]);
        yield return null;

        int kernelID = computeShader.FindKernel("RCSMMain");
        computeShader.GetKernelThreadGroupSizes(kernelID, out uint x, out uint y, out uint z);
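        // Divide as float so partially filled regions at the texture edge are still processed.
        Vector2Int dispatchCount = new Vector2Int(Mathf.CeilToInt(textureSize.x / (float)x),
                                                    Mathf.CeilToInt(textureSize.y / (float)y));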

        int fromID = 0;
        bool cancel = false;
        for (int i = 0; i < dispatchCount.x; i++)
        {
            for (int j = 0; j < dispatchCount.y; j++)
            {
                ComputeRCSM(new Vector2Int(i, j), rts[fromID], rts[1 - fromID]);
                fromID = 1 - fromID;
                yield return null;
            }
            cancel = EditorUtility.DisplayCancelableProgressBar("In Progress...", i + "/" + dispatchCount.x, (float)i / dispatchCount.x);
            if (cancel) break;
        }
        EditorUtility.ClearProgressBar();
        if (!cancel) SaveRenderTextureToFile(rts[fromID]);
    }

    private void GenerateRCSM()
    {
        textureSize = new Vector2Int(texture.width, texture.height);
        EnsureRTs();
        EnsureRT(ref rts[0], textureSize.x, textureSize.y);
        EnsureRT(ref rts[1], textureSize.x, textureSize.y);

        Stop();
        editorCoroutine = EditorCoroutineUtility.StartCoroutine(DispatchCompute(), this);
    }

    private void SaveRenderTextureToFile(RenderTexture rt)
    {
        RenderTexture prev = RenderTexture.active;
        RenderTexture.active = rt;

        Texture2D toSave = new Texture2D(textureSize.x, textureSize.y, TextureFormat.ARGB32, false, true);
        toSave.ReadPixels(new Rect(0.0f, 0.0f, textureSize.x, textureSize.y), 0, 0);
        byte[] bytes = toSave.EncodeToTGA();
        // WriteAllBytes truncates an existing file, so no stale bytes remain from an earlier, larger export.
        File.WriteAllBytes(savePath + suffix, bytes);
        AssetDatabase.Refresh();

        TextureImporter ti = (TextureImporter)AssetImporter.GetAtPath(savePath + suffix);
        ti.mipmapEnabled = false;
        ti.sRGBTexture = false;
        ti.SaveAndReimport();

        Texture2D tempTexture = AssetDatabase.LoadAssetAtPath<Texture2D>(savePath + suffix);
        EditorGUIUtility.PingObject(tempTexture);

        RenderTexture.active = prev;
    }

    private void Stop()
    {
        if (editorCoroutine != null) EditorCoroutineUtility.StopCoroutine(editorCoroutine);
    }

    private void OnDestroy()
    {
        foreach (var rt in rts)
        {
            if (rt != null) rt.Release();
        }
        rts = new RenderTexture[2];
    }
}

RCSMVisualizeShader.shader

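The parallax computation is split into two loops. The first loop marches the ray using coneRatio and the depth value until the sample point is inside the height field; the second loop uses a binary search to obtain a more accurate uv.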
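The step length used by the first loop can be derived as follows. This is my reading of getStepLength in the shader, and the symbols t, d_r, d_s, k_r and k_c are introduced here only for the derivation. The view vector is divided by its z component before marching, so advancing the ray by t increases its depth by t and moves it k_r t across the texture, where k_r is rayRatio. The inverted cone stored at the sampled texel has its tip at depth d_s (sampleHeight), and at depth d_r + t its radius is k_c (d_s - d_r - t), where d_r is rayHeight and k_c is coneRatio. The ray reaches the side of the cone when the two distances are equal:

```latex
k_r \, t = k_c \, (d_s - d_r - t)
\quad\Longrightarrow\quad
t = \frac{d_s - d_r}{k_r / k_c + 1}
```

A steep ray (small k_r) or a wide cone (large k_c) lets t approach the full remaining depth d_s - d_r, while a grazing ray or a narrow cone shrinks the step.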

Shader "zznewclear13/RCSMVisualizeShader"
{
    Properties
    {
        _BaseColor("Base Color", Color) = (1, 1, 1, 1)
        _MainTex ("Texture", 2D) = "white" {}
        _RCSMTex("RCSM Texture", 2D) = "white" {}
        _NormalMap("Normal Map", 2D) = "bump" {}
        _NormalIntensity("Normal Intensity", Range(0, 2)) = 1

        _ParallaxIntensity("Parallax Intensity", Float) = 1
        _ParallaxIteration("Parallax Iteration", Float) = 15
    }

    HLSLINCLUDE
    #include "Packages/com.unity.render-pipelines.core/ShaderLibrary/Common.hlsl"
    #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Core.hlsl"
    #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Lighting.hlsl"

    sampler2D _MainTex;
    sampler2D _NormalMap;
    sampler2D _RCSMTex;
    CBUFFER_START(UnityPerMaterial)
    float4 _BaseColor;
    float _NormalIntensity;
    float _ParallaxIntensity;
    float _ParallaxIteration;
    CBUFFER_END

    struct Attributes
    {
        float4 positionOS   : POSITION;
        float3 normalOS     : NORMAL;
        float4 tangentOS    : TANGENT;
        float2 texcoord     : TEXCOORD0;
    };
    
    struct Varyings
    {
        float4 positionCS   : SV_POSITION;
        float2 uv           : TEXCOORD0;
        float4 tbnView[3]   : TEXCOORD1;
    };
    
    Varyings vert(Attributes input)
    {
        Varyings output = (Varyings)0;
        VertexPositionInputs vpi = GetVertexPositionInputs(input.positionOS.xyz);
        VertexNormalInputs vni = GetVertexNormalInputs(input.normalOS, input.tangentOS);
        
        float3 cameraOS = mul(UNITY_MATRIX_I_M, float4(GetCameraPositionWS(), 1.0f)).xyz;
        float sign = (input.tangentOS.w > 0.0 ? 1.0 : -1.0) * GetOddNegativeScale();
        float3 bitangent = cross(input.normalOS, input.tangentOS.xyz) * sign;
        float3x3 tbnMat = float3x3(input.tangentOS.xyz, bitangent, input.normalOS);
        float3 viewTS = mul(tbnMat, cameraOS - input.positionOS.xyz);

        output.positionCS = vpi.positionCS;
        output.uv = input.texcoord;
        output.tbnView[0] = float4(vni.tangentWS, viewTS.x);
        output.tbnView[1] = float4(vni.bitangentWS * sign, viewTS.y);
        output.tbnView[2] = float4(vni.normalWS, viewTS.z);
        return output;
    }

    float2 sampleRCSM(float2 uv)
    {
        float2 rcsm = tex2D(_RCSMTex, uv).xy;
        return float2(1.0f - rcsm.x, rcsm.y);
    }

    float getStepLength(float rayRatio, float coneRatio, float rayHeight, float sampleHeight)
    {
        float totalRatio = rayRatio / coneRatio + 1.0f;
        return (sampleHeight - rayHeight) / totalRatio;
    }

    float2 parallax(float2 uv, float3 view)
    {
        view.xy = -view.xy * _ParallaxIntensity;
        float3 samplePos = float3(uv, 0.0f);
        float2 rcsm = sampleRCSM(samplePos.xy);
        float rayRatio = length(view.xy);
        float coneRatio = rcsm.y;
        float rayHeight = samplePos.z;
        float sampleHeight = rcsm.x;

        float stepLength = getStepLength(rayRatio, coneRatio, rayHeight, sampleHeight);  
        [unroll(30)]
        for (int i = 0; i < _ParallaxIteration; ++i)
        {
            samplePos += stepLength * view;
            rcsm = sampleRCSM(samplePos.xy);
            coneRatio = rcsm.y;
            rayHeight = samplePos.z;
            sampleHeight = rcsm.x;
            if (sampleHeight <= rayHeight) break;
        
            stepLength = getStepLength(rayRatio, coneRatio, rayHeight, sampleHeight);
        }

        stepLength *= 0.5f;
        samplePos -= stepLength * view;

        [unroll]
        for (int j = 0; j < 5; ++j)
        {
            rcsm = sampleRCSM(samplePos.xy);
            stepLength *= 0.5f;
            if (samplePos.z >= rcsm.x)
            {
                samplePos -= stepLength * view;
            }
            else if(samplePos.z < rcsm.x)
            {
                samplePos += stepLength * view;
            }
        }

        return samplePos.xy;
    }


    float4 frag(Varyings input) : SV_TARGET
    {
        float3 viewTS = normalize(float3(input.tbnView[0].w, input.tbnView[1].w, input.tbnView[2].w));
        float3 tangentWS = normalize(input.tbnView[0].xyz);
        float3 bitangentWS = normalize(input.tbnView[1].xyz);
        float3 normalWS = normalize(input.tbnView[2].xyz);

        float z = max(abs(viewTS.z), 1e-5) * (viewTS.z >= 0.0f ? 1.0f : -1.0f);
        float2 uv = parallax(input.uv, viewTS / z);

        float4 mainTex = tex2D(_MainTex, uv) * _BaseColor;
        float3 normalTS = normalize(UnpackNormalScale(tex2D(_NormalMap, uv), _NormalIntensity));
        
        float3 n = normalize(normalTS.x * tangentWS + normalTS.y * bitangentWS + normalTS.z * normalWS);
        Light mainLight = GetMainLight();
        float ndotl = max(0.0f, dot(n, mainLight.direction));

        float3 color = mainTex.rgb * mainLight.color * ndotl;
        float alpha = mainTex.a;
        return float4(color, alpha);
    }
            
    ENDHLSL

    SubShader
    {
        Tags{ "RenderType"="Transparent" "Queue"="Transparent"}
        Blend SrcAlpha OneMinusSrcAlpha
        ZWrite Off
        Cull Back

        Pass
        {
            HLSLPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            ENDHLSL
        }
    }
}

POMShader.shader

Largely based on the computation used in normal vs parallax.
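The interpolation at the end of parallax in this shader can be read as finding the zero of a straight line. The symbols a, b, s and f are introduced here only for the explanation: a is the value of after (surface depth minus ray depth at uv, which is not positive once the loop has stopped) and b is the value of before (the same difference one layer earlier at lastUVs, which is positive as long as the loop ran at least once). Assuming the difference varies linearly between the two samples, with s = 0 at uv and s = 1 at lastUVs:

```latex
f(s) = a + s\,(b - a), \qquad
f(w) = 0 \;\Longrightarrow\; w = \frac{a}{a - b}, \qquad 0 \le w < 1
```

The result lerp(uv, lastUVs, w) is therefore the point where the ray is estimated to cross the surface, which is what softens the visible layering.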

Shader "zznewclear13/POMShader"
{
    Properties
    {
        _BaseColor("Base Color", Color) = (1, 1, 1, 1)
        _MainTex ("Texture", 2D) = "white" {}
        _HeightMap("Height Map", 2D) = "white" {}
        _NormalMap("Normal Map", 2D) = "bump" {}
        _NormalIntensity("Normal Intensity", Range(0, 2)) = 1

        _ParallaxIntensity ("Parallax Intensity", Float) = 1
        _ParallaxIteration ("Parallax Iteration", Float) = 15
    }

    HLSLINCLUDE
    #include "Packages/com.unity.render-pipelines.core/ShaderLibrary/Common.hlsl"
    #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Core.hlsl"
    #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Lighting.hlsl"

    sampler2D _MainTex;
    sampler2D _HeightMap;
    sampler2D _NormalMap;
    CBUFFER_START(UnityPerMaterial)
    float4 _BaseColor;
    float _NormalIntensity;
    float _ParallaxIntensity;
    float _ParallaxIteration;
    CBUFFER_END

    struct Attributes
    {
        float4 positionOS   : POSITION;
        float3 normalOS     : NORMAL;
        float4 tangentOS    : TANGENT;
        float2 texcoord     : TEXCOORD0;
    };
    
    struct Varyings
    {
        float4 positionCS   : SV_POSITION;
        float2 uv           : TEXCOORD0;
        float4 tbnView[3]   : TEXCOORD1;
    };
    
    Varyings vert(Attributes input)
    {
        Varyings output = (Varyings)0;
        VertexPositionInputs vpi = GetVertexPositionInputs(input.positionOS.xyz);
        VertexNormalInputs vni = GetVertexNormalInputs(input.normalOS, input.tangentOS);
        
        float3 cameraOS = mul(UNITY_MATRIX_I_M, float4(GetCameraPositionWS(), 1.0f)).xyz;
        float sign = (input.tangentOS.w > 0.0 ? 1.0 : -1.0) * GetOddNegativeScale();
        float3 bitangent = cross(input.normalOS, input.tangentOS.xyz) * sign;
        float3x3 tbnMat = float3x3(input.tangentOS.xyz, bitangent, input.normalOS);
        float3 viewTS = mul(tbnMat, cameraOS - input.positionOS.xyz);

        output.positionCS = vpi.positionCS;
        output.uv = input.texcoord;
        output.tbnView[0] = float4(vni.tangentWS, viewTS.x);
        output.tbnView[1] = float4(vni.bitangentWS * sign, viewTS.y);
        output.tbnView[2] = float4(vni.normalWS, viewTS.z);
        return output;
    }

    float sampleHeight(float2 uv)
    {
        return 1.0f - tex2D(_HeightMap, uv).r;
    }

    float2 parallax(float2 uv, float3 view)
    {
        float numLayers = _ParallaxIteration;
        float layerDepth = 1.0f / numLayers;

        float2 p = view.xy * _ParallaxIntensity;
        float2 deltaUVs = p / numLayers;

        float texd = sampleHeight(uv);
        float d = 0.0f;
        [unroll(30)]
        for (; d < texd; d += layerDepth)
        {
            uv -= deltaUVs;
            texd = sampleHeight(uv);
        }

        float2 lastUVs = uv + deltaUVs;
        float after = texd - d;
        float before = sampleHeight(lastUVs) - d + layerDepth;
        float w = after / (after - before);

        return lerp(uv, lastUVs, w);
    }

    float4 frag(Varyings input) : SV_TARGET
    {
        float3 viewTS = normalize(float3(input.tbnView[0].w, input.tbnView[1].w, input.tbnView[2].w));
        float3 tangentWS = normalize(input.tbnView[0].xyz);
        float3 bitangentWS = normalize(input.tbnView[1].xyz);
        float3 normalWS = normalize(input.tbnView[2].xyz);

        float z = max(abs(viewTS.z), 1e-5) * (viewTS.z >= 0.0f ? 1.0f : -1.0f);
        float2 uv = parallax(input.uv, viewTS / z);
        
        float4 mainTex = tex2D(_MainTex, uv) * _BaseColor;
        float3 normalTS = normalize(UnpackNormalScale(tex2D(_NormalMap, uv), _NormalIntensity));
        
        float3 n = normalize(normalTS.x * tangentWS + normalTS.y * bitangentWS + normalTS.z * normalWS);
        Light mainLight = GetMainLight();
        float ndotl = max(0.0f, dot(n, mainLight.direction));

        float3 color = mainTex.rgb * mainLight.color * ndotl;
        float alpha = mainTex.a;
        return float4(color, alpha);
    }
            
    ENDHLSL

    SubShader
    {
        Tags{ "RenderType"="Transparent" "Queue"="Transparent"}
        Blend SrcAlpha OneMinusSrcAlpha
        ZWrite Off
        Cull Back

        Pass
        {
            HLSLPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            ENDHLSL
        }
    }
}

Afterword

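This summer is really hot, hot enough that my head is not very clear. I feel RCSM ought to be much better than POM, yet in my own tests it was only slightly better; of course, I may well have computed something wrong somewhere... It is rather odd, though: GPU Gems 3 was published in 2007, and to this day I have not seen any other article or GitHub repository implementing RCSM in Unity, and UE5 has only the Get Relief plugin. Is it because tessellation is simply that good? I recall a technical talk on The Last of Us Part II that made heavy use of height maps; were those mainly used for blending multiple materials rather than for parallax?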