Computing GTAO with a Compute Shader in Unity

Ambient Occlusion

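Ambient occlusion: I first saw this term a long, long time ago while playing Assassin's Creed. I had no idea what it meant, but on the principle of maxing out every graphics setting I always ticked the option. Only later did I learn that the Chinese term 环境光遮蔽 is a translation of Ambient Occlusion (a very literal one at that), and that it is used to render the darkening of corners and creases.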

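Ambient occlusion applies at the indirect-lighting stage of the lighting calculation. Because of the limitations of rasterized rendering, indirect lighting is usually split into diffuse indirect lighting and specular indirect lighting, so ambient occlusion also comes in diffuse and specular variants. Here I only discuss diffuse ambient occlusion, which acts on diffuse indirect lighting. In addition, because of the limitations of forward rendering, screen-space ambient occlusion is applied indiscriminately to both direct and indirect lighting, so its intensity needs particular care.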

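Ground Truth Ambient Occlusion (GTAO) is a screen-space ambient occlusion algorithm that Jorge Jimenez introduced in his paper Practical Real-Time Strategies for Accurate Indirect Occlusion. It aims to match ground-truth ambient occlusion while remaining affordable on consoles. In my opinion its biggest advantage over other ambient occlusion algorithms is that dark areas are dark enough: very narrow crevices can get really, really dark, which other algorithms fail to achieve.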

This article draws heavily on Intel's open-source XeGTAO code.

Implementation Details

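This article focuses on using a Compute Shader to speed up the computation, so it does not go into the GTAO algorithm itself. If you are interested, you can read the GTAO slides from the SIGGRAPH 2016 Course.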

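GTAO needs two inputs: view-space normals and depth. With a deferred pipeline both are readily available, but with forward rendering the view-space normals have to be reconstructed from depth. Conveniently, a previous article of mine covers several ways to compute view-space normals from the depth texture. On top of that article, we can also use Group Shared Memory to cut down the number of texture samples.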
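The normal reconstruction in GTAOMain picks, per axis, the neighbor whose depth extrapolates best to the center (the `he`/`ve` terms in the shader). A minimal one-axis sketch of that selection follows; the function name and the string return values are hypothetical and only serve illustration.

```python
def pick_derivative(d_center, d_minus, d_plus, d_minus2, d_plus2):
    """Choose the side whose linear extrapolation best predicts the center depth.

    d_minus/d_plus are the depths one texel away, d_minus2/d_plus2 two texels away.
    """
    err_minus = abs((2 * d_minus - d_minus2) - d_center)
    err_plus = abs((2 * d_plus - d_plus2) - d_center)
    return "minus" if err_minus < err_plus else "plus"


# A depth edge on the plus side: the minus side lies on the same plane as the center.
side = pick_derivative(1.2, 1.1, 5.0, 1.0, 5.1)
```

Using the side with the smaller error keeps the derivative on the same surface as the center texel, so normals do not smear across depth discontinuities.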

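Since GTAO is relatively low-frequency information, we can downsample and compute it at half resolution or even lower. The approach used here is to take only one sample point per frame in each NxN region and blend the results with TAA.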
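As a rough illustration, the per-frame sample position inside each NxN block can be derived from the frame index the same way the kernels compute `subpixelBias`. The Python function name below is hypothetical.

```python
def subpixel_bias(frame_index, factor):
    """Return the (x, y) texel offset sampled inside each factor x factor block this frame."""
    subpixel_index = frame_index % (factor * factor)
    return (subpixel_index % factor, subpixel_index // factor)


# With a factor of 2, four consecutive frames visit every texel of the 2x2 block once.
offsets = [subpixel_bias(frame, 2) for frame in range(4)]
```

After factor * factor frames the pattern repeats, which is what lets a temporal filter such as TAA accumulate full-resolution coverage.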

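GTAO's own sample count can also be reduced by using spatio-temporal noise to generate fewer samples, which are again blended with TAA. In practice, however, I found that with heavy temporal blending, moving objects reveal patches of white (unoccluded) pixels that contrast sharply with the darker AO. I therefore rely on spatial blending as much as possible.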

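Thanks to Group Shared Memory, spatial blending can cover a very large range. I use a Gaussian blur for the blending, which keeps dark areas as dark as possible; simply averaging all the samples would make dark areas much brighter and lose GTAO's most distinctive advantage. When the blur is done as two passes, horizontal then vertical, the extra geometric weights computed from depth and normals produce fairly visible artifacts at large downsampling factors. Using the full-resolution depth texture and normals fixes this, but costs extra samples.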
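To make the geometric weighting concrete, here is a small Python sketch modeled on `GetWeight` and `CacheGaussianBlur` from the shader. Texels are `(ao, packed_normal_x, packed_normal_y, depth)` tuples; the border clamping and the tiny 3-tap kernel are assumptions made only to keep the example short.

```python
import math


def saturate(x):
    return min(max(x, 0.0), 1.0)


def unpack_normal(nx, ny):
    """Expand two [0, 1] components to a unit normal with non-negative z."""
    x, y = nx * 2.0 - 1.0, ny * 2.0 - 1.0
    return (x, y, math.sqrt(max(0.0, 1.0 - (x * x + y * y))))


def geometry_weight(center, sample, depth_scale=100.0):
    """Weight is 1 for matching normal and depth and falls to 0 as they diverge."""
    nc = unpack_normal(center[1], center[2])
    ns = unpack_normal(sample[1], sample[2])
    normal_weight = saturate(sum(a * b for a, b in zip(nc, ns)))
    depth_weight = 1.0 - saturate(abs(center[3] - sample[3]) * depth_scale)
    return normal_weight * depth_weight


def blurred_ao(texels, center_index, gauss):
    """1D Gaussian blur of the AO channel, modulated by the geometry weight."""
    radius = len(gauss) // 2
    center = texels[center_index]
    ao_sum = weight_sum = 0.0
    for i in range(-radius, radius + 1):
        j = min(max(center_index + i, 0), len(texels) - 1)  # clamp at the borders
        w = geometry_weight(center, texels[j]) * gauss[i + radius]
        ao_sum += texels[j][0] * w
        weight_sum += w
    return ao_sum / weight_sum


# Two dark texels next to two bright texels that sit at a very different depth.
row = [(0.2, 0.5, 0.5, 0.5)] * 2 + [(1.0, 0.5, 0.5, 0.9)] * 2
result = blurred_ao(row, 1, [0.25, 0.5, 0.25])
```

In this example the bright neighbor lies across a depth discontinuity and gets zero weight, so the dark texel stays at 0.2 instead of being pulled toward the plain average.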

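Because the Gaussian blur operates on the low-resolution image, the upsampling step also has to interpolate bilinearly according to the upsampled position (in practice, each pass only needs a linear interpolation along one direction).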
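The one-directional interpolation can be sketched in Python as follows, mirroring the `signVal` and `lerpVal` logic in the blur kernels. The function name and the border clamping are assumptions for this illustration.

```python
def upsample_1d(low, factor, sampled_offset):
    """Upsample a row of low-res values whose samples sit at sampled_offset in each block."""
    out = []
    for x in range(len(low) * factor):
        i = x // factor                      # low-res texel covering this pixel
        d = x % factor - sampled_offset      # signed distance to the sampled sub-pixel
        direction = (d > 0) - (d < 0)        # sign(d): which neighbor to blend toward
        j = min(max(i + direction, 0), len(low) - 1)
        t = abs(d) / factor                  # interpolation weight
        out.append(low[i] * (1.0 - t) + low[j] * t)
    return out


full = upsample_1d([0.0, 1.0], 2, 0)
```

Pixels that coincide with the sampled sub-pixel keep the low-resolution value unchanged, while the others blend toward the neighbor on their side.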

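As for render texture precision, the final GTAO value fits in an 8-bit channel. If the view-space normals are not needed afterwards, the GTAO value and a 24-bit depth can be packed together into a 32-bit RGBA render texture. Here I took the lazy route and store everything in R16G16B16A16_SFloat.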
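One possible byte layout for that packing is sketched below. It is purely hypothetical (AO in R, depth split across G, B, A from most to least significant byte); the article's code does not implement it.

```python
def pack_ao_depth(ao, depth):
    """Pack ao in [0, 1] as 8 bits and depth in [0, 1] as 24 bits into (R, G, B, A) bytes."""
    ao8 = round(min(max(ao, 0.0), 1.0) * 255)
    d24 = round(min(max(depth, 0.0), 1.0) * 0xFFFFFF)
    return (ao8, (d24 >> 16) & 0xFF, (d24 >> 8) & 0xFF, d24 & 0xFF)


def unpack_ao_depth(rgba):
    """Inverse of pack_ao_depth, returning (ao, depth) as floats in [0, 1]."""
    r, g, b, a = rgba
    return (r / 255, ((g << 16) | (b << 8) | a) / 0xFFFFFF)


ao, depth = unpack_ao_depth(pack_ao_depth(0.5, 0.123456))
```

The round trip loses at most one quantization step per value, which is the trade-off against the 16-bit float format used in this article.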

With that, the overall roadmap is fairly clear:

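Downsample the depth texture to obtain depth data
Compute view-space normals from the depth texture, or read the normals directly from the G-buffer
Compute the GTAO value from depth and normals
Upsample horizontally and compute the horizontally Gaussian-blurred GTAO value
Upsample vertically and compute the vertically Gaussian-blurred GTAO value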
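The texture sizes implied by these steps can be sketched as follows, based on how the render pass's `Configure` method allocates its temporary render textures. The dictionary keys are hypothetical labels.

```python
import math


def stage_sizes(full_w, full_h, factor):
    """Return the (width, height) of the texture each stage writes to."""
    down_w = math.ceil(full_w / factor)
    down_h = math.ceil(full_h / factor)
    return {
        "gtao": (down_w, down_h),             # downsampled on both axes
        "horizontal_blur": (full_w, down_h),  # width restored by the horizontal pass
        "vertical_blur": (full_w, full_h),    # height restored by the vertical pass
    }


sizes = stage_sizes(1920, 1080, 2)
```

Each blur pass restores one axis to full resolution, so the upsampling is split across the two separable passes.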

Code and Notes

GTAOComputeShader.compute

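The Compute Shader is the heart of it all. It has four kernels: the first computes the GTAO value and also stores the depth and the normal (besides storing two components of the normal directly, you could also octahedron-encode it); the second and third blur horizontally and vertically; the last one is for visualization and can be dropped in a real project.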

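Unlike XeGTAO, I added a USE_AVERAGE_COS macro. Normally the maximum cos value is picked within each slice, but a scene may contain objects such as gratings. When there is not much spatio-temporal blending, averaging the cos values reduces the influence of such gratings on GTAO (that is, it reduces noise). The macro can safely be left disabled.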
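The difference between the two modes can be illustrated with a short Python sketch of the per-direction horizon update. The function and argument names are hypothetical; `low_horizon_cos` stands for the lower bound the shader starts from.

```python
def horizon_cos(sample_cos, low_horizon_cos, use_average):
    """Combine per-step horizon cosines along one direction of a slice."""
    if use_average:
        # Average the steps first, then keep the result above the lower bound.
        base = sum(sample_cos) / len(sample_cos)
        return max(base, low_horizon_cos)
    # Default behavior: the highest horizon seen along the direction wins.
    return max([low_horizon_cos] + list(sample_cos))


# One thin occluder (for example a grating bar) hit by a single step out of three.
samples = [0.1, 0.9, 0.1]
with_max = horizon_cos(samples, -0.5, False)
with_average = horizon_cos(samples, -0.5, True)
```

With the maximum, the single hit determines the horizon; with the average, its contribution is diluted by the steps that missed the bar.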

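To use spatial blending as much as possible (that is, no temporal blending), I pinned the temporal parameter of XeGTAO's spatio-temporal noise to 13, so the GTAO result does not change over time. You could instead pass in _FrameIndex to take full advantage of the spatio-temporal noise.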
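For experimenting with the noise outside the shader, here is a Python port of `HilbertIndex` and `SpatioTemporalNoise`. It wraps the pixel coordinates to the 64x64 tile explicitly and uses double precision, so the values can differ slightly from the float32 results on the GPU.

```python
HILBERT_LEVEL = 6
HILBERT_WIDTH = 1 << HILBERT_LEVEL


def hilbert_index(x, y):
    """Map a pixel in the 64x64 tile to its position along a Hilbert curve."""
    index = 0
    level = HILBERT_WIDTH // 2
    while level > 0:
        region_x = 1 if (x & level) else 0
        region_y = 1 if (y & level) else 0
        index += level * level * ((3 * region_x) ^ region_y)
        if region_y == 0:
            if region_x == 1:
                x = HILBERT_WIDTH - 1 - x
                y = HILBERT_WIDTH - 1 - y
            x, y = y, x
        level //= 2
    return index


def spatio_temporal_noise(px, py, temporal_index):
    """Return two noise values in [0, 1) from the R2 sequence, offset per frame."""
    index = hilbert_index(px % HILBERT_WIDTH, py % HILBERT_WIDTH)
    index += 288 * (temporal_index % 64)
    return ((0.5 + index * 0.75487766624669276005) % 1.0,
            (0.5 + index * 0.5698402909980532659114) % 1.0)


# A fixed temporal index (13 in this article) yields the same noise every frame.
noise = spatio_temporal_noise(10, 20, 13)
```

Passing a changing frame index instead of the constant shifts the sequence every frame and repeats after 64 frames.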

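Group Shared Memory is mainly written and read via groupIndex, with each thread loading at most two texels. Computing the normal reads within a 5x5 neighborhood, so NORMAL_FROM_DEPTH_PIXEL_RANGE is 2. The blur needs samples both for the Gaussian blur and for the manual linear interpolation that follows, so CACHED_AO_NORMAL_DEPTH_FOR_BLUR_SIZE accounts for the sum of both radii. The linear interpolation also has to take into account how subpixelBias affects the interpolation weights.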
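The cache sizes and the two-loads-per-thread pattern can be checked with a small Python sketch. The constants are copied from the shader; `slots_written` is a hypothetical helper that replays the `cacheIndex = groupIndex * 2` loop.

```python
GTAO_GROUP = 32          # GTAO_THREAD_GROUP_SIZE_X / _Y
NORMAL_RANGE = 2         # NORMAL_FROM_DEPTH_PIXEL_RANGE
BLUR_GROUP = 64          # BLUR_THREAD_GROUP_SIZE
BLUR_RADIUS = 14
BILINEAR_RADIUS = 1

depth_cache_pitch = GTAO_GROUP + 2 * NORMAL_RANGE
depth_cache_size = depth_cache_pitch * depth_cache_pitch
blur_cache_size = BLUR_GROUP + 2 * (BILINEAR_RADIUS + BLUR_RADIUS)
bilinear_cache_size = BLUR_GROUP + 2 * BILINEAR_RADIUS


def slots_written(thread_count, cache_size):
    """Collect the cache slots filled when each thread writes slots 2g and 2g + 1."""
    written = set()
    for group_index in range(thread_count):
        cache_index = group_index * 2
        if cache_index < cache_size - 1:
            written.update((cache_index, cache_index + 1))
    return written
```

Every cache here has an even size and at most twice as many slots as the group has threads, which is why two loads per thread are enough to fill it completely.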

This article uses a Gaussian kernel of width 29; demofox's website makes it easy to compute very large Gaussian kernels.
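If you would rather generate such a kernel offline, the sketch below integrates a Gaussian over each texel and normalizes the result. The sigma of 4 is my own guess rather than a value stated in this article, although it gives a center weight close to the 0.0995 used in the shader's table.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return 2 * radius + 1 normalized weights, each the Gaussian integral over one texel."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

    raw = [cdf(i + 0.5) - cdf(i - 0.5) for i in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]


weights = gaussian_kernel(14, 4.0)  # 29 taps covering [-14, +14]
```

Normalizing after truncation keeps the weights summing to one, so the blur does not darken or brighten the image overall.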

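You may get a warning that the register count exceeds the limit. I suspect the const array and the loops cause it, but the warning went away after reimporting the shader.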

#pragma kernel GTAOMain
#pragma kernel BlurHorizontalMain
#pragma kernel BlurVerticalMain
#pragma kernel VisualizeMain

#include "Packages/com.unity.render-pipelines.core/ShaderLibrary/Common.hlsl"
#include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Core.hlsl"

Texture2D<float4> _ColorTexture;
Texture2D<float> _DepthTexture;
Texture2D<float4> _GTAOTexture;

RWTexture2D<float4> _RW_NormalTexture;
RWTexture2D<float4> _RW_GTAOTexture;
RWTexture2D<float4> _RW_BlurTexture;
RWTexture2D<float4> _RW_VisualizeTexture;

SamplerState sampler_LinearClamp;
SamplerState sampler_PointClamp;

//region Parameters
uint _FrameIndex;
uint _DownsamplingFactor;
float _Intensity;
float _SampleRadius;
float _DistributionPower;
float _FalloffRange;
float2 _HeightFogFalloff;
float4 _TextureSize;

float4 _TAAOffsets;
//endregion

//region Pre-defined Macros
#define SQRT2_2                 0.70710678118

#define USE_AVERAGE_COS                 0
#define SLICE_COUNT                     4
#define STEPS_PER_SLICE                 3
#define GTAO_THREAD_GROUP_SIZE_X        32
#define GTAO_THREAD_GROUP_SIZE_Y        32
#define BLUR_THREAD_GROUP_SIZE          64

// For normal calculation, can be deleted if we calculate normal in mrt.
const static int NORMAL_FROM_DEPTH_PIXEL_RANGE = 2;
const static int CACHED_DEPTH_FOR_NORMAL_OFFSET = (GTAO_THREAD_GROUP_SIZE_X + 2*NORMAL_FROM_DEPTH_PIXEL_RANGE);
const static int CACHED_DEPTH_FOR_NORMAL_SIZE = CACHED_DEPTH_FOR_NORMAL_OFFSET * (GTAO_THREAD_GROUP_SIZE_Y + 2*NORMAL_FROM_DEPTH_PIXEL_RANGE);

// For blur
const static int BILINEAR_RADIUS = 1;
const static int BLUR_RADIUS = 14; // [-14, +14] for 29 tap gaussian blur
const static int CACHED_AO_NORMAL_DEPTH_FOR_BLUR_SIZE = BLUR_THREAD_GROUP_SIZE + 2*(BILINEAR_RADIUS+BLUR_RADIUS);

const static int CACHED_AO_FOR_BILINEAR_SIZE = BLUR_THREAD_GROUP_SIZE+2*BILINEAR_RADIUS;
//endregion

//region Group Shared Memory Help Functions
// For normal calculation
groupshared float depthForNormal[CACHED_DEPTH_FOR_NORMAL_SIZE];

void SetDepthForNormal(float depth, int index) {depthForNormal[index]=depth;}
float GetDepthForNormal(int2 threadPos) {return depthForNormal[threadPos.x+NORMAL_FROM_DEPTH_PIXEL_RANGE+(threadPos.y+NORMAL_FROM_DEPTH_PIXEL_RANGE)*CACHED_DEPTH_FOR_NORMAL_OFFSET];}
void CacheDepthForNormal(int2 groupCacheStartPos, uint cacheIndex, int2 subpixelBias)
{
    int2 threadPos = int2(cacheIndex % CACHED_DEPTH_FOR_NORMAL_OFFSET, cacheIndex / CACHED_DEPTH_FOR_NORMAL_OFFSET);
    int2 texturePos = (groupCacheStartPos + threadPos) * _DownsamplingFactor + subpixelBias;
    float depth = _DepthTexture.Load(uint3(texturePos, 0));
    SetDepthForNormal(depth, cacheIndex);
}

groupshared float4 aoNormalDepthForBlur[CACHED_AO_NORMAL_DEPTH_FOR_BLUR_SIZE];
void SetAONormalDepthForBlur(float4 aoNormalDepth, int index) {aoNormalDepthForBlur[index]=aoNormalDepth;}
float4 GetAONormalDepthForBlur(int threadPos) {return aoNormalDepthForBlur[threadPos+BLUR_RADIUS+BILINEAR_RADIUS];}
void CacheAONormalDepthForBlur(int2 groupCacheStartPos, uint cacheIndex, uint vertical)
{
    int2 threadPos = int2(0, 0);
    threadPos[vertical] = cacheIndex;
    int2 texturePos = groupCacheStartPos + threadPos;
    int2 threshold;
    if(vertical == 0)
    {
        threshold = _TextureSize.xy/_DownsamplingFactor - 1;
    }
    else
    {
        threshold = int2(_TextureSize.x, _TextureSize.y / _DownsamplingFactor) - 1;
    }

    texturePos = clamp(texturePos, 0, threshold);
    float4 aoNormalDepth = _GTAOTexture.Load(uint3(texturePos, 0));
    SetAONormalDepthForBlur(aoNormalDepth, cacheIndex);
}

groupshared float aoForBilinear[CACHED_AO_FOR_BILINEAR_SIZE];
void SetAOForBilinear(float ao, int index) {aoForBilinear[index]=ao;}
float GetAOForBilinear(int threadPos) {return aoForBilinear[threadPos+BILINEAR_RADIUS];}
//endregion

//region Space Transformation Help Functions
float3 GetViewSpacePositionFromLinearDepth(float2 uv, float linearDepth)
{
#if UNITY_UV_STARTS_AT_TOP
    uv.y = 1.0 - uv.y;
#endif
    float2 uvNDC = uv * 2.0 - 1.0;
    return float3(uvNDC * linearDepth * rcp(UNITY_MATRIX_P._m00_m11), -linearDepth);
}

float3 GetViewSpacePosition(float2 uv, float depth)
{
#if UNITY_MATRIX_I_P_SUPPORTED
#if UNITY_UV_STARTS_AT_TOP
    uv.y = 1.0 - uv.y;
#endif
    float3 positionNDC = float3(uv * 2.0 - 1.0, depth);
    float4 positionVS = mul(UNITY_MATRIX_I_P, float4(positionNDC, 1.0));
    positionVS /= positionVS.w;
    return positionVS.xyz;
#else
    float linearDepth = LinearEyeDepth(depth, _ZBufferParams);
    return GetViewSpacePositionFromLinearDepth(uv, linearDepth);
#endif
}

float4 LinearEyeDepthFloat4(float4 depthTBLR, float4 zBufferParams)
{
    return rcp(depthTBLR * zBufferParams.z + zBufferParams.w);
}
//endregion

//region GTAO Help Functions
//https://github.com/GameTechDev/XeGTAO/blob/master/Source/Rendering/Shaders/XeGTAO.h

//-UNITY_MATRIX_P._m11 = rcp(tan(fovy / 2))
float GetScreenSpaceRadius(float linearDepth)
{
    return _SampleRadius * _TextureSize.y * (-UNITY_MATRIX_P._m11) / (2 * linearDepth);
}

float GetLengthToPixelRatio(float linearDepth)
{
    return _TextureSize.y * (-UNITY_MATRIX_P._m11) / (2 * linearDepth);
}

// From https://www.shadertoy.com/view/3tB3z3 - except we're using R2 here
#define XE_HILBERT_LEVEL    6U
#define XE_HILBERT_WIDTH    ( (1U << XE_HILBERT_LEVEL) )

uint HilbertIndex( uint posX, uint posY )
{   
    uint index = 0U;
    for( uint curLevel = XE_HILBERT_WIDTH/2U; curLevel > 0U; curLevel /= 2U )
    {
        uint regionX = ( posX & curLevel ) > 0U;
        uint regionY = ( posY & curLevel ) > 0U;
        index += curLevel * curLevel * ( (3U * regionX) ^ regionY);
        if( regionY == 0U )
        {
            if( regionX == 1U )
            {
                posX = uint( (XE_HILBERT_WIDTH - 1U) ) - posX;
                posY = uint( (XE_HILBERT_WIDTH - 1U) ) - posY;
            }

            uint temp = posX;
            posX = posY;
            posY = temp;
        }
    }
    return index;
}

float2 SpatioTemporalNoise(uint2 pixCoord, uint temporalIndex)
{
    uint index = HilbertIndex(pixCoord.x, pixCoord.y);
    index += 288*(temporalIndex%64); // why 288? tried out a few and that's the best so far (with XE_HILBERT_LEVEL 6U) - but there's probably better :)
    // R2 sequence - see http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/
    return frac( 0.5 + index * float2(0.75487766624669276005, 0.5698402909980532659114));
}
//endregion

//region Blur Help Functions
float GetWeight(float4 center, float4 samplePoint)
{
    float4 centerAndSampleXY = float4(center.yz, samplePoint.yz) * 2.0f - 1.0f;
    float3 centerVS = float3(centerAndSampleXY.xy, sqrt(max(0.0f, 1.0f - dot(centerAndSampleXY.xy, centerAndSampleXY.xy))));
    float3 sampleVS = float3(centerAndSampleXY.zw, sqrt(max(0.0f, 1.0f - dot(centerAndSampleXY.zw, centerAndSampleXY.zw))));
    float normalWeight = saturate(dot(centerVS, sampleVS));
    float depthWeight = 1.0f - saturate(abs(center.w - samplePoint.w) * 100.0f);

    return normalWeight * depthWeight;
}

void CacheGaussianBlur(uint cacheAOIndex, uint vertical)
{
    int cacheAOThreadPos = cacheAOIndex - BILINEAR_RADIUS;
    float4 aoNormalDepthC = GetAONormalDepthForBlur(cacheAOThreadPos);
    
    float aoSum = 0.0f;
    float weightSum = 0.0f;

    float4 aoNormalDepth;
    float weight;

    // http://demofox.org/gauss.html
    const float weights[] = 
    {
        0.0002,	0.0005,	0.0011,	0.0023,	0.0044,	0.0080,	0.0136,	0.0217,	0.0325,	0.0457,	0.0605,	0.0752,	0.0879,	0.0965,	0.0995,	0.0965,	0.0879,	0.0752,	0.0605,	0.0457,	0.0325,	0.0217,	0.0136,	0.0080,	0.0044,	0.0023,	0.0011,	0.0005,	0.0002
    };

    [unroll(2*BLUR_RADIUS+1)]
    for (int i = -BLUR_RADIUS; i <= BLUR_RADIUS; ++i)
    {
        aoNormalDepth = GetAONormalDepthForBlur(cacheAOThreadPos + i);
        weight = GetWeight(aoNormalDepthC, aoNormalDepth) * weights[i + BLUR_RADIUS];

        aoSum += aoNormalDepth.r * weight;
        weightSum += weight;
    }

    float avgAO = aoSum / weightSum;
    SetAOForBilinear(avgAO, cacheAOIndex);
}
//endregion

[numthreads(GTAO_THREAD_GROUP_SIZE_X,GTAO_THREAD_GROUP_SIZE_Y,1)]
void GTAOMain(uint3 groupID : SV_GroupID,
                uint3 groupThreadID : SV_GroupThreadID,
                uint groupIndex : SV_GroupIndex,
                uint3 dispatchThreadID : SV_DispatchThreadID)
{
    uint sqrDownSamplingFactor = _DownsamplingFactor * _DownsamplingFactor;
    int subpixelIndex = _FrameIndex % sqrDownSamplingFactor;
    int2 subpixelBias = int2(subpixelIndex % _DownsamplingFactor, subpixelIndex / _DownsamplingFactor);
    int2 pixelCoord = dispatchThreadID.xy * _DownsamplingFactor + subpixelBias;
    float2 uv = (pixelCoord + 0.5) * _TextureSize.zw;
    int2 groupCacheStartPos = groupID.xy * int2(GTAO_THREAD_GROUP_SIZE_X, GTAO_THREAD_GROUP_SIZE_Y) - NORMAL_FROM_DEPTH_PIXEL_RANGE;
    
    //region Cache Normal
    int cacheIndex = groupIndex * 2;
    if(cacheIndex < CACHED_DEPTH_FOR_NORMAL_SIZE-1)
    {
        CacheDepthForNormal(groupCacheStartPos, cacheIndex, subpixelBias);
        CacheDepthForNormal(groupCacheStartPos, cacheIndex + 1, subpixelBias);
    }
    GroupMemoryBarrierWithGroupSync();
    //endregion

    uint loadCacheIndex = groupIndex;
    int2 threadPos = int2(loadCacheIndex % GTAO_THREAD_GROUP_SIZE_X, loadCacheIndex / GTAO_THREAD_GROUP_SIZE_X);

    //region Calculate Normal From Depth
    float depthC    = GetDepthForNormal(threadPos               );
    float depthT    = GetDepthForNormal(threadPos + int2( 0,  1));
    float depthB    = GetDepthForNormal(threadPos + int2( 0, -1));
    float depthL    = GetDepthForNormal(threadPos + int2(-1,  0));
    float depthR    = GetDepthForNormal(threadPos + int2( 1,  0));
    float depthT2   = GetDepthForNormal(threadPos + int2( 0,  2));
    float depthB2   = GetDepthForNormal(threadPos + int2( 0, -2));
    float depthL2   = GetDepthForNormal(threadPos + int2(-2,  0));
    float depthR2   = GetDepthForNormal(threadPos + int2( 2,  0));

    // fp16 depth should use this to prevent contour lines due to loss of depth precision.
    // linearDepth *= 0.99920;
    // This is for fp32 depth.
    // linearDepth *= 0.99999;
    float linearDepth = LinearEyeDepth(depthC, _ZBufferParams) * 0.99999;
    float4 linearDepths = LinearEyeDepthFloat4(float4(depthT, depthB, depthL, depthR), _ZBufferParams) * 0.99999;

    float2 center = pixelCoord + 0.5f;
    float3 viewPosC = GetViewSpacePositionFromLinearDepth(center * _TextureSize.zw, linearDepth);
    float3 viewPosT = GetViewSpacePositionFromLinearDepth((center + float2( 0,  1) * _DownsamplingFactor) * _TextureSize.zw, linearDepths.x);
    float3 viewPosB = GetViewSpacePositionFromLinearDepth((center + float2( 0, -1) * _DownsamplingFactor) * _TextureSize.zw, linearDepths.y);
    float3 viewPosL = GetViewSpacePositionFromLinearDepth((center + float2(-1,  0) * _DownsamplingFactor) * _TextureSize.zw, linearDepths.z);
    float3 viewPosR = GetViewSpacePositionFromLinearDepth((center + float2( 1,  0) * _DownsamplingFactor) * _TextureSize.zw, linearDepths.w);

    float3 t = normalize(viewPosT - viewPosC);
    float3 b = normalize(viewPosC - viewPosB);
    float3 l = normalize(viewPosC - viewPosL);
    float3 r = normalize(viewPosR - viewPosC);

    float4 H = float4(depthL, depthR, depthL2, depthR2);
    float4 V = float4(depthB, depthT, depthB2, depthT2);
    float2 he = abs((2 * H.xy - H.zw) - depthC);
    float2 ve = abs((2 * V.xy - V.zw) - depthC);
    float3 hDeriv = he.x < he.y ? l : r;
    float3 vDeriv = ve.x < ve.y ? b : t;
    float3 normalVS = normalize(cross(hDeriv, vDeriv));
    //endregion

    //region Calculate GTAO From Depth and Normal
    float2 localNoise = SpatioTemporalNoise(dispatchThreadID.xy, 13);

    float3 pixCenterPos = viewPosC;
    float3 viewVec = normalize(-pixCenterPos);

    float effectRadius = _SampleRadius;
    float sampleDistributionPower = _DistributionPower;
    float falloffRange = effectRadius * _FalloffRange;

    float falloffFrom = effectRadius - falloffRange;
    float falloffMul = -rcp(falloffRange);
    float falloffAdd = 1.0 - falloffFrom * falloffMul;

    float visibility = 0.0;
    {
        float noiseSlice = localNoise.x;
        float noiseSample = localNoise.y;
        
        float pixelTooCloseThreshold = 1.3;//Some basic bias preventing sampling current pixel

        float lengthToPixelRatio = GetLengthToPixelRatio(linearDepth);
        float screenSpaceRadius = effectRadius * lengthToPixelRatio;

        //fade GTAO if screenSpaceRadius is too small
        visibility += saturate((10 - screenSpaceRadius) / 200);
        float minS = pixelTooCloseThreshold / screenSpaceRadius;
        
        //2 * SLICE_COUNT * STEPS_PER_SLICE samples

        //Almost exactly "Algorithm 1" in
        //https://www.activision.com/cdn/research/Practical_Real_Time_Strategies_for_Accurate_Indirect_Occlusion_NEW%20VERSION_COLOR.pdf
        [unroll(SLICE_COUNT)]
        for (int slice = 0; slice < SLICE_COUNT; slice++)
        {
            float phi = (slice + noiseSlice) * PI / SLICE_COUNT;
            float sinPhi, cosPhi;
            sincos(phi, sinPhi, cosPhi);
            float2 omega = float2(cosPhi, sinPhi);

            omega *= screenSpaceRadius;
            float3 directionVec = float3(cosPhi, sinPhi, 0.0);
            float3 orthoDirectionVec = directionVec - (dot(directionVec, viewVec) * viewVec);
            float3 axisVec = normalize(cross(orthoDirectionVec, viewVec));
            float3 projectedNormalVec = normalVS - axisVec * dot(normalVS, axisVec);
            float signNormal = sign(dot(orthoDirectionVec, projectedNormalVec));
            float projectedNormalVecLength = length(projectedNormalVec);
            float cosNorm = saturate(dot(projectedNormalVec, viewVec) / projectedNormalVecLength);
            float n = signNormal * acos(cosNorm);
            float lowHorizonCos0 = cos(n + HALF_PI);
            float lowHorizonCos1 = cos(n - HALF_PI);

            //Minor improvement
            float horizonCos0 = lowHorizonCos0;
            float horizonCos1 = lowHorizonCos1;

#if USE_AVERAGE_COS
            float baseCos0 = 0.0;
            float baseCos1 = 0.0;
#endif

            [unroll]
            for (float step = 0; step < STEPS_PER_SLICE; step++)
            {
                float stepBaseNoise = (slice + step * STEPS_PER_SLICE) * 0.6180339887498948482;
                float stepNoise = frac(noiseSample + stepBaseNoise);
                
                float s = (step + stepNoise) / STEPS_PER_SLICE;
                s = pow(s, sampleDistributionPower);
                s += minS;

                float2 sampleOffset = s * omega; //In pixel coord;
                float sampleOffsetLength = length(sampleOffset);
                sampleOffset = round(sampleOffset) * _TextureSize.zw; //To UV coord
                
                float2 sampleScreenPos0 = uv + sampleOffset;
                float sampleLinearDepth0 = LinearEyeDepth(_DepthTexture.SampleLevel(sampler_PointClamp, sampleScreenPos0, 0), _ZBufferParams);
                float3 samplePos0 = GetViewSpacePositionFromLinearDepth(sampleScreenPos0, sampleLinearDepth0);
                
                float2 sampleScreenPos1 = uv - sampleOffset;
                float sampleLinearDepth1 = LinearEyeDepth(_DepthTexture.SampleLevel(sampler_PointClamp, sampleScreenPos1, 0), _ZBufferParams);
                float3 samplePos1 = GetViewSpacePositionFromLinearDepth(sampleScreenPos1, sampleLinearDepth1);
                
                float3 sampleDelta0 = samplePos0 - pixCenterPos;
                float3 sampleDelta1 = samplePos1 - pixCenterPos;

                float sampleDist0 = length(sampleDelta0);
                float sampleDist1 = length(sampleDelta1);

                //Normalize
                float3 sampleHorizonVec0 = sampleDelta0 / sampleDist0;
                float3 sampleHorizonVec1 = sampleDelta1 / sampleDist1;

                float weight0 = saturate(sampleDist0 * falloffMul + falloffAdd);
                float weight1 = saturate(sampleDist1 * falloffMul + falloffAdd);

                //sample horizon cos
                float shc0 = dot(sampleHorizonVec0, viewVec);
                float shc1 = dot(sampleHorizonVec1, viewVec);

                shc0 = lerp(lowHorizonCos0, shc0, weight0);
                shc1 = lerp(lowHorizonCos1, shc1, weight1);

#if USE_AVERAGE_COS
                baseCos0 += shc0;
                baseCos1 += shc1;
#else
                horizonCos0 = max(horizonCos0, shc0);
                horizonCos1 = max(horizonCos1, shc1);
#endif
            }

#if USE_AVERAGE_COS
            baseCos0 /= STEPS_PER_SLICE;
            baseCos1 /= STEPS_PER_SLICE;

            horizonCos0 = max(baseCos0, horizonCos0);
            horizonCos1 = max(baseCos1, horizonCos1);
#endif
            float h0 = acos(horizonCos0);
            float h1 = -acos(horizonCos1);

            h0 = n + clamp(h0 - n, -HALF_PI, HALF_PI);
            h1 = n + clamp(h1 - n, -HALF_PI, HALF_PI);

            float val0 = (cosNorm + 2 * h0 * sin(n) - cos(2 * h0 - n)) / 4;
            float val1 = (cosNorm + 2 * h1 * sin(n) - cos(2 * h1 - n)) / 4;
            visibility += projectedNormalVecLength * (val0 + val1);
        }

        visibility /= SLICE_COUNT;
        visibility = max(0.03, visibility);
    }

    float outputColor = visibility;
#if USE_AVERAGE_COS
    outputColor /= HALF_PI;
#endif
    //endregion

    _RW_GTAOTexture[dispatchThreadID.xy] = float4(outputColor, normalVS.xy * 0.5f + 0.5f, depthC);
}

[numthreads(BLUR_THREAD_GROUP_SIZE,1,1)]
void BlurHorizontalMain(uint3 groupID : SV_GroupID,
                uint3 groupThreadID : SV_GroupThreadID,
                uint groupIndex : SV_GroupIndex,
                uint3 dispatchThreadID : SV_DispatchThreadID)
{
    uint sqrDownSamplingFactor = _DownsamplingFactor * _DownsamplingFactor;
    int subpixelIndex = _FrameIndex % sqrDownSamplingFactor;
    int2 subpixelBias = int2(subpixelIndex % _DownsamplingFactor, subpixelIndex / _DownsamplingFactor);
    int2 pixelCoord = int2(dispatchThreadID.x / _DownsamplingFactor, dispatchThreadID.y);
    int2 thisSubpixelBias = int2(dispatchThreadID.x % _DownsamplingFactor, 0);

    //region Cache AO Normal Depth
    int2 groupCacheStartPos = groupID.xy * int2(BLUR_THREAD_GROUP_SIZE, 1) / int2(_DownsamplingFactor, 1) - int2(BLUR_RADIUS+BILINEAR_RADIUS, 0);
    int cacheIndex = groupIndex * 2;
    if(cacheIndex < int(CACHED_AO_NORMAL_DEPTH_FOR_BLUR_SIZE-1))
    {
        CacheAONormalDepthForBlur(groupCacheStartPos, cacheIndex, 0);
        CacheAONormalDepthForBlur(groupCacheStartPos, cacheIndex + 1, 0);
    }
    GroupMemoryBarrierWithGroupSync();
    //endregion

    float4 thisAONormalDepth = GetAONormalDepthForBlur(groupIndex  / _DownsamplingFactor);

    //region Blur AO
    int cacheAOIndex = groupIndex * 2;
    if(cacheAOIndex < CACHED_AO_FOR_BILINEAR_SIZE-1)
    {
        CacheGaussianBlur(cacheAOIndex, 0);
        CacheGaussianBlur(cacheAOIndex + 1, 0);
    }
    GroupMemoryBarrierWithGroupSync();
    //endregion

    //region Bilinear Sampling
    uint loadIndex = groupIndex;
    int threadPos = loadIndex / _DownsamplingFactor;
    float2 signVal = sign(thisSubpixelBias - subpixelBias);
    float thisAO    = GetAOForBilinear(threadPos);
    float leftAO    = GetAOForBilinear(threadPos + signVal.x);
    float2 lerpVal = abs(thisSubpixelBias - subpixelBias) / (float)_DownsamplingFactor;
    float finalAO = lerp(thisAO, leftAO, lerpVal.x);
    //endregion

    _RW_BlurTexture[dispatchThreadID.xy] = float4(finalAO, thisAONormalDepth.yzw);
}

[numthreads(1,BLUR_THREAD_GROUP_SIZE,1)]
void BlurVerticalMain(uint3 groupID : SV_GroupID,
                uint3 groupThreadID : SV_GroupThreadID,
                uint groupIndex : SV_GroupIndex,
                uint3 dispatchThreadID : SV_DispatchThreadID)
{
    uint sqrDownSamplingFactor = _DownsamplingFactor * _DownsamplingFactor;
    int subpixelIndex = _FrameIndex % sqrDownSamplingFactor;
    int2 subpixelBias = int2(subpixelIndex % _DownsamplingFactor, subpixelIndex / _DownsamplingFactor);
    int2 pixelCoord = dispatchThreadID.xy / _DownsamplingFactor;
    int2 thisSubpixelBias = dispatchThreadID.xy % _DownsamplingFactor;

    //region Cache AO Normal Depth
    int2 groupCacheStartPos = groupID.xy * int2(1, BLUR_THREAD_GROUP_SIZE) / int2(1, _DownsamplingFactor) - int2(0, BLUR_RADIUS+BILINEAR_RADIUS);
    int cacheIndex = groupIndex * 2;
    if(cacheIndex < int(CACHED_AO_NORMAL_DEPTH_FOR_BLUR_SIZE-1))
    {
        CacheAONormalDepthForBlur(groupCacheStartPos, cacheIndex, 1);
        CacheAONormalDepthForBlur(groupCacheStartPos, cacheIndex + 1, 1);
    }
    GroupMemoryBarrierWithGroupSync();
    //endregion

    //region Blur AO
    int cacheAOIndex = groupIndex * 2;
    if(cacheAOIndex < CACHED_AO_FOR_BILINEAR_SIZE-1)
    {
        CacheGaussianBlur(cacheAOIndex, 1);
        CacheGaussianBlur(cacheAOIndex + 1, 1);
    }
    GroupMemoryBarrierWithGroupSync();
    //endregion

    //region Bilinear Sampling
    uint loadIndex = groupIndex;
    int threadPos = loadIndex / _DownsamplingFactor;
    float2 signVal = sign(thisSubpixelBias - subpixelBias);
    float thisAO    = GetAOForBilinear(threadPos);
    float topAO    = GetAOForBilinear(threadPos + signVal.y);
    float2 lerpVal = abs(thisSubpixelBias - subpixelBias) / (float)_DownsamplingFactor;
    float finalAO = lerp(thisAO, topAO, lerpVal.y); 
    //endregion

    _RW_BlurTexture[dispatchThreadID.xy] = finalAO;
}

[numthreads(GTAO_THREAD_GROUP_SIZE_X,GTAO_THREAD_GROUP_SIZE_Y,1)]
void VisualizeMain(uint3 groupID : SV_GroupID,
                uint3 groupThreadID : SV_GroupThreadID,
                uint groupIndex : SV_GroupIndex,
                uint3 dispatchThreadID : SV_DispatchThreadID)
{
    float gtaoVal = _GTAOTexture.Load(uint3(dispatchThreadID.xy, 0)).r;
    float4 colorTexture = _ColorTexture.Load(uint3(dispatchThreadID.xy, 0));
    float3 finalColor = colorTexture.rgb * lerp(1.0f, gtaoVal, _Intensity);
    _RW_VisualizeTexture[dispatchThreadID.xy] = float4(finalColor, 1.0f);
}

GroundTruthAmbientOcclusion.cs

This controls the various GTAO parameters; there is nothing much to say about it.

using System;

namespace UnityEngine.Rendering.Universal
{
    [Serializable, VolumeComponentMenu("SSAO/GTAO")]
    public class GroundTruthAmbientOcclusion : VolumeComponent, IPostProcessComponent
    {
        public ClampedIntParameter downsamplingFactor = new ClampedIntParameter(2, 1, 4);

        public ClampedFloatParameter intensity = new ClampedFloatParameter(0.0f, 0.0f, 1.0f);
        public ClampedFloatParameter radius = new ClampedFloatParameter(1.0f, 0.01f, 5.0f);
        public ClampedFloatParameter distributionPower = new ClampedFloatParameter(2.0f, 1.0f, 5.0f);
        public ClampedFloatParameter falloffRange = new ClampedFloatParameter(0.1f, 0.01f, 1.0f);

        public bool IsActive()
        {
            return active && intensity.value > 0.0f;
        }

        public bool IsTileCompatible()
        {
            return false;
        }
    }
}

GTAORendererFeature.cs

Nothing much to say here either; it is a perfectly ordinary RendererFeature.

namespace UnityEngine.Rendering.Universal
{
    public class GTAORendererFeature : ScriptableRendererFeature
    {
        [System.Serializable]
        public class GTAOSettings
        {
            public bool isEnabled;
            public RenderPassEvent renderPassEvent = RenderPassEvent.AfterRenderingOpaques;
            public ComputeShader gtaoComputeShader;
        }

        public GTAOSettings settings = new GTAOSettings();
        private GTAORenderPass gtaoRenderPass;
        public override void Create()
        {
            gtaoRenderPass = new GTAORenderPass(settings);
        }

        public override void AddRenderPasses(ScriptableRenderer renderer, ref RenderingData renderingData)
        {
            GroundTruthAmbientOcclusion gtao = VolumeManager.instance.stack.GetComponent<GroundTruthAmbientOcclusion>();
            if (gtao != null && gtao.IsActive())
            {
                gtaoRenderPass.Setup(gtao);
                renderer.EnqueuePass(gtaoRenderPass);
            }
        }
    }
}

GTAORenderPass.cs

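The main thing to watch is the texture resolution: when computing the half-resolution size, remember to use Ceil so that the size is rounded up. Each stage also uses a different Dispatch count, and the upsampling stages in particular vary quite a bit.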
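As a rough guide, the thread group counts per kernel can be estimated as below, assuming one thread per output texel and the group sizes declared in the compute shader (32x32 for GTAO and visualization, 64 threads along the blur axis). These are assumptions for illustration, not necessarily the exact counts the render pass dispatches.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b


def group_counts(full_w, full_h, factor):
    """Estimate (x, y, z) thread group counts for each kernel."""
    down_w, down_h = ceil_div(full_w, factor), ceil_div(full_h, factor)
    return {
        "GTAOMain": (ceil_div(down_w, 32), ceil_div(down_h, 32), 1),
        "BlurHorizontalMain": (ceil_div(full_w, 64), down_h, 1),
        "BlurVerticalMain": (full_w, ceil_div(full_h, 64), 1),
        "VisualizeMain": (ceil_div(full_w, 32), ceil_div(full_h, 32), 1),
    }


counts = group_counts(1920, 1080, 2)
```

Rounding up means some threads fall outside the texture, so the kernels must tolerate out-of-range coordinates, for example by clamping as `CacheAONormalDepthForBlur` does.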

namespace UnityEngine.Rendering.Universal
{
    public class GTAORenderPass : ScriptableRenderPass
    {
        private const string profilerTag = "Ground Truth Ambient Occlusion";
        private const string gtaoKernelName = "GTAOMain";
        private const string blurHorizontalKernelName = "BlurHorizontalMain";
        private const string blurVerticalKernelName = "BlurVerticalMain";
        private const string visualizeKernelName = "VisualizeMain";

        private ProfilingSampler profilingSampler;
        private ProfilingSampler gtaoSampler = new ProfilingSampler("GTAO Pass");
        private ProfilingSampler blurSampler = new ProfilingSampler("Blur Pass");
        private ProfilingSampler visualizeSampler = new ProfilingSampler("Visualize Pass");

        private RenderTargetHandle cameraColor;
        private RenderTargetIdentifier cameraColorIden;
        private RenderTargetHandle cameraDepth;
        private RenderTargetIdentifier cameraDepthIden;
        private RenderTargetHandle cameraDepthAttachment;
        private RenderTargetIdentifier cameraDepthAttachmentIden;

        private static readonly string gtaoTextureName = "_GTAOBuffer";
        private static readonly int gtaoTextureID = Shader.PropertyToID(gtaoTextureName);
        private RenderTargetHandle gtaoTextureHandle;
        private RenderTargetIdentifier gtaoTextureIden;

        private static readonly string horizontalBlurTextureName = "_HorizontalBlurBuffer";
        private static readonly int horizontalBlurTextureID = Shader.PropertyToID(horizontalBlurTextureName);
        private RenderTargetHandle horizontalBlurTextureHandle;
        private RenderTargetIdentifier horizontalBlurTextureIden;

        private static readonly string vericalBlurTextureName = "_VerticalBlurBuffer";
        private static readonly int vericalBlurTextureID = Shader.PropertyToID(vericalBlurTextureName);
        private RenderTargetHandle vericalBlurTextureHandle;
        private RenderTargetIdentifier vericalBlurTextureIden;

        private static readonly string visualizeTextureName = "_VisualizeBuffer";
        private static readonly int visualizeTextureID = Shader.PropertyToID(visualizeTextureName);
        private RenderTargetHandle visualizeTextureHandle;
        private RenderTargetIdentifier visualizeTextureIden;

        private GroundTruthAmbientOcclusion groundTruthAmbientOcclusion;
        private ComputeShader gtaoComputeShader;
        private GTAORendererFeature.GTAOSettings settings;

        private int downsamplingFactor;
        private Vector2Int fullRes;
        private Vector2Int downsampleRes;
        private int frameIndex;

        static readonly int _GTAOFrameIndexID = Shader.PropertyToID("_FrameIndex");
        static readonly int _GTAODownsamplingFactorID = Shader.PropertyToID("_DownsamplingFactor");
        static readonly int _GTAOIntensityID = Shader.PropertyToID("_Intensity");
        static readonly int _GTAOSampleRadiusID = Shader.PropertyToID("_SampleRadius");
        static readonly int _GTAODistributionPowerID = Shader.PropertyToID("_DistributionPower");
        static readonly int _GTAOFalloffRangeID = Shader.PropertyToID("_FalloffRange");

        static readonly int _GTAOTextureSizeID = Shader.PropertyToID("_TextureSize");
        static readonly int _GTAOColorTextureID = Shader.PropertyToID("_ColorTexture");
        static readonly int _GTAODepthTextureID = Shader.PropertyToID("_DepthTexture");
        static readonly int _GTAOTextureID = Shader.PropertyToID("_GTAOTexture");
        static readonly int _GTAORWTextureID = Shader.PropertyToID("_RW_GTAOTexture");
        static readonly int _GTAORWBlurTextureID = Shader.PropertyToID("_RW_BlurTexture");
        static readonly int _GTAORWVisualizeTextureID = Shader.PropertyToID("_RW_VisualizeTexture");

        public GTAORenderPass(GTAORendererFeature.GTAOSettings settings)
        {
            this.settings = settings;
            profilingSampler = new ProfilingSampler(profilerTag);
            renderPassEvent = settings.renderPassEvent;
            gtaoComputeShader = settings.gtaoComputeShader;

            cameraColor.Init("_CameraColorTexture");
            cameraColorIden = cameraColor.Identifier();
            cameraDepth.Init("_CameraDepthTexture");
            cameraDepthIden = cameraDepth.Identifier();
            cameraDepthAttachment.Init("_CameraDepthAttachment");
            cameraDepthAttachmentIden = cameraDepthAttachment.Identifier();

            gtaoTextureHandle.Init(gtaoTextureName);
            gtaoTextureIden = gtaoTextureHandle.Identifier();
            horizontalBlurTextureHandle.Init(horizontalBlurTextureName);
            horizontalBlurTextureIden = horizontalBlurTextureHandle.Identifier();
            vericalBlurTextureHandle.Init(vericalBlurTextureName);
            vericalBlurTextureIden = vericalBlurTextureHandle.Identifier();
            visualizeTextureHandle.Init(visualizeTextureName);
            visualizeTextureIden = visualizeTextureHandle.Identifier();

            frameIndex = 0;
        }

        public void Setup(GroundTruthAmbientOcclusion groundTruthAmbientOcclusion)
        {
            this.groundTruthAmbientOcclusion = groundTruthAmbientOcclusion;
        }

        public override void Configure(CommandBuffer cmd, RenderTextureDescriptor cameraTextureDescriptor)
        {
            // Start from the camera target descriptor and make it writable from a compute shader.
            RenderTextureDescriptor desc = cameraTextureDescriptor;
            desc.enableRandomWrite = true;
            desc.depthBufferBits = 0;
            desc.msaaSamples = 1;
            desc.graphicsFormat = Experimental.Rendering.GraphicsFormat.R16G16B16A16_SFloat;

            downsamplingFactor = groundTruthAmbientOcclusion.downsamplingFactor.value;
            fullRes = new Vector2Int(desc.width, desc.height);
            downsampleRes = new Vector2Int(Mathf.CeilToInt((float)desc.width / downsamplingFactor), Mathf.CeilToInt((float)desc.height / downsamplingFactor));

            // Visualization and vertical blur targets: full resolution.
            cmd.GetTemporaryRT(visualizeTextureID, desc);
            cmd.GetTemporaryRT(vericalBlurTextureID, desc);
            // Horizontal blur target: full width, downsampled height.
            desc.height = downsampleRes.y;
            cmd.GetTemporaryRT(horizontalBlurTextureID, desc);
            // Raw GTAO target: downsampled in both dimensions.
            desc.width = downsampleRes.x;
            cmd.GetTemporaryRT(gtaoTextureID, desc);
        }

        private void DoGTAOCalculation(CommandBuffer cmd, RenderTargetIdentifier depthid, RenderTargetIdentifier gtaoid, ComputeShader computeShader)
        {
            if (!computeShader.HasKernel(gtaoKernelName)) return;
            int gtaoKernel = computeShader.FindKernel(gtaoKernelName);

            computeShader.GetKernelThreadGroupSizes(gtaoKernel, out uint x, out uint y, out uint z);
            cmd.SetComputeIntParam(computeShader, _GTAOFrameIndexID, frameIndex);
            cmd.SetComputeIntParam(computeShader, _GTAODownsamplingFactorID, downsamplingFactor);
            cmd.SetComputeVectorParam(computeShader, _GTAOTextureSizeID, new Vector4(fullRes.x, fullRes.y, 1.0f / fullRes.x, 1.0f / fullRes.y));

            cmd.SetComputeFloatParam(computeShader, _GTAOSampleRadiusID, groundTruthAmbientOcclusion.radius.value);
            cmd.SetComputeFloatParam(computeShader, _GTAODistributionPowerID, groundTruthAmbientOcclusion.distributionPower.value);
            cmd.SetComputeFloatParam(computeShader, _GTAOFalloffRangeID, groundTruthAmbientOcclusion.falloffRange.value);

            cmd.SetComputeTextureParam(computeShader, gtaoKernel, _GTAODepthTextureID, depthid);
            cmd.SetComputeTextureParam(computeShader, gtaoKernel, _GTAORWTextureID, gtaoid);

            cmd.DispatchCompute(computeShader, gtaoKernel,
                    Mathf.CeilToInt((float)downsampleRes.x / x),
                    Mathf.CeilToInt((float)downsampleRes.y / y),
                    1);
        }

        private void DoBlur(CommandBuffer cmd, RenderTargetIdentifier gtaoid, RenderTargetIdentifier horizontalid, RenderTargetIdentifier verticalid, ComputeShader computeShader)
        {
            if (!computeShader.HasKernel(blurHorizontalKernelName) || !computeShader.HasKernel(blurVerticalKernelName)) return;
            int horizontalKernel = computeShader.FindKernel(blurHorizontalKernelName);
            int verticalKernel = computeShader.FindKernel(blurVerticalKernelName);

            // Horizontal pass: reads the downsampled GTAO texture and writes a
            // full-width, downsampled-height result.
            uint x, y, z;
            computeShader.GetKernelThreadGroupSizes(horizontalKernel, out x, out y, out z);
            cmd.SetComputeTextureParam(computeShader, horizontalKernel, _GTAOTextureID, gtaoid);
            cmd.SetComputeTextureParam(computeShader, horizontalKernel, _GTAORWBlurTextureID, horizontalid);
            cmd.DispatchCompute(computeShader, horizontalKernel,
                                Mathf.CeilToInt((float)fullRes.x / x),
                                Mathf.CeilToInt((float)downsampleRes.y / y),
                                1);

            // Vertical pass: reads the horizontal result and writes the full-resolution result.
            computeShader.GetKernelThreadGroupSizes(verticalKernel, out x, out y, out z);
            cmd.SetComputeTextureParam(computeShader, verticalKernel, _GTAOTextureID, horizontalid);
            cmd.SetComputeTextureParam(computeShader, verticalKernel, _GTAORWBlurTextureID, verticalid);
            cmd.DispatchCompute(computeShader, verticalKernel,
                                Mathf.CeilToInt((float)fullRes.x / x),
                                Mathf.CeilToInt((float)fullRes.y / y),
                                1);
        }

        private void DoVisualization(CommandBuffer cmd, RenderTargetIdentifier colorid, RenderTargetIdentifier verticalid, RenderTargetIdentifier visualizeid, ComputeShader computeShader)
        {
            if (!computeShader.HasKernel(visualizeKernelName)) return;
            int visualizeKernel = computeShader.FindKernel(visualizeKernelName);
            cmd.SetComputeFloatParam(computeShader, _GTAOIntensityID, groundTruthAmbientOcclusion.intensity.value);

            computeShader.GetKernelThreadGroupSizes(visualizeKernel, out uint x, out uint y, out uint z);
            cmd.SetComputeTextureParam(computeShader, visualizeKernel, _GTAOColorTextureID, colorid);
            cmd.SetComputeTextureParam(computeShader, visualizeKernel, _GTAOTextureID, verticalid);
            cmd.SetComputeTextureParam(computeShader, visualizeKernel, _GTAORWVisualizeTextureID, visualizeid);
            cmd.DispatchCompute(computeShader, visualizeKernel,
                                Mathf.CeilToInt((float)fullRes.x / x),
                                Mathf.CeilToInt((float)fullRes.y / y),
                                1);

            // Copy the visualization result back onto the camera color target.
            cmd.Blit(visualizeid, colorid);
        }

        public override void Execute(ScriptableRenderContext context, ref RenderingData renderingData)
        {
            CommandBuffer cmd = CommandBufferPool.Get(profilerTag);
            context.ExecuteCommandBuffer(cmd);
            cmd.Clear();

            using (new ProfilingScope(cmd, profilingSampler))
            {
                using (new ProfilingScope(cmd, gtaoSampler))
                {
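                    // The Scene view camera reads depth from _CameraDepthTexture;
                    // every other camera reads it from _CameraDepthAttachment.
                    if (renderingData.cameraData.isSceneViewCamera)
                    {
                        DoGTAOCalculation(cmd, cameraDepthIden, gtaoTextureIden, gtaoComputeShader);
                    }
                    else
                    {
                        DoGTAOCalculation(cmd, cameraDepthAttachmentIden, gtaoTextureIden, gtaoComputeShader);
                    }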
                }

                using (new ProfilingScope(cmd, blurSampler))
                {
                    DoBlur(cmd, gtaoTextureIden, horizontalBlurTextureIden, vericalBlurTextureIden, gtaoComputeShader);
                }

                using (new ProfilingScope(cmd, visualizeSampler))
                {
                    DoVisualization(cmd, cameraColorIden, vericalBlurTextureIden, visualizeTextureIden, gtaoComputeShader);
                }
            }

            // Advance the frame index uploaded in DoGTAOCalculation; it wraps every 60 frames.
            frameIndex = (frameIndex + 1) % 60;

            context.ExecuteCommandBuffer(cmd);
            cmd.Clear();
            CommandBufferPool.Release(cmd);
        }

        public override void FrameCleanup(CommandBuffer cmd)
        {
            cmd.ReleaseTemporaryRT(gtaoTextureID);
            cmd.ReleaseTemporaryRT(horizontalBlurTextureID);
            cmd.ReleaseTemporaryRT(vericalBlurTextureID);
            cmd.ReleaseTemporaryRT(visualizeTextureID);
        }
    }
}
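
The dispatches in `DoGTAOCalculation`, `DoBlur`, and `DoVisualization` all follow the same pattern: divide the target size by the kernel's thread group size and round up, so that partial groups along the right and bottom edges are still covered. The sketch below reproduces only that arithmetic in plain C#, with no Unity dependency. `GtaoDispatchMath`, `CeilDiv`, and `GroupCount` are hypothetical names that do not appear in the original pass; for positive integers, `CeilDiv` should give the same result as `Mathf.CeilToInt((float)size / divisor)`.

```csharp
using System;

// Hypothetical helper, not part of the original GTAORenderPass.
// It only mirrors the size arithmetic used by Configure and the Do* methods.
public static class GtaoDispatchMath
{
    // Integer ceiling division for a non-negative size and a positive divisor.
    public static int CeilDiv(int size, int divisor)
    {
        if (size < 0) throw new ArgumentOutOfRangeException(nameof(size));
        if (divisor <= 0) throw new ArgumentOutOfRangeException(nameof(divisor));
        return (size + divisor - 1) / divisor;
    }

    // Number of thread groups needed to cover a width x height target
    // with groups of groupSizeX x groupSizeY threads.
    public static (int x, int y) GroupCount(int width, int height, int groupSizeX, int groupSizeY)
    {
        return (CeilDiv(width, groupSizeX), CeilDiv(height, groupSizeY));
    }
}
```

As an illustration, assume a 1920x1080 camera target, a downsampling factor of 2, and 8x8 thread groups (the group size is an assumption here; the pass reads the real value through `GetKernelThreadGroupSizes`). The downsampled size is then 960x540, so the GTAO kernel would dispatch 120x68 groups, the horizontal blur 240x68 (full width, downsampled height), and the vertical blur 240x135.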

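Afterword

After another long gap, I finally forced myself to finish this article, and also forced myself to use Group Shared Memory for all sorts of sampling optimizations. The resulting code really is intimidating, and probably nobody will be able to make sense of it (though more likely nobody will read it at all).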
