Revisiting Gaussian Blur After Nearly Two Years
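Two years ago I wrote an article on speeding up Gaussian blur with Group Shared Memory, but that version still left me with some regrets. It used a static Gaussian weight array of length 17 (only 9 distinct weights in practice). That did let the blur strength be adjusted freely to some extent. For weak blurs, however, it relied on manual linear interpolation to find suitable sample colors and always performed 17 color-times-weight operations regardless. For strong blurs, the mere 17 effective sample points caused obvious undersampling artifacts.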
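Over the past two years I also tried other approaches. I wanted a Gaussian blur that handles both very strong and very weak blurs, stays reasonably efficient, and uses one common code path. One attempt was a Gaussian blur I wrote on Shadertoy that combines random sampling with history blending.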
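However, the noise introduced by random sampling and the limitations of history blending meant that approach could never really be used in a production project. So I went back to the old route of computing the Gaussian blur with a Compute Shader and Group Shared Memory. The difference this time is that the Gaussian parameters are passed to the shader through a Compute Buffer. This guarantees that every sample within the radius contributes its proper weight to the final color.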
Normal Distribution
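Unlike last time, we start from the normal distribution and let its properties guide how we compute the blur. The probability density function of the normal distribution is:
$$ f(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} $$
Filtering a signal with a normal distribution is called a Gaussian filter. In practice we set \(\mu\) to 0, so the center sample always contributes the most. However, \(x\) in this density ranges over \((-\infty, \infty)\), and we cannot sample every signal. We therefore usually sample only within \(3\sigma\): for a 1D normal distribution, \((-3\sigma, 3\sigma)\) covers about 99.7% of the area. That is why three times \(\sigma\) is commonly used as the sampling radius. In 2D, an even larger radius may be needed to eliminate the visible artifacts of a too-small sampling radius.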
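To put numbers on the truncation, here is a small C# sketch that integrates the standard normal density with Simpson's rule. The class and method names are my own and are not part of the project. The separable filter's footprint is a square, so its 2D coverage is the product of the two 1D coverages. At ±3σ that keeps only about 99.5% of the mass in 2D, which is one way to see why a larger radius helps there.

```csharp
using System;

// Hypothetical helper (not part of the project): how much of a Gaussian's
// mass a kernel truncated at +/- k*sigma keeps. The result does not depend on sigma.
public static class GaussianCoverage
{
    // Standard normal density (sigma = 1, mu = 0).
    static double Pdf(double x) => Math.Exp(-0.5 * x * x) / Math.Sqrt(2.0 * Math.PI);

    // Area of the 1D Gaussian inside (-k*sigma, k*sigma), via Simpson's rule.
    // 'steps' must be even.
    public static double Coverage1D(double k, int steps = 100000)
    {
        double h = 2.0 * k / steps;
        double sum = Pdf(-k) + Pdf(k);
        for (int i = 1; i < steps; i++)
            sum += Pdf(-k + i * h) * (i % 2 == 1 ? 4.0 : 2.0);
        return sum * h / 3.0;
    }

    // A separable (square) 2D kernel keeps the product of the two 1D coverages.
    public static double CoverageSeparable2D(double k) => Coverage1D(k) * Coverage1D(k);
}
```

With the 3.8σ radius used later in this post, the 1D coverage rises to roughly 99.99%.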
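One thing worth mentioning: I can't work through the calculus myself, but as far as I know, applying two Gaussian blurs in succession with \(\sigma\) values \(x\) and \(y\) is equivalent to a single Gaussian blur with \(\sigma = \sqrt{x^2+y^2}\), because the variances add.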
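The following hedged C# sketch checks this numerically; its names are hypothetical. Sampled, normalized kernels for \(\sigma = 3\) and \(\sigma = 4\) are convolved. The variance of the result comes out at about \(3^2 + 4^2 = 25\), which corresponds to \(\sigma = 5\). The kernels are truncated at 5σ here only to keep the truncation error negligible.

```csharp
using System;

// Hypothetical sketch: blurring with sigma a, then sigma b, behaves like a
// single blur with sigma sqrt(a^2 + b^2), because the variances add.
public static class GaussianComposition
{
    // Sampled, normalized 1D Gaussian kernel with radius ceil(5 * sigma).
    public static double[] Kernel(double sigma)
    {
        int r = (int)Math.Ceiling(5.0 * sigma);
        var k = new double[2 * r + 1];
        double sum = 0.0;
        for (int i = -r; i <= r; i++)
        {
            k[i + r] = Math.Exp(-0.5 * i * i / (sigma * sigma));
            sum += k[i + r];
        }
        for (int i = 0; i < k.Length; i++) k[i] /= sum;
        return k;
    }

    // Full discrete convolution; the result has length a.Length + b.Length - 1.
    public static double[] Convolve(double[] a, double[] b)
    {
        var c = new double[a.Length + b.Length - 1];
        for (int i = 0; i < a.Length; i++)
            for (int j = 0; j < b.Length; j++)
                c[i + j] += a[i] * b[j];
        return c;
    }

    // Variance of a symmetric, normalized kernel centered in the middle of the array.
    public static double Variance(double[] k)
    {
        int center = k.Length / 2;
        double v = 0.0;
        for (int i = 0; i < k.Length; i++)
            v += k[i] * (i - center) * (double)(i - center);
        return v;
    }
}
```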
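Another interesting point: in ordinary blur passes we often downsample and then upsample to reduce the number of samples. For a 1D half-resolution linear downsample followed by a linear upsample, the center pixel retains \(\frac 3 8\) of its original contribution. The 2-texel box downsample gives it weight \(\frac 1 2\), and the bilinear upsample then weights that low-resolution texel by \(\frac 3 4\). We can find a \(\sigma\) whose Gaussian has area roughly \(\frac 3 8\) over \((-0.5, 0.5)\). We could then say that linear downsampling and upsampling approximately achieve a Gaussian blur with that \(\sigma\). Unfortunately this \(\sigma\) isn't easy to compute, and with Group Shared Memory there's no need for extra downsampling and upsampling anyway.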
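That \(\sigma\) has no tidy closed form, but it is easy to find numerically. This C# sketch uses hypothetical names, with Simpson's rule for the area and bisection for the search. It looks for the \(\sigma\) whose Gaussian has area \(\frac 3 8\) over \((-0.5, 0.5)\), and it lands at roughly 1.02, i.e. about one pixel at the original resolution.

```csharp
using System;

// Hypothetical sketch: find sigma such that N(0, sigma^2) has area 3/8 over
// (-0.5, 0.5), the center-pixel share of a half-res linear down/upsample.
public static class DownsampleSigma
{
    // Area of N(0, sigma^2) over (-halfWidth, halfWidth), via Simpson's rule.
    public static double CentralArea(double sigma, double halfWidth = 0.5, int steps = 2000)
    {
        double f(double x) =>
            Math.Exp(-0.5 * x * x / (sigma * sigma)) / (sigma * Math.Sqrt(2.0 * Math.PI));
        double h = 2.0 * halfWidth / steps;
        double sum = f(-halfWidth) + f(halfWidth);
        for (int i = 1; i < steps; i++)
            sum += f(-halfWidth + i * h) * (i % 2 == 1 ? 4.0 : 2.0);
        return sum * h / 3.0;
    }

    // CentralArea decreases monotonically as sigma grows, so bisection converges.
    public static double SolveSigma(double target = 3.0 / 8.0)
    {
        double lo = 0.1, hi = 10.0;
        for (int iter = 0; iter < 100; iter++)
        {
            double mid = 0.5 * (lo + hi);
            if (CentralArea(mid) > target) lo = mid; // area too large: sigma too small
            else hi = mid;
        }
        return 0.5 * (lo + hi);
    }
}
```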
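In this article, a 2D Gaussian filter is implemented as two 1D Gaussian filters, one horizontal and one vertical. With Group Shared Memory a single 2D filter would actually be somewhat more efficient, but I split it into two filters to keep it easier to extend later.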
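The following CPU sketch in C# checks that the two 1D passes are equivalent to one 2D filter, including the clamp-to-edge behavior at the borders. Its names are hypothetical, and plain `double` arrays stand in for textures. Each 1D pass normalizes by its own weight sum, like the shader does. The 2D reference uses the outer-product kernel.

```csharp
using System;

// Hypothetical CPU sketch: separable clamp-to-edge blur vs. a direct 2D blur.
public static class SeparableBlur
{
    static int Clamp(int v, int lo, int hi) => Math.Min(Math.Max(v, lo), hi);

    // One 1D pass; weights[k] is the (unnormalized) weight at distance k, k = 0..radius.
    public static double[,] Pass1D(double[,] img, double[] weights, bool horizontal)
    {
        int h = img.GetLength(0), w = img.GetLength(1), r = weights.Length - 1;
        var dst = new double[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
            {
                double sum = 0.0, sumW = 0.0;
                for (int i = -r; i <= r; i++)
                {
                    int sx = horizontal ? Clamp(x + i, 0, w - 1) : x;
                    int sy = horizontal ? y : Clamp(y + i, 0, h - 1);
                    double wt = weights[Math.Abs(i)];
                    sum += img[sy, sx] * wt;
                    sumW += wt;
                }
                dst[y, x] = sum / sumW; // normalize per pass, as the shader does
            }
        return dst;
    }

    // Reference: one 2D pass with the outer-product kernel weights[|i|] * weights[|j|].
    public static double[,] Direct2D(double[,] img, double[] weights)
    {
        int h = img.GetLength(0), w = img.GetLength(1), r = weights.Length - 1;
        var dst = new double[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
            {
                double sum = 0.0, sumW = 0.0;
                for (int j = -r; j <= r; j++)
                    for (int i = -r; i <= r; i++)
                    {
                        double wt = weights[Math.Abs(i)] * weights[Math.Abs(j)];
                        sum += img[Clamp(y + j, 0, h - 1), Clamp(x + i, 0, w - 1)] * wt;
                        sumW += wt;
                    }
                dst[y, x] = sum / sumW;
            }
        return dst;
    }
}
```

The equivalence holds because clamp-to-edge addressing clamps each axis independently, so the double sum factorizes into two single sums.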
Implementation
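The rest is largely the same as before. To guarantee that each pixel needs at most two texture loads, the maximum blur radius GAUSSIAN_BLUR_MAX_RADIUS is limited to half of THREAD_GROUP_SIZE. A group must cache THREAD_GROUP_SIZE + 2 × GAUSSIAN_BLUR_MAX_RADIUS texels, which is then exactly twice the number of threads. To make the 2D blur look good even in fairly extreme cases, my blur radius is 3.8 times \(\sigma\), rounded up.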
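The index arithmetic behind that limit can be checked on the CPU. In this C# sketch the names are hypothetical, and the constants mirror the shader's defines and the 3.8 ratio. It confirms that 256 threads loading two texels each fill the 512-entry cache. It also confirms that every read of `cache[t + 128 + i]` with \(|i| \le 128\) stays in range, and that the largest sigma allowed by the volume component still fits.

```csharp
using System;

// Hypothetical sketch of the groupshared cache arithmetic
// (constants copied from the shader's #defines and the C# sigma ratio).
public static class CacheLayout
{
    public const int ThreadGroupSize = 256;
    public const int MaxRadius = ThreadGroupSize / 2;              // 128
    public const int CacheSize = ThreadGroupSize + 2 * MaxRadius;  // 512
    public const float SigmaRadiusRatio = 3.8f;

    // Loads each thread must issue so that all CacheSize entries get filled.
    public static int LoadsPerThread => (CacheSize + ThreadGroupSize - 1) / ThreadGroupSize;

    // Sampling radius for a given sigma: ceil(3.8 * sigma).
    public static int Radius(float sigma) => (int)Math.Ceiling(SigmaRadiusRatio * sigma);

    // Largest sigma exposed by the volume component: floor(128 / 3.8).
    public static float MaxSigma => (float)Math.Floor(MaxRadius / SigmaRadiusRatio);

    // True if every thread t reading cache[t + MaxRadius + i], |i| <= radius, stays in bounds.
    public static bool ReadsInRange(int radius)
    {
        for (int t = 0; t < ThreadGroupSize; t++)
            for (int i = -radius; i <= radius; i++)
            {
                int idx = t + MaxRadius + i;
                if (idx < 0 || idx >= CacheSize) return false;
            }
        return true;
    }
}
```

With these numbers the largest slider value is \(\sigma = 33\), whose radius of 126 stays just under the 128 limit.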
GaussianBlurComputeShader.compute
A compute shader with separate horizontal and vertical Gaussian blur kernels. Group Shared Memory reduces the texture reads per pixel to at most two. The maximum blur radius is 128 pixels.
```hlsl
#pragma kernel GaussianH
#pragma kernel GaussianV

Texture2D<float4> _SourceTex;
RWTexture2D<float4> _RW_TargetTex;
// [0] = sample radius, [1 + k] = weight at distance k (k = 0..radius)
StructuredBuffer<float> _GaussianWeights;
float4 _TextureSize; // (width, height, 1/width, 1/height)

#define GAUSSIAN_BLUR_MAX_RADIUS 128
#define THREAD_GROUP_SIZE 256

// Each group caches its own texels plus GAUSSIAN_BLUR_MAX_RADIUS texels on either side.
static const int CACHED_COLOR_SIZE = THREAD_GROUP_SIZE + GAUSSIAN_BLUR_MAX_RADIUS * 2;
groupshared half3 cachedColor[CACHED_COLOR_SIZE];

void SetCachedColor(half3 color, int index) { cachedColor[index] = color; }
// threadPos is relative to the first texel owned by this group.
half3 GetCachedColor(int threadPos) { return cachedColor[threadPos + GAUSSIAN_BLUR_MAX_RADIUS]; }

void CacheColor(int2 groupCacheStartPos, int cacheIndex, int isHorizontal)
{
    int2 texturePos = groupCacheStartPos + cacheIndex * int2(isHorizontal, 1 - isHorizontal);
    // Clamp to the edge so border texels are repeated.
    texturePos = clamp(texturePos, 0, int2(_TextureSize.xy) - 1);
    half3 color = _SourceTex.Load(int3(texturePos, 0)).rgb;
    SetCachedColor(color, cacheIndex);
}

half3 Gaussian(uint3 groupID, uint3 groupThreadID, uint groupIndex, uint3 dispatchThreadID, int isHorizontal)
{
    int2 direction = int2(isHorizontal, 1 - isHorizontal);
    // (THREAD_GROUP_SIZE, 1) for the horizontal pass, (1, THREAD_GROUP_SIZE) for the vertical pass.
    int2 threadGroupSize = (THREAD_GROUP_SIZE - 1) * direction + 1;
    int2 groupCacheStartPos = int2(groupID.xy) * threadGroupSize - GAUSSIAN_BLUR_MAX_RADIUS * direction;

    // Every thread loads two texels, which fills all CACHED_COLOR_SIZE entries.
    int cacheIndex = int(groupIndex) * 2;
    if (cacheIndex < CACHED_COLOR_SIZE - 1)
    {
        CacheColor(groupCacheStartPos, cacheIndex, isHorizontal);
        CacheColor(groupCacheStartPos, cacheIndex + 1, isHorizontal);
    }
    GroupMemoryBarrierWithGroupSync();

    int sampleRadius = int(_GaussianWeights[0]);
    int threadPos = int(groupIndex);
    half3 sumColor = 0.0f;
    half sumWeight = 0.0f;
    for (int i = -sampleRadius; i <= sampleRadius; ++i)
    {
        half3 color = GetCachedColor(threadPos + i);
        half weight = _GaussianWeights[abs(i) + 1];
        sumColor += color * weight;
        sumWeight += weight;
    }
    // Normalize here, so the buffer's weights need not sum to one.
    return sumColor / sumWeight;
}

[numthreads(THREAD_GROUP_SIZE, 1, 1)]
void GaussianH(uint3 groupID : SV_GroupID,
               uint3 groupThreadID : SV_GroupThreadID,
               uint groupIndex : SV_GroupIndex,
               uint3 dispatchThreadID : SV_DispatchThreadID)
{
    half3 color = Gaussian(groupID, groupThreadID, groupIndex, dispatchThreadID, 1);
    _RW_TargetTex[dispatchThreadID.xy] = half4(color, 1.0f);
}

[numthreads(1, THREAD_GROUP_SIZE, 1)]
void GaussianV(uint3 groupID : SV_GroupID,
               uint3 groupThreadID : SV_GroupThreadID,
               uint groupIndex : SV_GroupIndex,
               uint3 dispatchThreadID : SV_DispatchThreadID)
{
    half3 color = Gaussian(groupID, groupThreadID, groupIndex, dispatchThreadID, 0);
    _RW_TargetTex[dispatchThreadID.xy] = half4(color, 1.0f);
}
```
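This hedged C# sketch emulates one row of `GaussianH` on the CPU to sanity-check the groupshared indexing without a GPU. Each simulated group first fills its cache with two loads per thread, clamped to the edge. Then every thread sums `cache[t + i + MaxRadius]` with weight `weightsBuffer[abs(i) + 1]`, just as the shader does. To keep the test small, the group size is shrunk to a hypothetical 8 and scalar values stand in for colors. The result is compared with a naive clamp-to-edge convolution.

```csharp
using System;

// Hypothetical CPU emulation of one row of the GaussianH kernel.
public static class GroupSharedEmulation
{
    const int GroupSize = 8;                         // stands in for THREAD_GROUP_SIZE
    const int MaxRadius = GroupSize / 2;             // GAUSSIAN_BLUR_MAX_RADIUS
    const int CacheSize = GroupSize + 2 * MaxRadius; // CACHED_COLOR_SIZE

    // weightsBuffer layout as read by the shader: [0] = radius (<= MaxRadius),
    // [1 + k] = weight at distance k for k = 0..radius.
    public static double[] BlurRow(double[] row, double[] weightsBuffer)
    {
        int width = row.Length, radius = (int)weightsBuffer[0];
        int groups = (width + GroupSize - 1) / GroupSize;
        var result = new double[width];
        var cache = new double[CacheSize];
        for (int g = 0; g < groups; g++)
        {
            int cacheStart = g * GroupSize - MaxRadius;
            // Load phase: thread t caches entries 2t and 2t+1, clamped to the edge.
            for (int t = 0; t < GroupSize; t++)
                for (int k = 0; k < 2; k++)
                {
                    int idx = 2 * t + k;
                    cache[idx] = row[Math.Min(Math.Max(cacheStart + idx, 0), width - 1)];
                }
            // GroupMemoryBarrierWithGroupSync() would sit here on the GPU.
            for (int t = 0; t < GroupSize; t++)
            {
                int x = g * GroupSize + t;
                if (x >= width) break; // the GPU drops these out-of-range writes
                double sum = 0.0, sumW = 0.0;
                for (int i = -radius; i <= radius; i++)
                {
                    double w = weightsBuffer[Math.Abs(i) + 1];
                    sum += cache[t + i + MaxRadius] * w;
                    sumW += w;
                }
                result[x] = sum / sumW;
            }
        }
        return result;
    }

    // Naive clamp-to-edge reference using the same weight layout.
    public static double[] Naive(double[] row, double[] weightsBuffer)
    {
        int width = row.Length, radius = (int)weightsBuffer[0];
        var result = new double[width];
        for (int x = 0; x < width; x++)
        {
            double sum = 0.0, sumW = 0.0;
            for (int i = -radius; i <= radius; i++)
            {
                double w = weightsBuffer[Math.Abs(i) + 1];
                sum += row[Math.Min(Math.Max(x + i, 0), width - 1)] * w;
                sumW += w;
            }
            result[x] = sum / sumW;
        }
        return result;
    }
}
```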
GaussianBlur.cs
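The sampling radius is set to 3.8 times \(\sigma\). This gives a clean Gaussian blur even when there are very bright highlights; if it doesn't, increase sigma further. The component derives the sampling radius and the per-pixel weights from \(\sigma\). The radius is stored in the first element of the array, followed by the weights for distances 0 through the radius.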
```csharp
using System;

namespace UnityEngine.Rendering.Universal
{
    [Serializable, VolumeComponentMenuForRenderPipeline("Post-processing/Gaussian Blur", typeof(UniversalRenderPipeline))]
    public sealed class GaussianBlur : VolumeComponent, IPostProcessComponent
    {
        // Sampling radius = ceil(sigma * sigmaRadiusRatio).
        const float sigmaRadiusRatio = 3.8f;

        public BoolParameter isEnabled = new BoolParameter(false);
        // The upper bound keeps the radius within the shader's GAUSSIAN_BLUR_MAX_RADIUS (128).
        public ClampedFloatParameter sigma = new ClampedFloatParameter(0.0f, 0.0f, Mathf.Floor(128.0f / sigmaRadiusRatio));

        public bool IsActive()
        {
            return isEnabled.value && sigma.value > 0.0f;
        }

        public bool IsTileCompatible()
        {
            return false;
        }

        const float INV_SQRT_2PI = 0.3989422804f;

        private static float Gaussian(float sigma, float x)
        {
            float invSigma = 1.0f / sigma;
            return INV_SQRT_2PI * invSigma * Mathf.Exp(-0.5f * x * x * invSigma * invSigma);
        }

        public static int SigmaToRadius(float sigma)
        {
            return Mathf.CeilToInt(sigma * sigmaRadiusRatio);
        }

        // Layout read by the shader: [0] = radius, [1 + i] = weight at distance i (i = 0..radius).
        public static float[] GetGaussianWeights(float sigma)
        {
            int length = SigmaToRadius(sigma);
            // The shader reads _GaussianWeights[abs(i) + 1] for |i| <= radius,
            // so radius + 2 entries are needed.
            float[] weights = new float[length + 2];
            weights[0] = (float)length;
            for (int i = 0; i <= length; i++)
            {
                weights[i + 1] = Gaussian(sigma, (float)i);
            }
            return weights;
        }
    }
}
```
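The shader reads `_GaussianWeights[abs(i) + 1]` for every \(|i|\) up to the radius, so the table needs radius + 2 entries. The following C# sketch builds a table in that layout and checks the largest index against the table length across the slider's range. Its names are hypothetical, and it uses `System.Math` instead of Unity's `Mathf`. It also drops the \(\frac{1}{\sigma\sqrt{2\pi}}\) factor. Since the shader divides by `sumWeight`, only the relative weights matter, and the normalized center weight still comes out close to \(\frac{1}{\sigma\sqrt{2\pi}}\).

```csharp
using System;

// Hypothetical sketch of the weight table in the layout the shader reads:
// [0] = radius, [1 + k] = unnormalized Gaussian weight at distance k (k = 0..radius).
public static class WeightTable
{
    const float SigmaRadiusRatio = 3.8f;

    public static float[] Build(float sigma)
    {
        int radius = (int)Math.Ceiling(sigma * SigmaRadiusRatio);
        var table = new float[radius + 2];
        table[0] = radius;
        for (int k = 0; k <= radius; k++)
            // No 1/(sigma*sqrt(2*pi)) factor: the shader divides by the weight sum anyway.
            table[k + 1] = (float)Math.Exp(-0.5 * k * k / (sigma * sigma));
        return table;
    }

    // Largest index the shader touches: abs(i) + 1 with |i| = radius.
    public static int MaxShaderIndex(float[] table) => (int)table[0] + 1;

    // The shader's sumWeight: the weights summed over offsets -radius..radius.
    public static double FootprintSum(float[] table)
    {
        int radius = (int)table[0];
        double sum = 0.0;
        for (int i = -radius; i <= radius; i++) sum += table[Math.Abs(i) + 1];
        return sum;
    }
}
```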
GaussianBlurRenderPass.cs
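I searched around and found a reasonably clean way to stop Unity from complaining that the Compute Buffer was never released. The pass itself is a no-frills horizontal-then-vertical Gaussian blur. If the Camera Color Attachment supported random writes, one Blit could be saved. I'm on Unity 2021.3.19f1c1, where Unity somewhat oddly names the Camera Color Attachment "_CameraColorAttachmentA", but that's no real problem.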
```csharp
namespace UnityEngine.Rendering.Universal
{
    public class GaussianBlurRenderPass : ScriptableRenderPass
    {
        static readonly string passName = "Gaussian Blur Render Pass";
        private GaussianBlurRendererFeature.GaussianBlurSettings settings;
        private GaussianBlur gaussianBlur;
        private ComputeShader computeShader;

        static readonly string cameraColorTextureName = "_CameraColorAttachmentA";
        static readonly int cameraColorTextureID = Shader.PropertyToID(cameraColorTextureName);
        RenderTargetIdentifier cameraColorIden;

        static readonly string gaussianBlurTextureOneName = "_GaussianBlurTextureOne";
        static readonly int gaussianBlurTextureOneID = Shader.PropertyToID(gaussianBlurTextureOneName);
        RenderTargetIdentifier gaussianBlurTextureOneIden;

        static readonly string gaussianBlurTextureTwoName = "_GaussianBlurTextureTwo";
        static readonly int gaussianBlurTextureTwoID = Shader.PropertyToID(gaussianBlurTextureTwoName);
        RenderTargetIdentifier gaussianBlurTextureTwoIden;

        private ComputeBuffer computeBuffer;
        private Vector2Int textureSize;
        private float[] weights;

        static readonly string HorizontalKernelName = "GaussianH";
        static readonly string VerticalKernelName = "GaussianV";
        static readonly int _SourceTex = Shader.PropertyToID("_SourceTex");
        static readonly int _RW_TargetTex = Shader.PropertyToID("_RW_TargetTex");
        static readonly int _GaussianWeights = Shader.PropertyToID("_GaussianWeights");
        static readonly int _TextureSize = Shader.PropertyToID("_TextureSize");

        public GaussianBlurRenderPass(GaussianBlurRendererFeature.GaussianBlurSettings settings)
        {
            profilingSampler = new ProfilingSampler(passName);
            this.settings = settings;
            renderPassEvent = settings.renderPassEvent;
            computeShader = settings.computeShader;
            cameraColorIden = new RenderTargetIdentifier(cameraColorTextureID);
            gaussianBlurTextureOneIden = new RenderTargetIdentifier(gaussianBlurTextureOneID);
            gaussianBlurTextureTwoIden = new RenderTargetIdentifier(gaussianBlurTextureTwoID);
        }

        public void Setup(GaussianBlur gaussianBlur)
        {
            this.gaussianBlur = gaussianBlur;
        }

        // Recreates the buffer only when its size changes, releasing the old one first.
        private void EnsureComputeBuffer(int count, int stride)
        {
            if (computeBuffer == null || computeBuffer.count != count || computeBuffer.stride != stride)
            {
                if (computeBuffer != null)
                {
                    computeBuffer.Release();
                }
                computeBuffer = new ComputeBuffer(count, stride, ComputeBufferType.Structured);
            }
        }

        public override void OnCameraSetup(CommandBuffer cmd, ref RenderingData renderingData)
        {
            // Upload [radius, w(0), ..., w(radius)] for the current sigma.
            weights = GaussianBlur.GetGaussianWeights(gaussianBlur.sigma.value);
            EnsureComputeBuffer(weights.Length, sizeof(float));
            computeBuffer.SetData(weights);
        }

        public override void Configure(CommandBuffer cmd, RenderTextureDescriptor cameraTextureDescriptor)
        {
            textureSize = new Vector2Int(cameraTextureDescriptor.width, cameraTextureDescriptor.height);
            // Two random-write temporaries, one per blur direction.
            RenderTextureDescriptor desc = cameraTextureDescriptor;
            desc.enableRandomWrite = true;
            desc.msaaSamples = 1;
            desc.depthBufferBits = 0;
            cmd.GetTemporaryRT(gaussianBlurTextureOneID, desc);
            cmd.GetTemporaryRT(gaussianBlurTextureTwoID, desc);
        }

        private Vector4 GetTextureSizeParams(Vector2Int size)
        {
            return new Vector4(size.x, size.y, 1.0f / size.x, 1.0f / size.y);
        }

        private void DoGaussianBlur(CommandBuffer cmd, RenderTargetIdentifier colorid,
                                    RenderTargetIdentifier oneid, RenderTargetIdentifier twoid,
                                    ComputeShader computeShader)
        {
            if (!computeShader) return;

            // Horizontal pass: camera color -> texture one.
            {
                int kernelID = computeShader.FindKernel(HorizontalKernelName);
                computeShader.GetKernelThreadGroupSizes(kernelID, out uint x, out uint y, out uint z);
                cmd.SetComputeTextureParam(computeShader, kernelID, _SourceTex, colorid);
                cmd.SetComputeTextureParam(computeShader, kernelID, _RW_TargetTex, oneid);
                cmd.SetComputeBufferParam(computeShader, kernelID, _GaussianWeights, computeBuffer);
                cmd.SetComputeVectorParam(computeShader, _TextureSize, GetTextureSizeParams(textureSize));
                cmd.DispatchCompute(computeShader, kernelID,
                    Mathf.CeilToInt((float)textureSize.x / x),
                    Mathf.CeilToInt((float)textureSize.y / y),
                    1);
            }

            // Vertical pass: texture one -> texture two.
            {
                int kernelID = computeShader.FindKernel(VerticalKernelName);
                computeShader.GetKernelThreadGroupSizes(kernelID, out uint x, out uint y, out uint z);
                cmd.SetComputeTextureParam(computeShader, kernelID, _SourceTex, oneid);
                cmd.SetComputeTextureParam(computeShader, kernelID, _RW_TargetTex, twoid);
                cmd.SetComputeBufferParam(computeShader, kernelID, _GaussianWeights, computeBuffer);
                cmd.SetComputeVectorParam(computeShader, _TextureSize, GetTextureSizeParams(textureSize));
                cmd.DispatchCompute(computeShader, kernelID,
                    Mathf.CeilToInt((float)textureSize.x / x),
                    Mathf.CeilToInt((float)textureSize.y / y),
                    1);
            }

            // Copy the result back into the camera color attachment.
            cmd.Blit(twoid, colorid);
        }

        public override void Execute(ScriptableRenderContext context, ref RenderingData renderingData)
        {
            CommandBuffer cmd = CommandBufferPool.Get();
            using (new ProfilingScope(cmd, profilingSampler))
            {
                DoGaussianBlur(cmd, cameraColorIden, gaussianBlurTextureOneIden, gaussianBlurTextureTwoIden, computeShader);
            }
            context.ExecuteCommandBuffer(cmd);
            CommandBufferPool.Release(cmd);
        }

        public override void FrameCleanup(CommandBuffer cmd)
        {
            cmd.ReleaseTemporaryRT(gaussianBlurTextureOneID);
            cmd.ReleaseTemporaryRT(gaussianBlurTextureTwoID);
        }

        // Called by the renderer feature so the ComputeBuffer is not leaked.
        public void Dispose()
        {
            if (computeBuffer != null)
            {
                computeBuffer.Release();
                computeBuffer = null;
            }
        }
    }
}
```
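For reference, this C# sketch works out the group counts passed to `DispatchCompute`. It uses hypothetical names and an integer ceiling division in place of `Mathf.CeilToInt`. At 1920×1080 the horizontal pass launches 8 × 1080 groups and the vertical pass 1920 × 5. The last group along the blurred axis overhangs the texture (8 × 256 = 2048 > 1920). As far as I know, out-of-range writes to an `RWTexture2D` are simply discarded, and the shader's `clamp` keeps the reads in bounds.

```csharp
using System;

// Hypothetical sketch: thread-group counts for the two DispatchCompute calls.
public static class DispatchMath
{
    // Integer ceil(size / threads); equals Mathf.CeilToInt((float)size / threads) for these sizes.
    static int Groups(int size, int threads) => (size + threads - 1) / threads;

    // GaussianH is declared with numthreads(256, 1, 1).
    public static (int x, int y) Horizontal(int width, int height, int groupSize = 256) =>
        (Groups(width, groupSize), Groups(height, 1));

    // GaussianV is declared with numthreads(1, 256, 1).
    public static (int x, int y) Vertical(int width, int height, int groupSize = 256) =>
        (Groups(width, 1), Groups(height, groupSize));
}
```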
GaussianBlurRendererFeature.cs
Not much to say here; I added a Dispose override so the Compute Buffer is released promptly.
```csharp
namespace UnityEngine.Rendering.Universal
{
    public class GaussianBlurRendererFeature : ScriptableRendererFeature
    {
        [System.Serializable]
        public class GaussianBlurSettings
        {
            public RenderPassEvent renderPassEvent;
            public ComputeShader computeShader;
        }

        public GaussianBlurSettings settings = new GaussianBlurSettings();
        private GaussianBlurRenderPass gaussianBlurRenderPass;

        public override void Create()
        {
            // Create() can run again (e.g. after editing settings); release the old pass's buffer first.
            gaussianBlurRenderPass?.Dispose();
            gaussianBlurRenderPass = new GaussianBlurRenderPass(settings);
        }

        public override void AddRenderPasses(ScriptableRenderer renderer, ref RenderingData renderingData)
        {
            GaussianBlur gaussianBlur = VolumeManager.instance.stack.GetComponent<GaussianBlur>();
            if (gaussianBlur != null && gaussianBlur.IsActive())
            {
                gaussianBlurRenderPass.Setup(gaussianBlur);
                renderer.EnqueuePass(gaussianBlurRenderPass);
            }
        }

        // Releases the pass's ComputeBuffer when the feature is disposed.
        protected override void Dispose(bool disposing)
        {
            gaussianBlurRenderPass?.Dispose();
            base.Dispose(disposing);
        }
    }
}
```
Afterword
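It's 2023, so why am I still making Gaussian blurs? I've written one who knows how many times. Next up is the Circular Blur from EA's earlier talk (although I've written plenty of fixed-size versions of that too). When will I finally find the courage to tackle depth of field? (sob)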