Revised June 3, 2023

I found that it works better to obtain intermediate amounts of blur by linearly interpolating after the downsample/upsample passes, so the code below has been updated accordingly; the theory sections did not need to change. I also spent some time writing a Shadertoy as a demo:

Dual Kawase Blur

Why I wrote this article

There are already plenty of ready-made Dual Kawase Blur examples online, but they share a few problems:

1. Most articles only provide code with no explanation; at best they include a diagram cropped straight from Arm's PDF, which is full of boxes and symbols whose meaning is never stated.
2. Most articles control the amount of blur by changing the sampling distance; we will get to the drawbacks of that later.
3. Most articles never consider the dynamic case where the blur grows gradually from zero; downsampling and upsampling tend to break the continuity of the whole image.

If all you want is a blurred image, a few downsample and upsample passes are enough. But I wanted a blur that increases continuously and gradually, which led me to the idea of quantifying the Dual Kawase Blur.

Dual Kawase Blur

Dual Kawase Blur is a method Arm proposed in 2015, building on Kawase Blur, that uses downsampling and upsampling to produce high-quality, large-radius blurs quickly and efficiently; the PDF can be found here.

Dual Kawase Blur Diagram

Here is a diagram of Dual Kawase Blur showing the operations performed during downsampling and upsampling. Cells outlined with thin black lines are the original (or upsampled) pixels; cells outlined with thick black lines are the downsampled pixels. The cross marks the pixel currently being blurred, and the circles mark the sample points it needs. Pink denotes the blurred pixel and its sample points during downsampling; green denotes them during upsampling.

The diagram also shows how Dual Kawase Blur exploits bilinear sampling to save samples: the downsample pass effectively reads a total of sixteen pixels around the current pixel, and the upsample pass effectively reads thirteen. However, if an odd dimension is halved with floor division during downsampling, or an inappropriate offset is used (say, 1.5×), the sample points can land on the centers of the original pixels; then, even with bilinear sampling, each tap is equivalent to reading a single pixel.

So, to let every pixel contribute its due share to the blur and to get a good result, we fix the sampling offset of the Dual Kawase Blur at exactly one (that is, strictly optimal bilinear sample placement), and reach the desired blur radius through multiple downsample and upsample passes.
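The two coverage claims above (sixteen effective pixels for the downsample taps, thirteen for the upsample taps) can be sanity-checked by splatting each tap's bilinear footprint onto an integer pixel grid. A minimal Python sketch, assuming the tap layouts from the compute shader below with an offset of exactly one:

```python
import math

def splat(taps, weights):
    """Accumulate the bilinear footprint of each tap (given in
    pixel-center coordinates) into a {pixel: weight} dict."""
    kernel = {}
    for (tx, ty), w in zip(taps, weights):
        x0, y0 = math.floor(tx), math.floor(ty)
        fx, fy = tx - x0, ty - y0
        for x, y, bw in ((x0, y0, (1 - fx) * (1 - fy)),
                         (x0 + 1, y0, fx * (1 - fy)),
                         (x0, y0 + 1, (1 - fx) * fy),
                         (x0 + 1, y0 + 1, fx * fy)):
            if bw > 0.0:
                kernel[(x, y)] = kernel.get((x, y), 0.0) + w * bw
    return kernel

# Downsample: 5 taps in source-pixel units, centered on a 2x2 pixel corner
c = 0.5
down_taps = [(c, c)] + [(c + dx, c + dy) for dx in (-1, 1) for dy in (-1, 1)]
down = splat(down_taps, [4 / 8] + [1 / 8] * 4)

# Upsample: 8 taps in half-res-pixel units around a quarter-offset center
q = -0.25
up_taps = ([(q + d, q) for d in (-1, 1)] + [(q, q + d) for d in (-1, 1)]
           + [(q + dx, q + dy) for dx in (-.5, .5) for dy in (-.5, .5)])
up = splat(up_taps, [1 / 12] * 4 + [2 / 12] * 4)

print(len(down), len(up))  # 16 13
print(round(sum(down.values()), 6), round(sum(up.values()), 6))  # 1.0 1.0
```

Each pattern touches exactly the claimed number of distinct pixels, and both effective kernels sum to one, confirming that the optimal placement wastes no taps.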

Quantifying Dual Kawase Blur

Downsampling and upsampling have one drawback: any downsample/upsample pass necessarily introduces blur. There are two ways around this. One is to approximate the blur of a Dual Kawase pass (downsampling included) with a more expensive weighted blur at the original resolution; the other is to linearly interpolate between zero passes and one pass of Dual Kawase, yielding an image with an intermediate amount of blur. On balance, linear interpolation looks smoother; it is slightly more "wrong", but well within acceptable limits.
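The second option is just a per-pixel linear interpolation, the same operation the KawaseLinear kernel below performs on the GPU. A trivial sketch on a single channel value:

```python
def lerp(a, b, t):
    """Linear interpolation: t = 0 gives the original image,
    t = 1 gives one full dual-Kawase pass."""
    return a + (b - a) * t

src, blurred = 0.8, 0.2          # illustrative single-channel values
print(round(lerp(src, blurred, 0.25), 4))  # 0.65
```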

I wrote a small script that computes, for a single pixel with value 1 at the original resolution, the values of the surrounding pixels after one Dual Kawase pass. Fitting a polynomial to these blurred values then lets us approximate the effect of the blur. I ran the computation on an 8x8 block of pixels (the blur kernel is actually a bit larger, but I was too lazy to change my earlier code). The computed weights are:

0.0003255208	0.001464844	0.003092448	0.004231771	0.004231771	0.003092448	0.001464844	0.0003255208	
0.001464844	0.004882813	0.009440104	0.01204427	0.01074219	0.007486979	0.004231771	0.001464844	
0.004394531	0.01334635	0.02311198	0.02701823	0.0218099	0.01334635	0.007486979	0.003092448	
0.01009115	0.02571615	0.03808594	0.04329427	0.03678386	0.0218099	0.01074219	0.004231771	
0.01529948	0.03222656	0.04069011	0.04589844	0.04329427	0.02701823	0.01204427	0.004231771	
0.01416016	0.0296224	0.03678386	0.04069011	0.03808594	0.02311198	0.009440104	0.003092448	
0.007324219	0.01985677	0.0296224	0.03222656	0.02571615	0.01334635	0.004882813	0.001464844	
0.001627604	0.007324219	0.01416016	0.01529948	0.01009115	0.004394531	0.001464844	0.0003255208	
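The impulse-response computation can be reproduced with a short Python simulation of one dual-Kawase pass (downsample, then upsample) that mirrors the shader's tap layout with an offset of one. The grid size and impulse position here are illustrative choices of mine, and clamp-to-edge stands in for the LinearClamp sampler:

```python
import math

def fetch(img, px, py):
    """Bilinear fetch at continuous pixel-center coordinates, clamped to edge."""
    n = len(img)
    x0, y0 = math.floor(px), math.floor(py)
    fx, fy = px - x0, py - y0
    def at(x, y):
        return img[min(max(y, 0), n - 1)][min(max(x, 0), n - 1)]
    return ((1 - fx) * (1 - fy) * at(x0, y0) + fx * (1 - fy) * at(x0 + 1, y0)
            + (1 - fx) * fy * at(x0, y0 + 1) + fx * fy * at(x0 + 1, y0 + 1))

def sample(img, u, v):
    """SampleLevel equivalent: uv in [0, 1] -> bilinear fetch."""
    n = len(img)
    return fetch(img, u * n - 0.5, v * n - 0.5)

def downsample(src):
    n = len(src) // 2
    half = 0.5 / n  # half a target texel, in uv, as in the shader
    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            u, v = (x + 0.5) / n, (y + 0.5) / n
            c = 4.0 * sample(src, u, v)
            for dx, dy in ((-1, 1), (1, 1), (-1, -1), (1, -1)):
                c += sample(src, u + dx * half, v + dy * half)
            out[y][x] = c / 8.0
    return out

def upsample(src, n):
    one = 1.0 / n  # one *target* texel, in uv, as in the shader
    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            u, v = (x + 0.5) / n, (y + 0.5) / n
            c = 0.0
            for dx, dy in ((0, 2), (0, -2), (-2, 0), (2, 0)):
                c += sample(src, u + dx * one, v + dy * one)
            for dx, dy in ((-1, 1), (1, 1), (-1, -1), (1, -1)):
                c += 2.0 * sample(src, u + dx * one, v + dy * one)
            out[y][x] = c / 12.0
    return out

N = 32
img = [[0.0] * N for _ in range(N)]
img[13][13] = 1.0  # unit impulse away from the borders
blurred = upsample(downsample(img), N)
total = sum(map(sum, blurred))
print(round(total, 6))  # 1.0 -- one full pass preserves overall energy
```

Away from the borders the response sums to exactly one, so the pass is energy-preserving; the individual response values depend on which of the four downsample phases the impulse pixel falls into, which is why the table above is not perfectly symmetric.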

I then threw the values into Excel, brute-forced a fit, and tweaked it by hand, arriving at the formula below (x is roughly in [-4, 4]; the values are not normalized):

// Approximate one dual Kawase blur pass with a 4th-degree polynomial.
// -3.5 <= x <= 3.5 (ideally)
float getWeight(float x)
{
	return 0.1356f * x * x * x * x - 0.06748f * x * x * x - 4.693656f * x * x + 0.9954208f * x + 45.57338f;
}
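To use the fitted curve as an actual blur kernel at the original resolution, the weights must be normalized. One plausible way (my assumption; the article does not spell this out) is to sample the polynomial at the eight half-integer offsets -3.5 to 3.5, apply it separably, and normalize the resulting 8x8 kernel:

```python
def get_weight(x):
    # The fitted quartic from the article (unnormalized).
    return (0.1356 * x**4 - 0.06748 * x**3 - 4.693656 * x**2
            + 0.9954208 * x + 45.57338)

offsets = [i - 3.5 for i in range(8)]        # -3.5 .. 3.5, one per pixel
w = [get_weight(x) for x in offsets]
assert all(v > 0 for v in w)                 # stays positive in range
total = sum(wx * wy for wx in w for wy in w)
kernel = [[wx * wy / total for wx in w] for wy in w]  # separable, normalized
print(round(sum(map(sum, kernel)), 6))  # 1.0
```

Whether a separable product or a radial falloff matches the original fit better is not stated in the text; either way, normalizing after sampling keeps the approximation energy-preserving.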

The implementation

With the fitted formula we can now perform a nearly continuous Dual Kawase Blur. Taking the base-2 logarithm of the radius plus one tells us how many downsample passes we need, and its fractional part is the degree to which we approach the next pass, which maps to the blur's offset value.
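A sketch of that computation; blur_plan is a hypothetical helper name, and the same math appears in DualKawaseBlurRenderPass via Mathf.Log:

```python
import math

def blur_plan(radius):
    """How many full down/up passes to run, plus the lerp factor
    toward one additional pass."""
    amount = math.log2(radius + 1.0)   # same as Mathf.Log(r + 1) / ln 2
    count = math.floor(amount)
    frac = amount - count              # lerp factor toward the next pass
    return count, frac

print(blur_plan(0.0))   # (0, 0.0) -> no blur at all
print(blur_plan(3.0))   # (2, 0.0) -> exactly two down/up passes
count, frac = blur_plan(10.0)
print(count, round(frac, 3))  # 3 0.459 -> three passes, then lerp by 0.459
```

Because radius 0 maps to zero passes and zero lerp, the effect fades in from a perfectly sharp image with no popping at pass boundaries.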

DualKawaseBlurComputeShader.compute

I didn't bother writing a groupshared-memory optimization for the Dual Kawase part. THREAD_GROUP_SIZE needs to be at least four times BLUR_RADIUS; otherwise each thread has to take extra samples when filling the cache. BLUR_RADIUS is actually 4 but is written as 5 here, because an extra pixel has to be reserved for the manual bilinear sampling.

#pragma kernel KawaseDownSample
#pragma kernel KawaseUpSample
#pragma kernel KawaseLinear

Texture2D<float4> _SourceTexture;
RWTexture2D<float4> _RW_TargetTexture;
SamplerState sampler_LinearClamp;
float4 _SourceSize;
float4 _TargetSize;
float _Offset;

float3 sampleSource(float2 center, float2 offset)
{
	return _SourceTexture.SampleLevel(sampler_LinearClamp, center + offset * _Offset, 0.0f).rgb;
}

[numthreads(8,8,1)]
void KawaseDownSample(uint3 id : SV_DispatchThreadID)
{
	float2 uv = (float2(id.xy) + 0.5f) * _TargetSize.zw;
	float2 halfPixel = 0.5f * _TargetSize.zw;

	float3 c = sampleSource(uv, float2(0.0f, 0.0f));
	float3 tl = sampleSource(uv, halfPixel * float2(-1.0f, +1.0f));
	float3 tr = sampleSource(uv, halfPixel * float2(+1.0f, +1.0f));
	float3 bl = sampleSource(uv, halfPixel * float2(-1.0f, -1.0f));
	float3 br = sampleSource(uv, halfPixel * float2(+1.0f, -1.0f));

	float3 color = (tl + tr + bl + br + c * 4.0f) / 8.0f;
	_RW_TargetTexture[id.xy] = float4(color, 1.0f);
}

[numthreads(8, 8, 1)]
void KawaseUpSample(uint3 id : SV_DispatchThreadID)
{
	float2 uv = (float2(id.xy) + 0.5f) * _TargetSize.zw;
	float2 onePixel = 1.0f * _TargetSize.zw;

	// float3 c = sampleSource(uv, float2(0.0f, 0.0f));
	float3 t2 = sampleSource(uv, onePixel * float2(+0.0f, +2.0f));
	float3 b2 = sampleSource(uv, onePixel * float2(+0.0f, -2.0f));
	float3 l2 = sampleSource(uv, onePixel * float2(-2.0f, +0.0f));
	float3 r2 = sampleSource(uv, onePixel * float2(+2.0f, +0.0f));
	float3 tl = sampleSource(uv, onePixel * float2(-1.0f, +1.0f));
	float3 tr = sampleSource(uv, onePixel * float2(+1.0f, +1.0f));
	float3 bl = sampleSource(uv, onePixel * float2(-1.0f, -1.0f));
	float3 br = sampleSource(uv, onePixel * float2(+1.0f, -1.0f));

	float3 color = (t2 + b2 + l2 + r2 + 2.0f * (tl + tr + bl + br)) / 12.0f;
	_RW_TargetTexture[id.xy] = float4(color, 1.0f);
}

[numthreads(8, 8, 1)]
void KawaseLinear(uint3 id : SV_DispatchThreadID)
{
	half3 sourceTex = _SourceTexture.Load(uint3(id.xy, 0)).rgb;
	half3 blurredTex = _RW_TargetTexture[id.xy].rgb;
	half3 color = lerp(sourceTex, blurredTex, _Offset);
	_RW_TargetTexture[id.xy] = float4(color, 1.0f);
}

DualKawaseBlurRenderPass.cs

This pass needs to apply the fitted blend after the final downsample, and note that the blend is also needed when no downsampling happens at all. The Unity version used is 2021.3.19f1c1 with URP 12.1.10, hence odd names like _CameraColorAttachmentA.

using System.Collections.Generic;

namespace UnityEngine.Rendering.Universal
{
    public class DualKawaseBlurRenderPass : ScriptableRenderPass
    {
        static readonly string passName = "Dual Kawase Blur Render Pass";

        private DualKawaseBlurRendererFeature.DualKawaseBlurSettings settings;
        private DualKawaseBlur dualKawaseBlur;
        private ComputeShader computeShader;

        static readonly string cameraColorTextureName = "_CameraColorAttachmentA";
        static readonly int cameraColorTextureID = Shader.PropertyToID(cameraColorTextureName);
        private RenderTargetIdentifier cameraColorIden;

        private Vector2Int textureSize;
        private RenderTextureDescriptor desc;

        public DualKawaseBlurRenderPass(DualKawaseBlurRendererFeature.DualKawaseBlurSettings settings)
        {
            profilingSampler = new ProfilingSampler(passName);

            this.settings = settings;
            renderPassEvent = settings.renderPassEvent;
            computeShader = settings.computeShader;

            cameraColorIden = new RenderTargetIdentifier(cameraColorTextureID);
        }

        public void Setup(DualKawaseBlur dualKawaseBlur)
        {
            this.dualKawaseBlur = dualKawaseBlur;
        }

        public override void Configure(CommandBuffer cmd, RenderTextureDescriptor cameraTextureDescriptor)
        {
            textureSize = new Vector2Int(cameraTextureDescriptor.width, cameraTextureDescriptor.height);
            desc = cameraTextureDescriptor;
            desc.enableRandomWrite = true;
            desc.msaaSamples = 1;
            desc.depthBufferBits = 0;
        }

        private Vector4 GetTextureSizeParams(Vector2Int size)
        {
            return new Vector4(size.x, size.y, 1.0f / size.x, 1.0f / size.y);
        }

        private void DoKawaseSample(CommandBuffer cmd, RenderTargetIdentifier sourceid, RenderTargetIdentifier targetid,
                                        Vector2Int sourceSize, Vector2Int targetSize,
                                        float offset, bool downSample, ComputeShader computeShader)
        {
            if (!computeShader) return;
            string kernelName = downSample ? "KawaseDownSample" : "KawaseUpSample";
            int kernelID = computeShader.FindKernel(kernelName);
            computeShader.GetKernelThreadGroupSizes(kernelID, out uint x, out uint y, out uint z);
            cmd.SetComputeTextureParam(computeShader, kernelID, "_SourceTexture", sourceid);
            cmd.SetComputeTextureParam(computeShader, kernelID, "_RW_TargetTexture", targetid);
            cmd.SetComputeVectorParam(computeShader, "_SourceSize", GetTextureSizeParams(sourceSize));
            cmd.SetComputeVectorParam(computeShader, "_TargetSize", GetTextureSizeParams(targetSize));
            cmd.SetComputeFloatParam(computeShader, "_Offset", offset);
            cmd.DispatchCompute(computeShader, kernelID,
                                Mathf.CeilToInt((float)targetSize.x / x),
                                Mathf.CeilToInt((float)targetSize.y / y),
                                1);
        }

        private void DoKawaseLinear(CommandBuffer cmd, RenderTargetIdentifier sourceid, RenderTargetIdentifier targetid,
            Vector2Int sourceSize, float offset, ComputeShader computeShader)
        {
            if (!computeShader) return;
            string kernelName = "KawaseLinear";
            int kernelID = computeShader.FindKernel(kernelName);
            computeShader.GetKernelThreadGroupSizes(kernelID, out uint x, out uint y, out uint z);
            cmd.SetComputeTextureParam(computeShader, kernelID, "_SourceTexture", sourceid);
            cmd.SetComputeTextureParam(computeShader, kernelID, "_RW_TargetTexture", targetid);
            cmd.SetComputeVectorParam(computeShader, "_SourceSize", GetTextureSizeParams(sourceSize));
            cmd.SetComputeFloatParam(computeShader, "_Offset", offset);
            cmd.DispatchCompute(computeShader, kernelID,
                                Mathf.CeilToInt((float)sourceSize.x / x),
                                Mathf.CeilToInt((float)sourceSize.y / y),
                                1);
        }

        public override void Execute(ScriptableRenderContext context, ref RenderingData renderingData)
        {
            CommandBuffer cmd = CommandBufferPool.Get();
            using (new ProfilingScope(cmd, profilingSampler))
            {
                List<int> rtIDs = new List<int>();
                List<Vector2Int> rtSizes = new List<Vector2Int>();

                RenderTextureDescriptor tempDesc = desc;
                string kawaseRT = "_KawaseRT";
                int kawaseRTID = Shader.PropertyToID(kawaseRT);
                cmd.GetTemporaryRT(kawaseRTID, tempDesc);

                rtIDs.Add(kawaseRTID);
                rtSizes.Add(textureSize);

                float downSampleAmount = Mathf.Log(dualKawaseBlur.GetRadius() + 1.0f, 2.0f); // log2(radius + 1)
                int downSampleCount = Mathf.FloorToInt(downSampleAmount);
                float offsetRatio = downSampleAmount - (float)downSampleCount;

                Vector2Int lastSize = textureSize;
                int lastID = cameraColorTextureID;
                for (int i = 0; i <= downSampleCount; i++)
                {
                    string rtName = "_KawaseRT" + i.ToString();
                    int rtID = Shader.PropertyToID(rtName);
                    Vector2Int rtSize = new Vector2Int((lastSize.x + 1) / 2, (lastSize.y + 1) / 2);
                    tempDesc.width = rtSize.x;
                    tempDesc.height = rtSize.y;
                    cmd.GetTemporaryRT(rtID, tempDesc);

                    rtIDs.Add(rtID);
                    rtSizes.Add(rtSize);

                    DoKawaseSample(cmd, lastID, rtID, lastSize, rtSize, 1.0f, true, computeShader);
                    lastSize = rtSize;
                    lastID = rtID;
                }

                if (downSampleCount == 0)
                {
                    DoKawaseSample(cmd, rtIDs[1], rtIDs[0], rtSizes[1], rtSizes[0], 1.0f, false, computeShader);
                    DoKawaseLinear(cmd, cameraColorIden, rtIDs[0], rtSizes[0], offsetRatio, computeShader);
                    cmd.ReleaseTemporaryRT(rtIDs[1]);
                }
                else
                {
                    string intermediateRTName = "_KawaseRT" + (downSampleCount + 1).ToString();
                    int intermediateRTID = Shader.PropertyToID(intermediateRTName);
                    Vector2Int intermediateRTSize = rtSizes[downSampleCount];
                    tempDesc.width = intermediateRTSize.x;
                    tempDesc.height = intermediateRTSize.y;
                    cmd.GetTemporaryRT(intermediateRTID, tempDesc);

                    for (int i = downSampleCount+1; i >= 1; i--)
                    {
                        int sourceID = rtIDs[i];
                        Vector2Int sourceSize = rtSizes[i];
                        int targetID = i == (downSampleCount + 1) ? intermediateRTID : rtIDs[i - 1];
                        Vector2Int targetSize = rtSizes[i - 1];

                        DoKawaseSample(cmd, sourceID, targetID, sourceSize, targetSize, 1.0f, false, computeShader);

                        if (i == (downSampleCount + 1))
                        {
                            DoKawaseLinear(cmd, rtIDs[i - 1], intermediateRTID, targetSize, offsetRatio, computeShader);
                            int tempID = intermediateRTID;
                            intermediateRTID = rtIDs[i - 1];
                            rtIDs[i - 1] = tempID;
                        }
                        cmd.ReleaseTemporaryRT(sourceID);
                    }
                    cmd.ReleaseTemporaryRT(intermediateRTID);
                }

                cmd.Blit(kawaseRTID, cameraColorIden);
                cmd.ReleaseTemporaryRT(kawaseRTID);
            }
            context.ExecuteCommandBuffer(cmd);
            cmd.Clear();
            CommandBufferPool.Release(cmd);
        }
    }
}

DualKawaseBlur.cs

Nothing much to say here.

using System;

namespace UnityEngine.Rendering.Universal
{
    [Serializable, VolumeComponentMenuForRenderPipeline("Post-processing/Dual Kawase Blur", typeof(UniversalRenderPipeline))]
    public class DualKawaseBlur : VolumeComponent, IPostProcessComponent
    {
        public BoolParameter isEnabled = new BoolParameter(false);
        public ClampedFloatParameter maxRadius = new ClampedFloatParameter(32.0f, 0.0f, 255.0f);
        public ClampedFloatParameter intensity = new ClampedFloatParameter(0.0f, 0.0f, 1.0f);

        public float GetRadius()
        {
            return maxRadius.value * intensity.value;
        }

        public bool IsActive()
        {
            return isEnabled.value && intensity.value > 0.0f;
        }

        public bool IsTileCompatible()
        {
            return false;
        }
    }
}

DualKawaseBlurRendererFeature.cs

Nothing much to say here either.

using System.Collections;
namespace UnityEngine.Rendering.Universal
{
    public class DualKawaseBlurRendererFeature : ScriptableRendererFeature
    {
        [System.Serializable]
        public class DualKawaseBlurSettings
        {
            public RenderPassEvent renderPassEvent = RenderPassEvent.BeforeRenderingPostProcessing;
            public ComputeShader computeShader;
        }

        public DualKawaseBlurSettings settings = new DualKawaseBlurSettings();
        private DualKawaseBlurRenderPass dualKawaseBlurRenderPass;

        public override void Create()
        {
            dualKawaseBlurRenderPass = new DualKawaseBlurRenderPass(settings);
        }

        public override void AddRenderPasses(ScriptableRenderer renderer, ref RenderingData renderingData)
        {
            DualKawaseBlur dualKawaseBlur = VolumeManager.instance.stack.GetComponent<DualKawaseBlur>();
            if (dualKawaseBlur != null && dualKawaseBlur.IsActive())
            {
                dualKawaseBlurRenderPass.Setup(dualKawaseBlur);
                renderer.EnqueuePass(dualKawaseBlurRenderPass);
            }
        }
    }
}

Afterword

I wrote this post quickly, and honestly there isn't that much of value in it. Mostly, after searching the web for a long time and finding only near-identical code and near-identical diagrams with no detailed explanation, I did some digging of my own. I have to say Dual Kawase Blur is quite clever; the final result shows no obvious boxy artifacts.