Screen Space Reflection

While screen space reflection (SSR) is a well-known effect, this article aims to introduce a unique method for calculating screen space reflections – one that I haven’t encountered online before.

Many online tutorials already cover screen space reflection, and most of them follow a similar process: calculate the reflection direction in world space, traverse world space with a mostly uniform step size, and for each step, compute the normalized device coordinates (NDC). Then, compare the ray’s current depth with the depth value sampled from the depth texture; if the ray’s depth falls behind the sampled depth, consider it a reflection intersection (hit) and sample the color value at that location. This method can yield visually pleasing reflections, but hardly anyone mentions the staggering number of iterations required. We’ll discuss this further shortly.
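
For reference, the core of that common approach looks something like this (an illustrative sketch only, not from any particular tutorial; _StepSize, MAX_STEPS, and the two scene sampling helpers are hypothetical names):

    // March the reflection ray with a uniform world space step size.
    float3 rayPos = positionWS;
    for (int i = 0; i < MAX_STEPS; ++i)
    {
        rayPos += reflDirWS * _StepSize;
        // A full matrix transform and a perspective divide at every step.
        float4 rayCS = mul(UNITY_MATRIX_VP, float4(rayPos, 1.0f));
        float3 rayNDC = rayCS.xyz / rayCS.w;
        float sceneDepth = SampleSceneDepthNDC(rayNDC.xy);   // hypothetical helper
        if (rayNDC.z < sceneDepth)                           // reversed-Z: the ray is behind the scene
            return SampleSceneColorNDC(rayNDC.xy);           // hypothetical helper
    }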

Additionally, achieving good reflections for objects at varying distances often requires different step sizes, but few people delve into this consideration. Some slightly improved approaches perform a binary search after the ray hits the scene to ensure smoother transitions between reflection colors. Others terminate the stepping early (an early return), or blend between the reflection color and an environment reflection probe based on how the NDC coordinates compare against the [-1, 1] range.

Currently, the most effective screen space ray marching method involves precomputing a hierarchical Z-buffer (Hi-Z). By stepping into and out of its different LODs, this approach achieves the same result with far fewer iterations. However, a hierarchical Z-buffer is not available in every project.

The most valuable tutorial one can find online is Screen Space Ray Tracing by Morgan McGuire, who also wrote a paper about his algorithm. In his article, McGuire highlights why stepping in world space can be problematic: after the perspective transformation, positions that are evenly spaced in world space may barely move in screen space, so many more iterations are needed to achieve a desirable reflection. McGuire also presents an ingenious approach: he calculates the coordinates of the starting and ending points in both clip space and screen space, and by linearly interpolating the clip space z coordinate, the clip space 1/w, and the screen space xy coordinates, he eliminates the matrix computations otherwise required at every step. Definitely worth using!
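
In HLSL, that setup amounts to only a few lines (a condensed preview of the final shader at the end of this article, which additionally remaps p to texture UVs; originCS and endCS are the clip space positions of the start and end points):

    float k0 = 1.0f / originCS.w;   // 1/w interpolates linearly in screen space
    float k1 = 1.0f / endCS.w;
    float3 q0 = originCS.xyz;       // clip space xyz
    float3 q1 = endCS.xyz;
    float2 p0 = originCS.xy * k0;   // NDC xy, i.e. screen space
    float2 p1 = endCS.xy * k1;
    // Each step only lerps k, q, and p; the ray's NDC depth is then q.z * k.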

The goal of this article is to get the correct reflection color using as few iterations as possible within a single shader. Random sampling, blurring, and the Fresnel effect are not within the scope of this article. We will focus solely on DX11 shaders on the Windows platform, which allows us to avoid extensive platform-specific code. The Unity version used for this article is Unity 2022.3.21f1. The final shader code is provided at the end of the article.

Calculation of Reflections

Parameters

The calculation of reflections typically relies on three essential parameters:

  1. Max Distance: Only objects within this distance of the reflection point are considered for reflection.
  2. Step Count: Increasing the number of steps results in more accurate reflections but also impacts performance.
  3. Thickness Params: In this article, an object’s assumed thickness is depth * _ThicknessParams.y + _ThicknessParams.x, where depth is the sampled scene depth in view space. This ensures that a ray passing behind an object is not considered an intersection.

Comparison of Depth Values

When considering what kind of depth values to compare during the stepping process, several factors come into play. We define the depth value obtained from stepping as rayDepth and the depth value sampled from the depth texture as sampleDepth.

By directly sampling the depth texture, we obtain the depth value in normalized device coordinates, so a straightforward approach is to compare the depths in NDC. Note that on DX11, Unity uses a reversed depth buffer (reversed-Z), so larger depth values are closer to the camera: when rayDepth < sampleDepth, the ray has gone behind the scene surface and intersects it.

Alternatively, we can compare the actual depth values in view space. This approach allows us to specify a thickness value: if the depth difference exceeds this thickness, we consider the ray to have passed behind the object without intersection. Specifically, when rayDepth > sampleDepth && rayDepth < sampleDepth + thickness, the ray intersects with the scene.
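
Assuming a reversed-Z depth buffer and Unity’s LinearEyeDepth helper, the two comparisons look like this (thickness is the value defined in the parameters above):

    // Comparison in NDC (reversed-Z: larger depth values are closer to the camera).
    bool hitNDC = rayDepth < sampleDepth;

    // Comparison in view space, with a thickness window.
    float linearRayDepth = LinearEyeDepth(rayDepth, _ZBufferParams);
    float linearSampleDepth = LinearEyeDepth(sampleDepth, _ZBufferParams);
    bool hitVS = linearRayDepth > linearSampleDepth
              && linearRayDepth < linearSampleDepth + thickness;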

One thing worth noting is the sampler used for the depth texture. Linear filtering can mistakenly report intersections at the edges between two faces of different depths, producing small-dot artifacts on the screen. If available, a separate texture marking object edges can help exclude these false intersections, but in our shader we will stick to a Point Clamp sampler.

Ray Marching

Here’s the workflow breakdown (a stripped-down sketch of the resulting loop follows the list):

  1. Define k0 and k1 as the reciprocals of the w-components of clip space coordinates for the starting and ending points.
  2. Define q0 and q1 as the xyz-components of clip space coordinates for the starting and ending points.
  3. Define p0 and p1 as the xy-components of normalized device coordinates for the starting and ending points.
  4. Define w as a variable that linearly increases in (0, 1] based on _StepCount.
  5. For each step, update the value of w and use it to linearly interpolate the three sets of components (k, q, and p).
  6. Calculate rayDepth using q.z * k, sample the depth texture at p to obtain sampleDepth.
  7. If rayDepth < sampleDepth, the ray intersects with the scene, exit the loop and return p.
  8. Sample the color texture at p to obtain the reflection color.
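
Putting the steps together, the core loop is little more than three lerps and a compare (a stripped-down sketch of the full shader given at the end):

    float w = 0.0f;
    for (int i = 0; i < _StepCount; ++i)
    {
        w += 1.0f / float(_StepCount);
        float3 q = lerp(q0, q1, w);
        float2 p = lerp(p0, p1, w);
        float k = lerp(k0, k1, w);
        float rayDepth = q.z * k;
        float sampleDepth = _CameraDepthTexture.Sample(DEPTH_SAMPLER, p).r;
        if (rayDepth < sampleDepth) return p;   // reversed-Z: the ray is behind the scene
    }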

It looks like this (32 steps): [Figure: Screen Space Reflection Naive]

Quite poor! The most noticeable issue is the stretching. There are primarily two reasons for this: First, we did not use a thickness to determine whether the ray passes behind an object, resulting in significant stretching below suspended objects. Second, we did not restrict positions outside the screen area, so we sample the depth texture at coordinates beyond the screen and get back depth values clamped to its edge.

Thickness Test

To address the thickness issue mentioned earlier, we introduce a test for whether the stepping position is behind an object. It relies on the linear depths from the camera: linearRayDepth and linearSampleDepth.

As previously discussed, we use linearSampleDepth * _ThicknessParams.y + _ThicknessParams.x as the thickness of an object in the scene. To determine whether the ray passes behind an object, we compare (linearRayDepth - linearSampleDepth - _ThicknessParams.x) / linearSampleDepth with _ThicknessParams.y. If the former is greater than the latter, the ray passes behind the object.

    // Normalizes the depth difference by the sampled depth; comparing the result
    // against _ThicknessParams.y is equivalent to testing against a thickness of
    // linearSampleDepth * _ThicknessParams.y + _ThicknessParams.x.
    float getThicknessDiff(float diff, float linearSampleDepth, float2 thicknessParams)
    {
        return (diff - thicknessParams.x) / linearSampleDepth;
    }
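
In the marching loop, the thickness test then reads:

    float hitDiff = linearRayDepth - linearSampleDepth;
    float thicknessDiff = getThicknessDiff(hitDiff, linearSampleDepth, _ThicknessParams);
    // Behind the surface, but within its assumed thickness: a hit.
    bool intersects = hitDiff > 0.0f && thicknessDiff < _ThicknessParams.y;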

Step 7 of the workflow now becomes:

  7. If rayDepth < sampleDepth and thicknessDiff < _ThicknessParams.y, the ray intersects with the scene; exit the loop and return p.

It looks like this (32 steps): [Figure: Screen Space Reflection Thickness Test]

Frustum Clipping

When the endpoint p1 falls outside the screen, two issues arise: First, sampling the depth texture beyond the screen range yields incorrect depth values. Second, samples are wasted outside the visible area, reducing the effective sampling count. To address this, we can restrict p1 to the screen by clipping the stepping endpoint against the view frustum. In the method below, we define nf as the near and far clipping plane depths (positive values) and s as the slopes of the view frustum in the horizontal and vertical directions (positive values); numerically, s is given by float2(aspect * tan(fovy * 0.5f), tan(fovy * 0.5f)). Note that for ease of calculation, the z-components of from and to are positive.

#define INFINITY 1e10

// Clips the segment from->to against the view frustum.
// from, to: view space positions with positive z pointing into the screen.
// nf: near and far plane depths (positive), s: frustum slopes (positive).
float3 frustumClip(float3 from, float3 to, float2 nf, float2 s)
{
    float3 dir = to - from;
    float3 signDir = sign(dir);

    // Select the far plane when stepping away from the camera, otherwise the near plane.
    float nfSlab = signDir.z * (nf.y - nf.x) * 0.5f + (nf.y + nf.x) * 0.5f;
    float lenZ = (nfSlab - from.z) / dir.z;
    if (dir.z == 0.0f) lenZ = INFINITY;

    // Select the side planes (x = +/-s.x * z, y = +/-s.y * z) the ray can exit through.
    float2 ss = sign(dir.xy - s * dir.z) * s;
    float2 denom = ss * dir.z - dir.xy;
    float2 lenXY = (from.xy - ss * from.z) / denom;
    if (lenXY.x < 0.0f || denom.x == 0.0f) lenXY.x = INFINITY;
    if (lenXY.y < 0.0f || denom.y == 0.0f) lenXY.y = INFINITY;

    // Clip at the nearest plane, but never extend beyond the original endpoint.
    float len = min(min(1.0f, lenZ), min(lenXY.x, lenXY.y));
    float3 clippedVS = from + dir * len;
    return clippedVS;
}

I have also written a Shadertoy demonstrating frustum clipping in 2D, which you can interact with using the mouse:

Frustum Clip 2D

The workflow gains a new first step:

  1. Frustum clip the stepping endpoint to clippedPosVS, and transform it to the clip space position endCS.

It looks like this (32 steps): [Figure: Screen Space Reflection Frustum Clip]

The scene is starting to exhibit some reflection, although there’s still room for improvement. The view frustum clipping has indeed reduced the number of pixels skipped between steps, filling in some of the gaps. However, the reflected colors appear distorted.

In our previous step, although p is guaranteed to land on the intersected object, it can still be some distance from the actual intersection point. To reduce this error, we can use binary search. Here’s how it works: during stepping, we maintain two variables, w1 and w2, holding the w values of the last two steps (w1 > w2). In each binary search iteration, we calculate w = 0.5f * (w1 + w2); if an intersection is detected, we update w1 = w, otherwise we update w2 = w, and proceed to the next iteration.
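
In code, the refinement loop mirrors the USE_BINARY_SEARCH path of the full shader:

    [unroll(5)]
    for (int i = 0; i < 5; ++i)
    {
        float w = 0.5f * (w1 + w2);
        float3 q = lerp(q0, q1, w);
        float2 p = lerp(p0, p1, w);
        float k = lerp(k0, k1, w);
        float sampleDepth = _CameraDepthTexture.Sample(DEPTH_SAMPLER, p).r;
        if (q.z * k < sampleDepth) { w1 = w; hitPos = p; }  // behind the scene: tighten w1
        else { w2 = w; }                                    // in front of the scene: tighten w2
    }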

The workflow now becomes:

  1. Frustum clip the stepping endpoint to clippedPosVS, and transform it to the clip space position endCS.
  2. Define k0 and k1 as the reciprocals of the w-components of clip space coordinates for the starting and ending points.
  3. Define q0 and q1 as the xyz-components of clip space coordinates for the starting and ending points.
  4. Define p0 and p1 as the xy-components of normalized device coordinates for the starting and ending points.
  5. Define w1 as a variable that linearly increases in (0, 1] based on _StepCount; initialize both w1 and w2 to 0.
  6. For each step, w2 = w1, update the value of w1 and use it to linearly interpolate the three sets of components (k, q, and p).
  7. Calculate rayDepth using q.z * k, sample the depth texture at p to obtain sampleDepth.
  8. If rayDepth < sampleDepth and thicknessDiff < _ThicknessParams.y, the ray intersects with the scene; exit the loop.
  9. Let w be the average of w1 and w2. Repeat steps 6, 7, and 8 to check whether an intersection occurs, until the binary search loop ends; update either w1 or w2 at each iteration depending on the result.
  10. Sample the color texture at p to obtain the reflection color.

It looks like this (32 steps, 5 binary searches): [Figure: Screen Space Reflection Binary Search]

The reflection now appears less distorted (particularly noticeable in the bottom left corner). However, there are still gaps between color segments, because our thickness test excludes potential intersections.

Potential Intersections

To compute potential intersections, let’s revisit the thickness test in the workflow. When the ray passes behind an object while it was in front of the scene during the last step, we calculate thicknessDiff from rayDepth and sampleDepth as before. If this value is smaller than the minimum difference recorded so far (minThicknessDiff), we consider it a potential intersection: we update minThicknessDiff and record the current w1 and w2 for the subsequent binary search.

During the binary search, if an actual intersection occurred, we follow the original code. If only a potential intersection occurred, we also track thicknessDiff during the binary search: we find the smallest thicknessDiff below _ThicknessParams.y, use the corresponding w to interpolate between p0 and p1 to obtain p, and finally use p to sample the color texture.
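
Inside the marching loop, recording a potential intersection is a small addition to the thickness test; here is a condensed excerpt of the USE_POTENTIAL_HIT path in the full shader:

    else if (!lastHit)  // the ray just went behind the scene but failed the thickness test
    {
        potentialHit = true;
        if (minPotentialHitPos > thicknessDiff)
        {
            // Remember the closest miss and its bracketing w values.
            minPotentialHitPos = thicknessDiff;
            potentialW12 = float2(w1, w2);
        }
    }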

The workflow now becomes:

  1. Frustum clip the stepping endpoint to clippedPosVS, and transform it to the clip space position endCS.
  2. Define k0 and k1 as the reciprocals of the w-components of clip space coordinates for the starting and ending points.
  3. Define q0 and q1 as the xyz-components of clip space coordinates for the starting and ending points.
  4. Define p0 and p1 as the xy-components of normalized device coordinates for the starting and ending points.
  5. Define w1 as a variable that linearly increases in (0, 1] based on _StepCount; initialize both w1 and w2 to 0.
  6. For each step, w2 = w1, update the value of w1 and use it to linearly interpolate the three sets of components (k, q, and p).
  7. Calculate rayDepth using q.z * k, sample the depth texture at p to obtain sampleDepth.
  8. If rayDepth < sampleDepth and thicknessDiff < _ThicknessParams.y, the ray intersects with the scene; exit the loop.
  9. Otherwise, if rayDepth < sampleDepth, thicknessDiff > _ThicknessParams.y, and the previous step was in front of the scene, compare thicknessDiff with the minimum so far. If it is smaller, update the minimum, record the current w1 and w2, mark this as a potential intersection, and continue looping.
  10. If an actual intersection occurred, let w be the average of w1 and w2. Repeat steps 6, 7, and 8 to check whether an intersection occurs, until the binary search loop ends; update either w1 or w2 at each iteration depending on the result.
  11. If only a potential intersection occurred, repeat steps 6, 7, and 8 in the same way, and use the smallest thicknessDiff and its corresponding w to update p.
  12. Sample the color texture at p to obtain the reflection color.

It looks like this (32 steps, 5 binary searches): [Figure: Screen Space Reflection Potential Hit]

And here is the result using 64 steps and 5 binary searches: [Figure: Screen Space Reflection Potential Hit]

ScreenSpaceReflection.shader

/*
// Copyright (c) 2024 zznewclear@gmail.com
// 
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
// 
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
// 
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
*/

Shader "zznewclear13/SSRShader"
{
    Properties
    {
        [Toggle(USE_POTENTIAL_HIT)] _UsePotentialHit ("Use Potential Hit", Float) = 1.0
        [Toggle(USE_FRUSTUM_CLIP)] _UseFrustumClip ("Use Frustum Clip", Float) = 1.0
        [Toggle(USE_BINARY_SEARCH)] _UseBinarySearch ("Use Binary Search", Float) = 1.0
        [Toggle(USE_THICKNESS)] _UseThickness ("Use Thickness", Float) = 1.0
        
        _MaxDistance ("Max Distance", Range(0.1, 100.0)) = 15.0
        [int] _StepCount ("Step Count", Float) = 32
        _ThicknessParams ("Thickness Params", Vector) = (0.1, 0.02, 0.0, 0.0)
    }

    HLSLINCLUDE
    #include "Packages/com.unity.render-pipelines.core/ShaderLibrary/Common.hlsl"
    #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Core.hlsl"
    #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Lighting.hlsl"
    
    #pragma shader_feature _ USE_POTENTIAL_HIT
    #pragma shader_feature _ USE_FRUSTUM_CLIP
    #pragma shader_feature _ USE_BINARY_SEARCH
    #pragma shader_feature _ USE_THICKNESS

    #define INFINITY 1e10
    #define DEPTH_SAMPLER sampler_PointClamp

    Texture2D _CameraOpaqueTexture;
    Texture2D _CameraDepthTexture;
    CBUFFER_START(UnityPerMaterial)
    float _MaxDistance;
    int _StepCount;
    float2 _ThicknessParams;
    CBUFFER_END

    struct Attributes
    {
        float4 positionOS   : POSITION;
        float3 normalOS     : NORMAL;
        float2 texcoord     : TEXCOORD0;
    };

    struct Varyings
    {
        float4 positionCS   : SV_POSITION;
        float3 positionWS   : TEXCOORD0;
        float3 normalWS     : TEXCOORD1;
        float2 uv           : TEXCOORD2;
        float3 viewWS       : TEXCOORD3;
    };

    Varyings vert(Attributes input)
    {
        Varyings output = (Varyings)0;
        VertexPositionInputs vpi = GetVertexPositionInputs(input.positionOS.xyz);
        VertexNormalInputs vni = GetVertexNormalInputs(input.normalOS);

        output.positionCS = vpi.positionCS;
        output.positionWS = vpi.positionWS;
        output.normalWS = vni.normalWS;
        output.uv = input.texcoord;
        output.viewWS = GetCameraPositionWS() - vpi.positionWS;
        return output;
    }

    float3 frustumClip(float3 from, float3 to, float2 nf, float2 s)
    {
        float3 dir = to - from;
        float3 signDir = sign(dir);

        float nfSlab = signDir.z * (nf.y - nf.x) * 0.5f + (nf.y + nf.x) * 0.5f;
        float lenZ = (nfSlab - from.z) / dir.z;
        if (dir.z == 0.0f) lenZ = INFINITY;

        float2 ss = sign(dir.xy - s * dir.z) * s;
        float2 denom = ss * dir.z - dir.xy;
        float2 lenXY = (from.xy - ss * from.z) / denom;
        if (lenXY.x < 0.0f || denom.x == 0.0f) lenXY.x = INFINITY;
        if (lenXY.y < 0.0f || denom.y == 0.0f) lenXY.y = INFINITY;

        float len = min(min(1.0f, lenZ), min(lenXY.x, lenXY.y));
        float3 clippedVS = from + dir * len;
        return clippedVS;
    }

    float getThicknessDiff(float diff, float linearSampleDepth, float2 thicknessParams)
    {
        return (diff - thicknessParams.x) / linearSampleDepth;
    }

    float4 frag(Varyings input) : SV_TARGET
    {
        float3 positionWS = input.positionWS;
        float3 normalWS = normalize(input.normalWS);
        float3 viewWS = normalize(input.viewWS);
        float3 reflWS = reflect(-viewWS, normalWS);
        float3 env = GlossyEnvironmentReflection(reflWS, 0.0f, 1.0f);
        float3 color = env;

        float3 originWS = positionWS;
        float3 endWS = positionWS + reflWS * _MaxDistance;

#if defined(USE_FRUSTUM_CLIP)
        // frustumClip expects positive z and positive slopes, so we flip z and
        // take the slopes from the projection matrix (the y component is negated
        // so the slope stays positive with Unity's y-flipped projection).
        float3 originVS = mul(UNITY_MATRIX_V, float4(originWS, 1.0f)).xyz;
        float3 endVS = mul(UNITY_MATRIX_V, float4(endWS, 1.0f)).xyz;
        float3 flipZ = float3(1.0f, 1.0f, -1.0f);
        float3 clippedVS = frustumClip(originVS * flipZ, endVS * flipZ, _ProjectionParams.yz, float2(1.0f, -1.0f) / UNITY_MATRIX_P._m00_m11);
        clippedVS *= flipZ;
        float4 originCS = mul(UNITY_MATRIX_VP, float4(originWS, 1.0f));
        float4 endCS = mul(UNITY_MATRIX_P, float4(clippedVS, 1.0f));
#else
        float4 originCS = mul(UNITY_MATRIX_VP, float4(originWS, 1.0f));
        float4 endCS = mul(UNITY_MATRIX_VP, float4(endWS, 1.0f));
#endif

        float k0 = 1.0f / originCS.w;
        float k1 = 1.0f / endCS.w;
        float3 q0 = originCS.xyz;
        float3 q1 = endCS.xyz;
        // Clip space xy -> NDC (multiply by 1/w) -> texture UV, with the DX11 y flip.
        float2 p0 = originCS.xy * float2(1.0f, -1.0f) * k0 * 0.5f + 0.5f;
        float2 p1 = endCS.xy * float2(1.0f, -1.0f) * k1 * 0.5f + 0.5f;

#if defined(USE_POTENTIAL_HIT)
        float w1 = 0.0f;
        float w2 = 0.0f;
        bool hit = false;
        bool lastHit = false;
        bool potentialHit = false;
        float2 potentialW12 = float2(0.0f, 0.0f);
        float minPotentialHitPos = INFINITY;
        [unroll(64)]
        for (int i=0; i<_StepCount; ++i)
        {
            w2 = w1;
            w1 += 1.0f / float(_StepCount);

            float3 q = lerp(q0, q1, w1);
            float2 p = lerp(p0, p1, w1);
            float k = lerp(k0, k1, w1);
            float sampleDepth = _CameraDepthTexture.Sample(DEPTH_SAMPLER, p).r;
            float linearSampleDepth = LinearEyeDepth(sampleDepth, _ZBufferParams);
            float linearRayDepth = LinearEyeDepth(q.z * k, _ZBufferParams);

            float hitDiff = linearRayDepth - linearSampleDepth;
            float thicknessDiff = getThicknessDiff(hitDiff, linearSampleDepth, _ThicknessParams);
            if (hitDiff > 0.0f)
            {
                if (thicknessDiff < _ThicknessParams.y)
                {
                    hit = true;
                    break;
                }
                else if(!lastHit)
                {
                    potentialHit = true;
                    if (minPotentialHitPos > thicknessDiff)
                    {
                        minPotentialHitPos = thicknessDiff;
                        potentialW12 = float2(w1, w2);
                    }
                }
            }
            lastHit = hitDiff > 0.0f;
        }
#else
        float w1 = 0.0f;
        float w2 = 0.0f;
        bool hit = false;
        [unroll(64)]
        for (int i=0; i<_StepCount; ++i)
        {
            w2 = w1;
            w1 += 1.0f / float(_StepCount);

            float3 q = lerp(q0, q1, w1);
            float2 p = lerp(p0, p1, w1);
            float k = lerp(k0, k1, w1);
            float sampleDepth = _CameraDepthTexture.Sample(DEPTH_SAMPLER, p).r;
#if defined(USE_THICKNESS)
            float linearSampleDepth = LinearEyeDepth(sampleDepth, _ZBufferParams);
            float linearRayDepth = LinearEyeDepth(q.z * k, _ZBufferParams);

            float hitDiff = linearRayDepth - linearSampleDepth;
            float thicknessDiff = getThicknessDiff(hitDiff, linearSampleDepth, _ThicknessParams);
            if (hitDiff > 0.0f && thicknessDiff < _ThicknessParams.y)
            {
                hit = true;
                break;
            }       
#else
            if (q.z * k < sampleDepth)
            {
                hit = true;
                break;
            }
#endif
        }
#endif

#if defined(USE_POTENTIAL_HIT)
        if (hit || potentialHit)
        {
            if (!hit)
            {
                w1 = potentialW12.x;
                w2 = potentialW12.y;
            }

            bool realHit = false;
            float2 hitPos = lerp(p0, p1, w1); // initialize in case no refinement step lands behind the scene
            float minThicknessDiff = _ThicknessParams.y;
            [unroll(5)]
            for (int i=0; i<5; ++i)
            {
                float w = 0.5f * (w1 + w2);
                float3 q = lerp(q0, q1, w);
                float2 p = lerp(p0, p1, w);
                float k = lerp(k0, k1, w);
                float sampleDepth = _CameraDepthTexture.Sample(DEPTH_SAMPLER, p).r;
                float linearSampleDepth = LinearEyeDepth(sampleDepth, _ZBufferParams);
                float linearRayDepth = LinearEyeDepth(q.z * k, _ZBufferParams);
                float hitDiff = linearRayDepth - linearSampleDepth;

                if (hitDiff > 0.0f)
                {
                    w1 = w;
                    if (hit) hitPos = p;
                }
                else
                {
                    w2 = w;
                }

                float thicknessDiff = getThicknessDiff(hitDiff, linearSampleDepth, _ThicknessParams);
                float absThicknessDiff = abs(thicknessDiff);
                if (!hit && absThicknessDiff < minThicknessDiff) 
                {
                    realHit = true;
                    // Track the smallest absolute difference so hitPos converges on the
                    // sample closest to the surface (storing the signed value could block
                    // further updates whenever it goes negative).
                    minThicknessDiff = absThicknessDiff;
                    hitPos = p;
                }
            }

            if (hit || realHit) color = _CameraOpaqueTexture.Sample(sampler_LinearClamp, hitPos).rgb * 0.3f;
        }
#elif defined(USE_BINARY_SEARCH)
        if (hit)
        {
            float2 hitPos = lerp(p0, p1, w1); // initialize in case no refinement step lands behind the scene
            [unroll(5)]
            for (int i=0; i<5; ++i)
            {
                float w = 0.5f * (w1 + w2);
                float3 q = lerp(q0, q1, w);
                float2 p = lerp(p0, p1, w);
                float k = lerp(k0, k1, w);

                float sampleDepth = _CameraDepthTexture.Sample(DEPTH_SAMPLER, p).r;
                if (q.z * k < sampleDepth)
                {
                    w1 = w;
                    hitPos = p;
                }
                else
                {
                    w2 = w;
                }
            }
            color = _CameraOpaqueTexture.Sample(sampler_LinearClamp, hitPos).rgb * 0.3f;
        }
#else
        if (hit)
        {
            float2 hitPos = lerp(p0, p1, w1);
            color = _CameraOpaqueTexture.Sample(sampler_LinearClamp, hitPos).rgb * 0.3f;
        }
#endif

        return float4(color, 1.0f);
    }

    ENDHLSL

    SubShader
    {
        Tags { "RenderType"="Transparent" "Queue"="Transparent" }
        LOD 100

        Pass
        {
            HLSLPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            ENDHLSL
        }
    }
}

Future Optimization

Currently, there is one aspect worth optimizing: controlling the overall step count based on the pixel distance between p0 and p1. We certainly don’t want to step 64 times for just 10 pixels. However, this is a relatively straightforward task, and I’ll leave it to someone with time to spare. As for random sampling, blurring, and Fresnel effects, let’s consider those when we really need them.
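
As a rough sketch of that idea (not part of the shader above; _PixelStride is a hypothetical parameter for the desired number of pixels per step):

    // Scale the iteration count by the ray's length in pixels.
    float pixelDistance = length((p1 - p0) * _ScreenParams.xy);
    int stepCount = clamp(int(ceil(pixelDistance / _PixelStride)), 1, _StepCount);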

Postscript

This article was translated by Microsoft’s Copilot and I made a few adjustments. What an era we live in!