Screen Space Reflection
While screen space reflection (SSR) is a well-known effect, this article aims to introduce a unique method for calculating screen space reflections – one that I haven’t encountered online before.
Many online tutorials already cover screen space reflection, and most of them follow a similar process: calculate the reflection direction in world space, march along it with a mostly uniform step size, and at each step compute the normalized device coordinates (NDC). Then compare the ray depth with the depth value sampled from the depth texture. If the ray depth is below the sampled depth, treat it as a reflection intersection (hit) and sample the color value at that location. This method can yield visually pleasing reflections, but hardly anyone mentions the staggering number of iterations it requires. We’ll discuss this further shortly.
Additionally, achieving good reflection results for objects at varying distances often requires different step sizes, but few people delve into this consideration. Some slightly improved approaches run a binary search after the ray hits the scene to ensure smoother transitions between reflection colors. Others terminate the march early (known as early return), or blend between the reflection color and the environment reflection probe depending on how close the NDC coordinates are to the edge of the [-1, 1] range.
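The blend against the [-1, 1] range can be sketched as a small fade function. This is only an illustration of the idea, not code from any particular tutorial; `screenEdgeFade`, `blendReflection`, and the `fadeWidth` parameter are hypothetical names introduced here.

```hlsl
// Hypothetical helper, not part of this article's shader.
// ndc: hit position in normalized device coordinates, each axis in [-1, 1] on screen.
// fadeWidth: assumed width of the fade band in NDC units (for example 0.2).
// Returns 1 well inside the screen and falls to 0 at the screen border.
float screenEdgeFade(float2 ndc, float fadeWidth)
{
    // Distance to the nearest screen border on each axis; negative when off screen.
    float2 distToEdge = 1.0f - abs(ndc);
    float2 fade = saturate(distToEdge / max(fadeWidth, 1e-5f));
    return fade.x * fade.y;
}

// Blend between the environment probe color and the screen space reflection color.
float3 blendReflection(float3 envColor, float3 ssrColor, float2 hitNdc, float fadeWidth)
{
    return lerp(envColor, ssrColor, screenEdgeFade(hitNdc, fadeWidth));
}
```

With a fade like this, reflections dissolve into the probe color near the screen border instead of cutting off abruptly.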
Currently, the most effective screen space ray marching method involves precomputing a Hierarchical ZBuffer. By stepping into and out of different LODs, this approach achieves the same results with fewer iterations. However, a Hierarchical ZBuffer is not available in every project.
The most valuable tutorial one can find online is Screen Space Ray Tracing by Morgan McGuire. He also wrote a paper about his algorithm. In his article, McGuire highlights why stepping in world space can be problematic. After undergoing perspective transformation, the step positions in world space may not vary significantly in screen space, leading to the need for more iterations to achieve desirable reflection effects. Also, McGuire presents an ingenious approach in his article. He calculates the coordinates of the starting and ending points in both clip space and screen space. By linearly interpolating the z coordinate in clip space, the 1/w coordinate in clip space, and the xy coordinates in screen space, he eliminates the matrix computations required during each step. Definitely worth using!
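The reason per-step matrix work can be avoided is that quantities divided by the clip-space w vary linearly in screen space under a perspective projection. As a sketch, assume clip-space endpoints $\mathbf{c}_0$ and $\mathbf{c}_1$ and a parameter $t \in [0, 1]$ that is linear along the segment on screen (these symbols are introduced here for illustration only):

```latex
\begin{aligned}
k_i &= \frac{1}{c_{i,w}}, \qquad \mathbf{p}_i = k_i\,\mathbf{c}_{i,xy},\\
\mathbf{p}(t) &= (1-t)\,\mathbf{p}_0 + t\,\mathbf{p}_1,\\
k(t) &= (1-t)\,k_0 + t\,k_1,\\
z_{\mathrm{ndc}}(t) &= (1-t)\,k_0\,c_{0,z} + t\,k_1\,c_{1,z}.
\end{aligned}
```

Each step therefore costs a few linear interpolations. Interpolating clip-space z and 1/w separately and multiplying them, as the workflow later in this article does, reproduces $z_{\mathrm{ndc}}$ exactly at both endpoints; in between, the two can differ slightly, by an amount that depends on how much clip-space z changes along the ray.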
The goal of this article is to get the correct reflection color using as few iterations as possible within a single shader. Random sampling, blurring, and Fresnel effect are not within the scope of this article. We will focus solely on Windows platform DX11 shaders, which allows us to avoid extensive platform-specific code. The Unity version used for this article is Unity 2022.3.21f1. The final shader code will be provided at the end of the article.
Calculation of Reflections
Parameters
The calculation of reflections typically relies on three essential parameters:
- Max Distance: The maximum length of the reflection ray. Only objects within this distance of the reflecting point can appear in the reflection.
- Step Count: Increasing the number of steps results in more accurate reflections but also impacts performance.
- Thickness Params: In this article, an object’s default thickness is calculated as `depth * _ThicknessParams.y + _ThicknessParams.x`. This ensures that a ray passing behind an object is not counted as an intersection.
Comparison of Depth Value
When considering what kind of depth value to compare during the stepping process, several factors come into play. We define the depth value obtained from stepping as `rayDepth` and the depth value obtained from sampling as `sampleDepth`.
By directly sampling the depth texture, we obtain the depth value in normalized device coordinates. A straightforward approach is therefore to compare the depths in NDC. When `rayDepth < sampleDepth`, the ray intersects with the scene. (On DX11, Unity uses a reversed depth buffer, so a smaller NDC depth means farther from the camera.)
Alternatively, we can compare the actual depth values in view space. This approach allows us to specify a thickness value. If the depth difference exceeds this thickness, we consider the ray to be passing behind an object without intersecting it. Specifically, when `rayDepth > sampleDepth && rayDepth < sampleDepth + thickness`, the ray intersects with the scene.
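The two comparisons can be written as small predicates. This is a minimal sketch; the function names are hypothetical, and the NDC version assumes a reversed depth buffer (near = 1, far = 0), as on DX11 in Unity.

```hlsl
// Hypothetical helpers illustrating the two comparisons described above.

// NDC comparison, assuming a reversed depth buffer (near = 1, far = 0):
// the ray is behind the scene when its depth value is smaller than the stored one.
bool intersectsNdc(float rayDepth, float sampleDepth)
{
    return rayDepth < sampleDepth;
}

// View-space comparison with linear depths (distance from the camera, larger is farther).
// thickness is the assumed extent of the surface along the view direction.
bool intersectsViewSpace(float linearRayDepth, float linearSampleDepth, float thickness)
{
    return linearRayDepth > linearSampleDepth
        && linearRayDepth < linearSampleDepth + thickness;
}
```

The view-space version needs one extra depth linearization per sample, but it is the only one of the two that can reject rays passing behind thin objects.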
One thing worth noting is the sampler used when sampling the depth texture. Linear interpolation can mistakenly identify intersections at the edges between two faces with different depths, resulting in unwanted artifacts (small dots) on the screen. If available, a separate texture that marks object edges can help exclude these intersection points. But in our shader, we will stick to a `Point Clamp` sampler.
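A point-sampled depth fetch might look like the sketch below. The helper name and its parameters are assumptions made for this example; the texture and sampler are passed in so that nothing is assumed about how a project declares them.

```hlsl
// Sketch only: depthTex and pointClampSampler are parameters of this hypothetical helper.
float sampleDepthPoint(Texture2D depthTex, SamplerState pointClampSampler, float2 uv)
{
    // SampleLevel with mip 0 avoids derivative-based mip selection,
    // which is not meaningful inside a ray marching loop.
    return depthTex.SampleLevel(pointClampSampler, uv, 0.0f).r;
}
```

In the shader at the end of this article, the equivalent call would take `_CameraDepthTexture` and `sampler_PointClamp` as arguments.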
Ray Marching
Here’s the workflow breakdown:
- Define `k0` and `k1` as the reciprocals of the w-components of the clip space coordinates of the starting and ending points.
- Define `q0` and `q1` as the xyz-components of the clip space coordinates of the starting and ending points.
- Define `p0` and `p1` as the xy-components of the normalized device coordinates of the starting and ending points.
- Define `w` as a variable that increases linearly in (0, 1) based on `_StepCount`.
- For each step, update the value of `w` and use it to linearly interpolate the three sets of components (`k`, `q`, and `p`).
- Calculate `rayDepth` using `q.z * k`, and sample the depth texture at `p` to obtain `sampleDepth`.
- If `rayDepth < sampleDepth`, the ray intersects with the scene; exit the loop and return `p`.
- Sample the color texture at `p` to obtain the reflection color.
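The steps above can be sketched as a single function. This is a minimal illustration rather than the final shader: `marchBasic` and its parameters are names chosen for this example, and the comparison assumes a reversed depth buffer.

```hlsl
// Minimal sketch of the basic loop; variable names mirror the workflow where possible.
// depthTex, pointClampSampler and stepCount are parameters (assumptions), so the
// function does not depend on any global declarations.
// Returns true on a hit and writes the hit position (as a [0, 1] UV) to hitUV.
bool marchBasic(Texture2D depthTex, SamplerState pointClampSampler,
                float4 originCS, float4 endCS, int stepCount, out float2 hitUV)
{
    float k0 = 1.0f / originCS.w;
    float k1 = 1.0f / endCS.w;
    float3 q0 = originCS.xyz;
    float3 q1 = endCS.xyz;
    // NDC xy remapped to [0, 1] UVs; the y flip assumes a DX-style texture origin.
    float2 p0 = originCS.xy * float2(1.0f, -1.0f) * k0 * 0.5f + 0.5f;
    float2 p1 = endCS.xy * float2(1.0f, -1.0f) * k1 * 0.5f + 0.5f;

    hitUV = p1;
    float w = 0.0f;
    for (int i = 0; i < stepCount; ++i)
    {
        w += 1.0f / float(stepCount);
        float3 q = lerp(q0, q1, w);
        float2 p = lerp(p0, p1, w);
        float k = lerp(k0, k1, w);

        float rayDepth = q.z * k;
        float sampleDepth = depthTex.SampleLevel(pointClampSampler, p, 0.0f).r;
        // Reversed depth assumed: a smaller value is farther from the camera.
        if (rayDepth < sampleDepth)
        {
            hitUV = p;
            return true;
        }
    }
    return false;
}
```

No matrix multiplication happens inside the loop; the two clip space positions are computed once before it.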
It looks like this (32 steps):
Quite poor! The most noticeable issue is the stretching. There are two main reasons for it. First, we did not use thickness to determine whether the ray passes behind an object, which causes significant stretching below suspended objects. Second, we did not restrict the ray to the screen area, so we sample depth at coordinates beyond the screen and get back the depth values at the clamped positions.
Thickness Test
To address the thickness issue mentioned earlier, we introduce a method for determining whether the stepping position is behind an object. This method relies on linear depths measured from the camera: `linearRayDepth` and `linearSampleDepth`.
As previously discussed, we use `linearSampleDepth * _ThicknessParams.y + _ThicknessParams.x` as the thickness of an object in the scene. To determine whether the ray passes behind an object, we compare `(linearRayDepth - linearSampleDepth - _ThicknessParams.x) / linearSampleDepth` with `_ThicknessParams.y`. If the former is greater than the latter, the ray passes behind the object.
    float getThicknessDiff(float diff, float linearSampleDepth, float2 thicknessParams)
    {
        return (diff - thicknessParams.x) / linearSampleDepth;
    }
The workflow now becomes:
- If `rayDepth < sampleDepth` and `thicknessDiff < _ThicknessParams.y`, the ray intersects with the scene; exit the loop and return `p`.
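The combined test can be sketched with linear depths as follows. `isHitWithThickness` is a hypothetical name; the formula inside is the same one `getThicknessDiff` computes.

```hlsl
// Hypothetical helper combining the depth test and the thickness test.
// thicknessParams.x is the constant part of the thickness,
// thicknessParams.y the part proportional to the sampled depth.
bool isHitWithThickness(float linearRayDepth, float linearSampleDepth, float2 thicknessParams)
{
    float hitDiff = linearRayDepth - linearSampleDepth;
    // How far behind the surface the ray is, relative to the sampled depth,
    // after removing the constant thickness.
    float thicknessDiff = (hitDiff - thicknessParams.x) / linearSampleDepth;
    // Behind the surface, but no farther behind than the assumed thickness.
    return hitDiff > 0.0f && thicknessDiff < thicknessParams.y;
}
```

For example, with thickness params (0.1, 0.02) and a sampled depth of 10, the thickness is 0.3: a ray depth of 10.2 counts as a hit, while a ray depth of 10.5 is treated as passing behind the object.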
It looks like this (32 steps):
Frustum Clipping
For a point `p1` that falls outside the screen, two issues arise: first, sampling the depth texture beyond the screen range yields incorrect depth values; second, it reduces the effective sample count. To address this, we can restrict `p1` to the screen. Here’s a method for constraining the stepping endpoint to the view frustum. We define `nf` as the near and far clipping plane depths (positive values) and `s` as the slope of the view frustum in the horizontal and vertical directions (positive values). Numerically, `s` is given by `float2(aspect * tan(fovy * 0.5f), tan(fovy * 0.5f))`. Note that for ease of calculation, the z-components of `from` and `to` are positive.
    #define INFINITY 1e10

    float3 frustumClip(float3 from, float3 to, float2 nf, float2 s)
    {
        float3 dir = to - from;
        float3 signDir = sign(dir);

        // Near or far plane, whichever the ray is heading towards.
        float nfSlab = signDir.z * (nf.y - nf.x) * 0.5f + (nf.y + nf.x) * 0.5f;
        float lenZ = (nfSlab - from.z) / dir.z;
        if (dir.z == 0.0f) lenZ = INFINITY;

        // Side planes x = +-s.x * z and y = +-s.y * z.
        float2 ss = sign(dir.xy - s * dir.z) * s;
        float2 denom = ss * dir.z - dir.xy;
        float2 lenXY = (from.xy - ss * from.z) / denom;
        if (lenXY.x < 0.0f || denom.x == 0.0f) lenXY.x = INFINITY;
        if (lenXY.y < 0.0f || denom.y == 0.0f) lenXY.y = INFINITY;

        // Keep the closest crossing, never extending past the original endpoint.
        float len = min(min(1.0f, lenZ), min(lenXY.x, lenXY.y));
        float3 clippedVS = from + dir * len;
        return clippedVS;
    }
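The expressions in the function follow from intersecting the segment with each bounding plane of the frustum. As a sketch, write $\mathbf{a}$ for `from`, $\mathbf{d}$ for `dir`, $z_n$ and $z_f$ for the near and far depths, and $\ell$ for the fraction of the segment that is kept (this notation is introduced here only for illustration):

```latex
\begin{aligned}
\ell_z &= \frac{z_{\mathrm{slab}} - a_z}{d_z}, \qquad
z_{\mathrm{slab}} = \begin{cases} z_f & d_z > 0 \\ z_n & d_z < 0 \end{cases}\\[4pt]
a_x + \ell_x d_x &= \sigma_x \,(a_z + \ell_x d_z), \qquad \sigma_x = \pm s_x\\[4pt]
\ell_x &= \frac{a_x - \sigma_x a_z}{\sigma_x d_z - d_x}\\[4pt]
\ell &= \min(1,\ \ell_z,\ \ell_x,\ \ell_y)
\end{aligned}
```

The y direction is handled the same way as x. A negative $\ell_x$ means the segment moves away from the chosen side plane, which is why the code replaces it with `INFINITY` before taking the minimum.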
I have also written a Shadertoy that demonstrates frustum clipping in 2D, interactive with the mouse:
The workflow now becomes:
- Frustum clip the stepping endpoint to `clippedVS`, and transform it to the clip space position `endCS`.
It looks like this (32 steps):
The scene is starting to exhibit some reflection, although there’s still room for improvement. The view frustum clipping has indeed reduced the number of pixels between steps, filling in some of the gaps. However, the reflected colors appear distorted.
Binary Search
In our previous step, although `p` is guaranteed to lie on the intersected object, it can still be some distance from the actual intersection point. To reduce this error, we can use binary search. Here’s how it works: we maintain two variables during stepping, `w1` and `w2`, representing the `w` values of the last two steps (`w1 > w2`). During each binary search iteration, we calculate `w = 0.5f * (w1 + w2)`. If an intersection is detected at `w`, we update `w1 = w`; otherwise, we update `w2 = w`. Then we proceed to the next iteration.
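The refinement can be sketched as a standalone function. `refineHit` and its parameter list are assumptions made for this example; it uses the plain NDC comparison and assumes a reversed depth buffer.

```hlsl
// Sketch of the refinement step. w1 is the first parameter at which a hit was detected,
// w2 the last parameter without one (w1 > w2). depthTex, pointClampSampler and
// iterations are parameters introduced for this example.
float2 refineHit(Texture2D depthTex, SamplerState pointClampSampler,
                 float3 q0, float3 q1, float2 p0, float2 p1, float k0, float k1,
                 float w1, float w2, int iterations)
{
    // Start from the coarse hit so the result is valid even if no midpoint test succeeds.
    float2 hitUV = lerp(p0, p1, w1);
    for (int i = 0; i < iterations; ++i)
    {
        float w = 0.5f * (w1 + w2);
        float3 q = lerp(q0, q1, w);
        float2 p = lerp(p0, p1, w);
        float k = lerp(k0, k1, w);
        float sampleDepth = depthTex.SampleLevel(pointClampSampler, p, 0.0f).r;
        if (q.z * k < sampleDepth)
        {
            // Midpoint is behind the scene: the crossing lies between w2 and w.
            w1 = w;
            hitUV = p;
        }
        else
        {
            // Midpoint is still in front: the crossing lies between w and w1.
            w2 = w;
        }
    }
    return hitUV;
}
```

Each iteration halves the search interval, so 5 iterations narrow it to 1/32 of a step at the cost of 5 extra depth samples.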
The workflow now becomes:
- Frustum clip the stepping endpoint to `clippedVS`, and transform it to the clip space position `endCS`.
- Define `k0` and `k1` as the reciprocals of the w-components of the clip space coordinates of the starting and ending points.
- Define `q0` and `q1` as the xyz-components of the clip space coordinates of the starting and ending points.
- Define `p0` and `p1` as the xy-components of the normalized device coordinates of the starting and ending points.
- Define `w1` as a variable that increases linearly in (0, 1) based on `_StepCount`; initialize `w1` and `w2` to 0.
- For each step, set `w2 = w1`, update the value of `w1`, and use it to linearly interpolate the three sets of components (`k`, `q`, and `p`).
- Calculate `rayDepth` using `q.z * k`, and sample the depth texture at `p` to obtain `sampleDepth`.
- If `rayDepth < sampleDepth` and `thicknessDiff < _ThicknessParams.y`, the ray intersects with the scene; exit the loop.
- Let `w` be the average of `w1` and `w2`. Repeat the interpolation, depth sampling, and intersection test at `w` until the binary search loop ends, updating either `w1` or `w2` in each iteration depending on the result.
- Sample the color texture at `p` to obtain the reflection color.
It looks like this (32 steps, 5 binary searches):
The reflection effect now appears less distorted (particularly noticeable in the bottom left corner). However, there are still gaps between color segments due to our thickness test, which excludes potential intersections.
Potential Intersections
To compute potential intersections, let’s revisit the step where we do the thickness test. When the ray passes behind an object and it was in front of the scene at the previous step, we calculate the thickness difference (`thicknessDiff`) from `rayDepth` and `sampleDepth`. If this value is smaller than the current minimum (`minThicknessDiff`), we consider it a potential intersection: we update `minThicknessDiff` and record the current `w1` and `w2` for the subsequent binary search.
During binary search, if an actual intersection occurred, we follow the original code. If only a potential intersection occurred, we also track `thicknessDiff` during the binary search: we look for the smallest `thicknessDiff` below `_ThicknessParams.y`, use the corresponding `w` to interpolate between `p0` and `p1` to obtain `p`, and finally use `p` to sample the color texture.
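The bookkeeping during the marching loop can be sketched as below. The struct and function names are hypothetical and do not appear in the final shader, which keeps the same state in plain local variables.

```hlsl
// Hypothetical bookkeeping for potential intersections.
struct PotentialHit
{
    bool found;              // true once any candidate has been recorded
    float minThicknessDiff;  // smallest thicknessDiff seen so far
    float2 w12;              // (w1, w2) of the step that produced it
};

PotentialHit initPotentialHit()
{
    PotentialHit ph;
    ph.found = false;
    ph.minThicknessDiff = 1e10f;
    ph.w12 = float2(0.0f, 0.0f);
    return ph;
}

// Call once per step when the ray is behind the surface but fails the thickness test.
// wasBehindLastStep: whether the previous step was already behind the scene.
void recordPotentialHit(inout PotentialHit ph, float thicknessDiff,
                        float w1, float w2, bool wasBehindLastStep)
{
    // Only a transition from in front of the scene to behind it counts as a candidate.
    if (wasBehindLastStep) return;
    ph.found = true;
    if (thicknessDiff < ph.minThicknessDiff)
    {
        ph.minThicknessDiff = thicknessDiff;
        ph.w12 = float2(w1, w2);
    }
}
```

After the loop, if no actual hit was found but `found` is true, the binary search would run between the recorded `w12.y` and `w12.x`.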
The workflow now becomes:
- Frustum clip the stepping endpoint to `clippedVS`, and transform it to the clip space position `endCS`.
- Define `k0` and `k1` as the reciprocals of the w-components of the clip space coordinates of the starting and ending points.
- Define `q0` and `q1` as the xyz-components of the clip space coordinates of the starting and ending points.
- Define `p0` and `p1` as the xy-components of the normalized device coordinates of the starting and ending points.
- Define `w1` as a variable that increases linearly in (0, 1) based on `_StepCount`; initialize `w1` and `w2` to 0.
- For each step, set `w2 = w1`, update the value of `w1`, and use it to linearly interpolate the three sets of components (`k`, `q`, and `p`).
- Calculate `rayDepth` using `q.z * k`, and sample the depth texture at `p` to obtain `sampleDepth`.
- If `rayDepth < sampleDepth` and `thicknessDiff < _ThicknessParams.y`, the ray intersects with the scene; exit the loop.
- Otherwise, if `rayDepth < sampleDepth`, `thicknessDiff > _ThicknessParams.y`, and the previous step was in front of the scene, compare `thicknessDiff` with the minimum value. If it is smaller, update the minimum value, record the current `w1` and `w2`, mark this as a potential intersection, and continue looping.
- If an actual intersection occurs, let `w` be the average of `w1` and `w2`. Repeat the interpolation, depth sampling, and intersection test at `w` until the binary search loop ends, updating either `w1` or `w2` in each iteration depending on the result.
- If only a potential intersection occurs, run the same binary search between the recorded `w1` and `w2`, and use the `w` with the smallest `thicknessDiff` to update `p`.
- Sample the color texture at `p` to obtain the reflection color.
It looks like this (32 steps, 5 binary searches):
And here is the result using 64 steps and 5 binary searches:
ScreenSpaceReflection.shader
/*
// Copyright (c) 2024 zznewclear@gmail.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
*/
Shader "zznewclear13/SSRShader"
{
    Properties
    {
        [Toggle(USE_POTENTIAL_HIT)] _UsePotentialHit ("Use Potential Hit", Float) = 1.0
        [Toggle(USE_FRUSTUM_CLIP)] _UseFrustumClip ("Use Frustum Clip", Float) = 1.0
        [Toggle(USE_BINARY_SEARCH)] _UseBinarySearch ("Use Binary Search", Float) = 1.0
        [Toggle(USE_THICKNESS)] _UseThickness ("Use Thickness", Float) = 1.0
        _MaxDistance ("Max Distance", Range(0.1, 100.0)) = 15.0
        [int] _StepCount ("Step Count", Float) = 32
        _ThicknessParams ("Thickness Params", Vector) = (0.1, 0.02, 0.0, 0.0)
    }

    HLSLINCLUDE
    #include "Packages/com.unity.render-pipelines.core/ShaderLibrary/Common.hlsl"
    #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Core.hlsl"
    #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Lighting.hlsl"

    #pragma shader_feature _ USE_POTENTIAL_HIT
    #pragma shader_feature _ USE_FRUSTUM_CLIP
    #pragma shader_feature _ USE_BINARY_SEARCH
    #pragma shader_feature _ USE_THICKNESS

    #define INFINITY 1e10
    #define DEPTH_SAMPLER sampler_PointClamp

    Texture2D _CameraOpaqueTexture;
    Texture2D _CameraDepthTexture;

    CBUFFER_START(UnityPerMaterial)
    float _MaxDistance;
    int _StepCount;
    float2 _ThicknessParams;
    CBUFFER_END

    struct Attributes
    {
        float4 positionOS : POSITION;
        float3 normalOS : NORMAL;
        float2 texcoord : TEXCOORD0;
    };

    struct Varyings
    {
        float4 positionCS : SV_POSITION;
        float3 positionWS : TEXCOORD0;
        float3 normalWS : TEXCOORD1;
        float2 uv : TEXCOORD2;
        float3 viewWS : TEXCOORD3;
    };

    Varyings vert(Attributes input)
    {
        Varyings output = (Varyings)0;
        VertexPositionInputs vpi = GetVertexPositionInputs(input.positionOS.xyz);
        VertexNormalInputs vni = GetVertexNormalInputs(input.normalOS);
        output.positionCS = vpi.positionCS;
        output.positionWS = vpi.positionWS;
        output.normalWS = vni.normalWS;
        output.uv = input.texcoord;
        output.viewWS = GetCameraPositionWS() - vpi.positionWS;
        return output;
    }

    // Clips the segment from -> to against the view frustum (z-components are positive).
    float3 frustumClip(float3 from, float3 to, float2 nf, float2 s)
    {
        float3 dir = to - from;
        float3 signDir = sign(dir);

        float nfSlab = signDir.z * (nf.y - nf.x) * 0.5f + (nf.y + nf.x) * 0.5f;
        float lenZ = (nfSlab - from.z) / dir.z;
        if (dir.z == 0.0f) lenZ = INFINITY;

        float2 ss = sign(dir.xy - s * dir.z) * s;
        float2 denom = ss * dir.z - dir.xy;
        float2 lenXY = (from.xy - ss * from.z) / denom;
        if (lenXY.x < 0.0f || denom.x == 0.0f) lenXY.x = INFINITY;
        if (lenXY.y < 0.0f || denom.y == 0.0f) lenXY.y = INFINITY;

        float len = min(min(1.0f, lenZ), min(lenXY.x, lenXY.y));
        float3 clippedVS = from + dir * len;
        return clippedVS;
    }

    float getThicknessDiff(float diff, float linearSampleDepth, float2 thicknessParams)
    {
        return (diff - thicknessParams.x) / linearSampleDepth;
    }
    float4 frag(Varyings input) : SV_TARGET
    {
        float3 positionWS = input.positionWS;
        float3 normalWS = normalize(input.normalWS);
        float3 viewWS = normalize(input.viewWS);
        float3 reflWS = reflect(-viewWS, normalWS);

        // Fall back to the environment reflection when no hit is found.
        float3 env = GlossyEnvironmentReflection(reflWS, 0.0f, 1.0f);
        float3 color = env;

        float3 originWS = positionWS;
        float3 endWS = positionWS + reflWS * _MaxDistance;

        #if defined(USE_FRUSTUM_CLIP)
        float3 originVS = mul(UNITY_MATRIX_V, float4(originWS, 1.0f)).xyz;
        float3 endVS = mul(UNITY_MATRIX_V, float4(endWS, 1.0f)).xyz;
        float3 flipZ = float3(1.0f, 1.0f, -1.0f);
        float3 clippedVS = frustumClip(originVS * flipZ, endVS * flipZ, _ProjectionParams.yz, float2(1.0f, -1.0f) / UNITY_MATRIX_P._m00_m11);
        clippedVS *= flipZ;
        float4 originCS = mul(UNITY_MATRIX_VP, float4(originWS, 1.0f));
        float4 endCS = mul(UNITY_MATRIX_P, float4(clippedVS, 1.0f));
        #else
        float4 originCS = mul(UNITY_MATRIX_VP, float4(originWS, 1.0f));
        float4 endCS = mul(UNITY_MATRIX_VP, float4(endWS, 1.0f));
        #endif

        float k0 = 1.0f / originCS.w;
        float k1 = 1.0f / endCS.w;
        float3 q0 = originCS.xyz;
        float3 q1 = endCS.xyz;
        float2 p0 = originCS.xy * float2(1.0f, -1.0f) * k0 * 0.5f + 0.5f;
        float2 p1 = endCS.xy * float2(1.0f, -1.0f) * k1 * 0.5f + 0.5f;

        #if defined(USE_POTENTIAL_HIT)
        float w1 = 0.0f;
        float w2 = 0.0f;
        bool hit = false;
        bool lastHit = false;
        bool potentialHit = false;
        float2 potentialW12 = float2(0.0f, 0.0f);
        float minPotentialHitPos = INFINITY;
        [unroll(64)]
        for (int i=0; i<_StepCount; ++i)
        {
            w2 = w1;
            w1 += 1.0f / float(_StepCount);
            float3 q = lerp(q0, q1, w1);
            float2 p = lerp(p0, p1, w1);
            float k = lerp(k0, k1, w1);
            float sampleDepth = _CameraDepthTexture.Sample(DEPTH_SAMPLER, p).r;
            float linearSampleDepth = LinearEyeDepth(sampleDepth, _ZBufferParams);
            float linearRayDepth = LinearEyeDepth(q.z * k, _ZBufferParams);
            float hitDiff = linearRayDepth - linearSampleDepth;
            float thicknessDiff = getThicknessDiff(hitDiff, linearSampleDepth, _ThicknessParams);
            if (hitDiff > 0.0f)
            {
                if (thicknessDiff < _ThicknessParams.y)
                {
                    hit = true;
                    break;
                }
                else if(!lastHit)
                {
                    potentialHit = true;
                    if (minPotentialHitPos > thicknessDiff)
                    {
                        minPotentialHitPos = thicknessDiff;
                        potentialW12 = float2(w1, w2);
                    }
                }
            }
            lastHit = hitDiff > 0.0f;
        }
        #else
        float w1 = 0.0f;
        float w2 = 0.0f;
        bool hit = false;
        [unroll(64)]
        for (int i=0; i<_StepCount; ++i)
        {
            w2 = w1;
            w1 += 1.0f / float(_StepCount);
            float3 q = lerp(q0, q1, w1);
            float2 p = lerp(p0, p1, w1);
            float k = lerp(k0, k1, w1);
            float sampleDepth = _CameraDepthTexture.Sample(DEPTH_SAMPLER, p).r;
            #if defined(USE_THICKNESS)
            float linearSampleDepth = LinearEyeDepth(sampleDepth, _ZBufferParams);
            float linearRayDepth = LinearEyeDepth(q.z * k, _ZBufferParams);
            float hitDiff = linearRayDepth - linearSampleDepth;
            float thicknessDiff = getThicknessDiff(hitDiff, linearSampleDepth, _ThicknessParams);
            if (hitDiff > 0.0f && thicknessDiff < _ThicknessParams.y)
            {
                hit = true;
                break;
            }
            #else
            if (q.z * k < sampleDepth)
            {
                hit = true;
                break;
            }
            #endif
        }
        #endif
        #if defined(USE_POTENTIAL_HIT)
        if (hit || potentialHit)
        {
            if (!hit)
            {
                w1 = potentialW12.x;
                w2 = potentialW12.y;
            }

            bool realHit = false;
            // Start from the coarse position so hitPos is always initialized.
            float2 hitPos = lerp(p0, p1, w1);
            float minThicknessDiff = _ThicknessParams.y;
            [unroll(5)]
            for (int i=0; i<5; ++i)
            {
                float w = 0.5f * (w1 + w2);
                float3 q = lerp(q0, q1, w);
                float2 p = lerp(p0, p1, w);
                float k = lerp(k0, k1, w);
                float sampleDepth = _CameraDepthTexture.Sample(DEPTH_SAMPLER, p).r;
                float linearSampleDepth = LinearEyeDepth(sampleDepth, _ZBufferParams);
                float linearRayDepth = LinearEyeDepth(q.z * k, _ZBufferParams);
                float hitDiff = linearRayDepth - linearSampleDepth;
                if (hitDiff > 0.0f)
                {
                    w1 = w;
                    if (hit) hitPos = p;
                }
                else
                {
                    w2 = w;
                }

                float thicknessDiff = getThicknessDiff(hitDiff, linearSampleDepth, _ThicknessParams);
                float absThicknessDiff = abs(thicknessDiff);
                if (!hit && absThicknessDiff < minThicknessDiff)
                {
                    realHit = true;
                    // Track the absolute value, matching the comparison above; storing the
                    // signed value would block later updates once it went negative.
                    minThicknessDiff = absThicknessDiff;
                    hitPos = p;
                }
            }
            if (hit || realHit) color = _CameraOpaqueTexture.Sample(sampler_LinearClamp, hitPos).rgb * 0.3f;
        }
        #elif defined(USE_BINARY_SEARCH)
        if (hit)
        {
            // Start from the coarse hit so hitPos is valid even if no midpoint test succeeds.
            float2 hitPos = lerp(p0, p1, w1);
            [unroll(5)]
            for (int i=0; i<5; ++i)
            {
                float w = 0.5f * (w1 + w2);
                float3 q = lerp(q0, q1, w);
                float2 p = lerp(p0, p1, w);
                float k = lerp(k0, k1, w);
                float sampleDepth = _CameraDepthTexture.Sample(DEPTH_SAMPLER, p).r;
                if (q.z * k < sampleDepth)
                {
                    w1 = w;
                    hitPos = p;
                }
                else
                {
                    w2 = w;
                }
            }
            color = _CameraOpaqueTexture.Sample(sampler_LinearClamp, hitPos).rgb * 0.3f;
        }
        #else
        if (hit)
        {
            float2 hitPos = lerp(p0, p1, w1);
            color = _CameraOpaqueTexture.Sample(sampler_LinearClamp, hitPos).rgb * 0.3f;
        }
        #endif

        return float4(color, 1.0f);
    }
    ENDHLSL

    SubShader
    {
        Tags { "RenderType"="Transparent" "Queue"="Transparent" }
        LOD 100

        Pass
        {
            HLSLPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            ENDHLSL
        }
    }
}
Future Optimization
Currently, there is one aspect worth optimizing: controlling the overall step count based on the pixel distance between `p0` and `p1`. We certainly don’t want to step 64 times across just 10 pixels. However, this is a relatively straightforward task, and I’ll leave it to someone with time to spare. As for random sampling, blurring, and Fresnel effects, let’s consider those when we really need them.
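One possible starting point for that optimization is sketched below. It is not part of the shader above and has not been tuned; `adaptiveStepCount` and its parameters are hypothetical, and it assumes `p0` and `p1` are [0, 1] screen UVs as in the shader.

```hlsl
// Hypothetical helper for the optimization described above.
// p0, p1: ray endpoints as [0, 1] screen UVs; screenSize: render target size in pixels;
// maxSteps: upper bound, for example the material's step count.
int adaptiveStepCount(float2 p0, float2 p1, float2 screenSize, int maxSteps)
{
    float2 deltaPixels = abs(p1 - p0) * screenSize;
    // Pixels crossed along the dominant axis; more than one step per pixel is wasted.
    float pixels = max(deltaPixels.x, deltaPixels.y);
    return clamp((int)ceil(pixels), 1, maxSteps);
}
```

In Unity, `_ScreenParams.xy` holds the render target size in pixels and could be passed as `screenSize`. With this, a ray that covers only 10 pixels on screen would take 10 steps instead of 64.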
Postscript
This article was translated by Microsoft’s Copilot and I made a few adjustments. What an era we live in!