GPU: Add imageLoadFast and imageStoreFast variants #115195

Open
Jason Fielder wants to merge 9 commits from Jason-Fielder/blender:GPU_image_fast_op into main

When changing the target branch, be careful to rebase the branch in your fork to match. See documentation.
Member

Add fast image writing and reading variants for additional use
cases. These variants do not perform range checking on values
and should only be used in cases where the written texel is
guaranteed to be in range. This eliminates additional
branching and simplifies shader logic.

Authored by Apple: Michael Parkin-White

Add fast image writing and reading variants for additional use cases. These variants do not perform range checking on values and should only be used in cases where the written texel is guaranteed to be in range. This eliminates additional branching and simplifies shader logic. Authored by Apple: Michael Parkin-White
Jason Fielder added 1 commit 2023-11-20 17:02:11 +01:00
5e50b3daad GPU: Add imageLoadFast and imageStoreFast variants
Add fast image writing and reading variants for additional use
cases. These variants do not perform range checking on values
and should only be used in cases where the written texel is
guaranteed to be in range. This eliminates additional
branching and simplifies shader logic.

Single-channel write variants have also been added to
match single channel write variants exposed by the Metal
backend, reducing temporary register requirements.

Authored by Apple: Michael Parkin-White
Jason Fielder requested review from Clément Foucault 2023-11-20 17:02:17 +01:00
Jason Fielder requested review from Jeroen Bakker 2023-11-20 17:02:22 +01:00
Clément Foucault requested changes 2023-11-21 18:57:51 +01:00
@ -15,0 +15,4 @@
#define imageLoadFast imageLoad
/* Fast store variant for single-channel writes. Special case which avoids unnecessary vector
* pack/unpback. Passthrough in GLSL. */
#define imageStoreFast_1chFloat(tex, px, val) imageStore(tex, px, vec4(val))

Implement as function overloads. Declare all variants.

Implement as function overloads. Declare all variants.
Jason Fielder added 2 commits 2024-01-12 16:49:35 +01:00
First-time contributor

PR updated. Decided to remove single channel ops for simplicity.
Currently also need to run final compilation tests.

Also worth noting that while I have reasoned about each case where the "Fast" variants can be employed, I would be curious of whether any of the assumed cases could be out of bounds?

One key shader to consider would be the ray-denose, wherein I expect it is reasonable to go out of bounds. However, as three texture samples are performed, it would be beneficial to use fast image routines and perform bounds checking externally, at least in Metal.

The other option would be to have split invocations of the loop, i.e. one which would guarantee all samples are internally within bounds, due to maximal sampling radius. THough this can be looked into in a future PR.

PR updated. Decided to remove single channel ops for simplicity. Currently also need to run final compilation tests. Also worth noting that while I have reasoned about each case where the "Fast" variants can be employed, I would be curious of whether any of the assumed cases could be out of bounds? One key shader to consider would be the ray-denose, wherein I expect it is reasonable to go out of bounds. However, as three texture samples are performed, it would be beneficial to use fast image routines and perform bounds checking externally, at least in Metal. The other option would be to have split invocations of the loop, i.e. one which would guarantee all samples are internally within bounds, due to maximal sampling radius. THough this can be looked into in a future PR.
Clément Foucault requested changes 2024-01-12 22:52:54 +01:00
@ -33,3 +33,3 @@
vec4 out_color = weighted_sum_array(colors, weights);
imageStore(out_color_img, ivec2(gl_GlobalInvocationID.xy), out_color);
imageStoreFast(out_color_img, ivec2(gl_GlobalInvocationID.xy), out_color);

Check this one.

Check this one.
@ -168,2 +168,2 @@
imageStore(out_color_img, out_texel, median.color);
imageStore(out_weight_img, out_texel, vec4(median.weight));
imageStoreFast(out_color_img, out_texel, median.color);
imageStoreFast(out_weight_img, out_texel, vec4(median.weight));

Check this one.

Check this one.
First-time contributor

Think this one is good in most cases as dispatch size and texture size are both set to "half_res".

Think this one is good in most cases as dispatch size and texture size are both set to "half_res".
First-time contributor

Nvm actually, also same scenario where sizes can get rounded up to be larger.

Nvm actually, also same scenario where sizes can get rounded up to be larger.
@ -101,1 +99,3 @@
imageStore(out_occlusion_img, out_texel, out_occlusion.xyxy);
imageStoreFast(out_color_img, out_texel, out_color);
imageStoreFast(out_weight_img, out_texel, vec4(out_weight));
imageStoreFast(out_occlusion_img, out_texel, out_occlusion.xyxy);

Check this one.

Check this one.
@ -70,3 +70,2 @@
ivec2 out_texel = ivec2(gl_GlobalInvocationID.xy);
imageStore(out_color_img, out_texel, out_color);
imageStore(out_weight_img, out_texel, vec4(out_weight));
imageStoreFast(out_color_img, out_texel, out_color);

Check this one.

Check this one.
First-time contributor

Same as above, dispatch size appears to be equal to texture resolution in all cases, so looks like a 1:1 mapping within bounds.

Edit: Actually nvm, I see dispatch_gather_size_ gets rounded up to a multiple of DOF_GATHER_GROUP_SIZE, so could be larger.

Same as above, dispatch size appears to be equal to texture resolution in all cases, so looks like a 1:1 mapping within bounds. Edit: Actually nvm, I see `dispatch_gather_size_` gets rounded up to a multiple of DOF_GATHER_GROUP_SIZE, so could be larger.
@ -246,2 +246,2 @@
imageStore(out_color_lod3_img, texel, color_cache[LOCAL_INDEX]);
imageStore(out_coc_lod3_img, texel, vec4(coc_cache[LOCAL_INDEX]));
imageStoreFast(out_color_lod3_img, texel, color_cache[LOCAL_INDEX]);
imageStoreFast(out_coc_lod3_img, texel, vec4(coc_cache[LOCAL_INDEX]));

Check this one.

Check this one.
First-time contributor

int2 reduce_size = math::ceil_to_multiple(half_res, int2(DOF_REDUCE_GROUP_SIZE));

reduced_color_tx_ >= half_size

dispatch_reduce_size_ = int3(math::divide_ceil(half_res, int2(DOF_REDUCE_GROUP_SIZE)), 1);

So think this looks good, should also be a 1:1 mapping for each mip cascade.

`int2 reduce_size = math::ceil_to_multiple(half_res, int2(DOF_REDUCE_GROUP_SIZE));` reduced_color_tx_ >= half_size ` dispatch_reduce_size_ = int3(math::divide_ceil(half_res, int2(DOF_REDUCE_GROUP_SIZE)), 1);` So think this looks good, should also be a 1:1 mapping for each mip cascade.
fclem marked this conversation as resolved
@ -44,3 +44,3 @@
ivec2 out_texel = ivec2(gl_GlobalInvocationID.xy);
vec4 out_color = weighted_sum_array(colors, weights);
imageStore(out_color_img, out_texel, out_color);
imageStoreFast(out_color_img, out_texel, out_color);

Check this one.

Check this one.
@ -368,3 +368,3 @@
/* Save history for next iteration. Still in YCoCg space with CoC in alpha. */
imageStore(out_history_img, src_texel, result.color);
imageStoreFast(out_history_img, src_texel, result.color);

Check this one.

Check this one.
@ -72,3 +72,3 @@
cryptomatte_sort_samples(samples);
/* Repeat texture coordinates as the weight can be optimized to a small portion of the film. */
float weight = imageLoad(
float weight = imageLoadFast(

Check this one.

Check this one.
@ -672,3 +672,3 @@
uniform_buf.film.display_id == uniform_buf.film.normal_id)
{
out_color = imageLoad(color_accum_img, ivec3(texel_film, uniform_buf.film.display_id));
out_color = imageLoadFast(color_accum_img, ivec3(texel_film, uniform_buf.film.display_id));

Check this one.

Check this one.
@ -91,2 +91,2 @@
float variance = imageLoad(in_variance_img, texel_fullres).r;
vec3 in_radiance = imageLoad(in_radiance_img, texel_fullres).rgb;
float variance = imageLoadFast(in_variance_img, texel_fullres).r;
vec3 in_radiance = imageLoadFast(in_radiance_img, texel_fullres).rgb;

These are fine since out of screen pixels will have no valid closures.

These are fine since out of screen pixels will have no valid closures.
@ -123,3 +123,3 @@
ivec3 sample_tile = ivec3(sample_texel / RAYTRACE_GROUP_SIZE, closure_index);
/* Make sure the sample has been processed and do not contain garbage data. */
if (imageLoad(tile_mask_img, sample_tile).r == 0u) {
if (imageLoadFast(tile_mask_img, sample_tile).r == 0u) {

This one is not safe.

This one is not safe.
@ -136,3 +136,3 @@
}
vec3 radiance = imageLoad(in_radiance_img, sample_texel).rgb;
vec3 radiance = imageLoadFast(in_radiance_img, sample_texel).rgb;

This one is safe is safe if we consider the previous check to work.

But looking at it, I think it should be sample_depth == 0.0 || sample_depth == 1.0 to discard out of view and background pixels.

This one is safe is safe if we consider the previous check to work. But looking at it, I think it should be `sample_depth == 0.0 || sample_depth == 1.0` to discard out of view and background pixels.
@ -164,3 +164,3 @@
out_radiance = from_accumulation_space(out_radiance);
imageStore(out_radiance_img, texel_fullres, vec4(out_radiance, 0.0));
imageStoreFast(out_radiance_img, texel_fullres, vec4(out_radiance, 0.0));

Safe

Safe
@ -76,3 +76,3 @@
#endif
if (do_skip_denoise) {
imageStore(out_radiance_img, texel_fullres, imageLoad(ray_radiance_img, texel));
imageStoreFast(out_radiance_img, texel_fullres, imageLoadFast(ray_radiance_img, texel));

This isn't safe. Given this is a fast path, I think it doesn't really matter to use fast variant here.

This isn't safe. Given this is a fast path, I think it doesn't really matter to use fast variant here.
@ -99,3 +99,3 @@
ivec3 sample_tile = ivec3(tile_coord_neighbor, closure_index);
uint tile_mask = imageLoad(tile_mask_img, sample_tile).r;
uint tile_mask = imageLoadFast(tile_mask_img, sample_tile).r;

Safe given the above check.

Safe given the above check.
@ -193,2 +192,2 @@
imageStore(out_variance_img, texel_fullres, vec4(hit_variance));
imageStore(out_hit_depth_img, texel_fullres, vec4(hit_depth));
imageStoreFast(out_radiance_img, texel_fullres, vec4(radiance_accum, 0.0));
imageStoreFast(out_variance_img, texel_fullres, vec4(hit_variance));

Safe.

Safe.
@ -177,3 +177,2 @@
float in_variance = imageLoad(in_variance_img, texel_fullres).r;
vec3 in_radiance = imageLoad(in_radiance_img, texel_fullres).rgb;
float in_variance = imageLoadFast(in_variance_img, texel_fullres).r;

All these are unsafe. Add check on texel_fullres` at the top of the function.

All these are unsafe. Add check on texel_fullres` at the top of the function.
fclem marked this conversation as resolved
@ -27,3 +27,3 @@
bool valid_pixel = closure_index < gbuf.closure_count;
if (!valid_pixel) {
imageStore(out_ray_data_img, texel, vec4(0.0));
imageStoreFast(out_ray_data_img, texel, vec4(0.0));

Unsafe.

Unsafe.
@ -43,3 +43,3 @@
* Strangely it does not correspond to the IEEE spec. */
float inv_pdf = (samp.pdf == 0.0) ? 0.0 : max(6e-8, 1.0 / samp.pdf);
imageStore(out_ray_data_img, texel, vec4(samp.direction, inv_pdf));
imageStoreFast(out_ray_data_img, texel, vec4(samp.direction, inv_pdf));

Safe.

Safe.
@ -26,3 +26,2 @@
vec4 ray_data = imageLoad(ray_data_img, texel);
float ray_pdf_inv = ray_data.w;
vec4 ray_data_im = imageLoadFast(ray_data_img, texel);

Unsafe. Check texel_fullres and texel, and early exit.

Unsafe. Check `texel_fullres` and `texel`, and early exit.
@ -22,13 +22,13 @@ void main()
uvec2 tile_coord = unpackUvec2x16(tiles_coord_buf[gl_WorkGroupID.x]);
ivec2 texel = ivec2(gl_LocalInvocationID.xy + tile_coord * tile_size);
vec4 ray_data = imageLoad(ray_data_img, texel);

Unsafe. Add check on texel and early exit.

Unsafe. Add check on `texel` and early exit.
@ -24,3 +24,2 @@
vec4 ray_data = imageLoad(ray_data_img, texel);
float ray_pdf_inv = ray_data.w;
vec4 ray_data_im = imageLoadFast(ray_data_img, texel);

Why the rename?

Why the rename?
First-time contributor

Meant to reply to this before, ray_data is a reserved keyword in Metal for MetalRT. This triggers an error if compiling with Metal 3.0 which occurs if the language features are available.

Meant to reply to this before, `ray_data` is a reserved keyword in Metal for MetalRT. This triggers an error if compiling with Metal 3.0 which occurs if the language features are available.
@ -19,8 +19,8 @@ void main()
uvec2 tile_coord = unpackUvec2x16(tiles_coord_buf[gl_WorkGroupID.x]);
ivec2 texel = ivec2(gl_LocalInvocationID.xy + tile_coord * tile_size);
vec4 ray_data = imageLoad(ray_data_img, texel);

Unsafe. Add check on texel and early exit.

Unsafe. Add check on texel and early exit.
@ -21,3 +21,2 @@
vec4 ray_data = imageLoad(ray_data_img, texel);
float ray_pdf_inv = ray_data.w;
vec4 ray_data_im = imageLoadFast(ray_data_img, texel);

Same rename.

Same rename.
@ -8,3 +8,3 @@
if (id >= 0) {
ivec2 texel = ivec2(gl_FragCoord.xy);
imageStore(rp_color_img, ivec3(texel, id), color);
imageStoreFast(rp_color_img, ivec3(texel, id), color);

Should be fine if this is only used for fragment shader.

Should be fine if this is only used for fragment shader.
@ -135,2 +135,4 @@
view_infos_buf[view_index].winmat = winmat;
view_infos_buf[view_index].wininv = inverse(winmat);
/* NOTE: We may not end up using this due to potential imprecision. */
view_infos_buf[view_index].persmat_sh = winmat * tilemap_data.viewmat;

What is that?

What is that?
@ -182,3 +184,3 @@
/* Store the highest LOD valid page for rendering. */
uint tile_packed = (valid_tile_index != -1) ? tiles_buf[valid_tile_index] : SHADOW_NO_DATA;
imageStore(tilemaps_img, atlas_texel, uvec4(tile_packed));
imageStoreFast(tilemaps_img, atlas_texel, uvec4(tile_packed));

Safe.

Safe.
@ -34,3 +34,2 @@
imageStore(radiance_img, texel, vec4(radiance, 0.0));
imageStore(object_id_img, texel, uvec4(gbuf.object_id));
imageStoreFast(radiance_img, texel, vec4(radiance, 0.0));

This branch path is Safe.

This branch path is Safe.
@ -50,3 +50,3 @@
else {
/* No need to write radiance_img since the radiance won't be used at all. */
imageStore(object_id_img, texel, uvec4(0));
imageStoreFast(object_id_img, texel, uvec4(0));

This is unsafe.

This is unsafe.
Clément Foucault added the
Interest
EEVEE
Interest
Metal
labels 2024-01-12 22:55:47 +01:00
Clément Foucault added this to the 4.1 milestone 2024-01-12 22:55:55 +01:00
Jason Fielder added 4 commits 2024-03-09 19:19:29 +01:00
Jason Fielder added 1 commit 2024-03-10 14:28:15 +01:00
Jason Fielder added 1 commit 2024-03-11 17:57:10 +01:00
First-time contributor

I believe this one should now have all feedback address. Apologies on the delay for those last few bits!

I believe this one should now have all feedback address. Apologies on the delay for those last few bits!
Jeroen Bakker removed this from the 4.1 milestone 2024-03-13 16:35:58 +01:00
This pull request has changes conflicting with the target branch.
  • source/blender/draw/engines/eevee_next/shaders/eevee_ambient_occlusion_pass_comp.glsl
  • source/blender/draw/engines/eevee_next/shaders/eevee_cryptomatte_lib.glsl
  • source/blender/draw/engines/eevee_next/shaders/eevee_depth_of_field_reduce_comp.glsl
  • source/blender/draw/engines/eevee_next/shaders/eevee_film_frag.glsl
  • source/blender/draw/engines/eevee_next/shaders/eevee_film_lib.glsl
  • source/blender/draw/engines/eevee_next/shaders/eevee_ray_tile_classify_comp.glsl
  • source/blender/draw/engines/eevee_next/shaders/eevee_ray_trace_planar_comp.glsl
  • source/blender/draw/engines/eevee_next/shaders/eevee_shadow_tilemap_finalize_comp.glsl
  • source/blender/draw/engines/eevee_next/shaders/eevee_volume_material_comp.glsl

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u GPU_image_fast_op:Jason-Fielder-GPU_image_fast_op
git checkout Jason-Fielder-GPU_image_fast_op
Sign in to join this conversation.
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No project
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#115195
No description provided.