GPU: Add imageLoadFast and imageStoreFast variants #115195

Jason Fielder · 2023-11-20T17:01:59+01:00

Jason Fielder commented

2023-11-20 17:01:59 +01:00

Add fast image writing and reading variants for additional use
cases. These variants do not perform range checking on values
and should only be used in cases where the written texel is
guaranteed to be in range. This eliminates additional
branching and simplifies shader logic.

Authored by Apple: Michael Parkin-White
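
To make the distinction concrete, here is a minimal CPU-side C++ sketch of the idea; it is not the backend implementation. `Image2D`, `store_checked` and `store_fast` are hypothetical names, and the assumption is that the regular variant guards each access with a bounds test while the fast variant indexes directly and relies on the caller's guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side model of a 2D image; not a Blender or Metal type.
struct Image2D {
  int width;
  int height;
  std::vector<float> texels;

  Image2D(int w, int h) : width(w), height(h), texels(static_cast<std::size_t>(w) * h, 0.0f) {}

  bool in_bounds(int x, int y) const
  {
    return x >= 0 && y >= 0 && x < width && y < height;
  }
};

// Checked store: out-of-range writes are dropped, at the cost of a branch.
inline void store_checked(Image2D &img, int x, int y, float value)
{
  if (img.in_bounds(x, y)) {
    img.texels[static_cast<std::size_t>(y) * img.width + x] = value;
  }
}

// "Fast" store: no branch. The caller must guarantee the texel is in range.
inline void store_fast(Image2D &img, int x, int y, float value)
{
  // Debug-only guard standing in for the caller's guarantee.
  assert(img.in_bounds(x, y));
  img.texels[static_cast<std::size_t>(y) * img.width + x] = value;
}
```

In this model, calling `store_checked` with an out-of-range coordinate leaves the image untouched, whereas `store_fast` is only valid when the coordinate is known to be inside the image.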

Jason Fielder added 1 commit 2023-11-20 17:02:11 +01:00

5e50b3daad GPU: Add imageLoadFast and imageStoreFast variants

Add fast image writing and reading variants for additional use
cases. These variants do not perform range checking on values
and should only be used in cases where the written texel is
guaranteed to be in range. This eliminates additional
branching and simplifies shader logic.

Single-channel write variants have also been added to
match single channel write variants exposed by the Metal
backend, reducing temporary register requirements.

Authored by Apple: Michael Parkin-White

Jason Fielder requested review from Clément Foucault 2023-11-20 17:02:17 +01:00

Jason Fielder requested review from Jeroen Bakker 2023-11-20 17:02:22 +01:00

Clément Foucault requested changes 2023-11-21 18:57:51 +01:00

source/blender/gpu/shaders/opengl/glsl_shader_defines.glsl Outdated

						
				@ -15,0 +15,4 @@

				#define imageLoadFast imageLoad

				/* Fast store variant for single-channel writes. Special case which avoids unnecessary vector

				 * pack/unpack. Passthrough in GLSL. */

				#define imageStoreFast_1chFloat(tex, px, val) imageStore(tex, px, vec4(val))

Clément Foucault commented

2023-11-21 18:16:28 +01:00

Implement as function overloads. Declare all variants.

Jason Fielder added 2 commits 2024-01-12 16:49:35 +01:00

04391df10d Merge branch 'main' into GPU_image_fast_op

3ef2fee2e9 Remove single channel functions. Not significantly beneficial but increases code complexity.

Michael Parkin-White commented

2024-01-12 16:54:45 +01:00

First-time contributor

PR updated. Decided to remove single channel ops for simplicity.
Currently also need to run final compilation tests.

Also worth noting that while I have reasoned about each case where the "Fast" variants can be employed, I would be curious whether any of the assumed cases could be out of bounds.

One key shader to consider would be the ray-denoise, wherein I expect it is reasonable to go out of bounds. However, as three texture samples are performed, it would be beneficial to use fast image routines and perform bounds checking externally, at least in Metal.

The other option would be to have split invocations of the loop, i.e. one which would guarantee all samples are internally within bounds, due to maximal sampling radius. Though this can be looked into in a future PR.
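
The split-invocation idea could be sketched roughly as follows, in C++ for illustration. `all_samples_in_bounds` is a hypothetical helper, and the assumption is that every sample of the loop lies within `max_radius` texels of the center pixel along each axis.

```cpp
#include <cassert>

// Hypothetical helper: true when every sample within `max_radius` texels of (x, y)
// lies inside a width x height image, so unchecked loads are safe for the whole loop.
inline bool all_samples_in_bounds(int x, int y, int width, int height, int max_radius)
{
  assert(max_radius >= 0);
  return x - max_radius >= 0 && y - max_radius >= 0 &&
         x + max_radius < width && y + max_radius < height;
}
```

Pixels for which this returns true could take a loop built on the fast routines, while the border pixels keep the checked variants.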

Clément Foucault requested changes 2024-01-12 22:52:54 +01:00

source/blender/draw/engines/eevee_next/shaders/eevee_depth_of_field_downsample_comp.glsl Outdated

						
				@ -33,3 +33,3 @@

				  vec4 out_color = weighted_sum_array(colors, weights);

				  imageStore(out_color_img, ivec2(gl_GlobalInvocationID.xy), out_color);

				  imageStoreFast(out_color_img, ivec2(gl_GlobalInvocationID.xy), out_color);

Clément Foucault commented

2024-01-12 22:16:48 +01:00

Check this one.

source/blender/draw/engines/eevee_next/shaders/eevee_depth_of_field_filter_comp.glsl Outdated

						
				@ -168,2 +168,2 @@

				  imageStore(out_color_img, out_texel, median.color);

				  imageStore(out_weight_img, out_texel, vec4(median.weight));

				  imageStoreFast(out_color_img, out_texel, median.color);

				  imageStoreFast(out_weight_img, out_texel, vec4(median.weight));

Clément Foucault commented

2024-01-12 22:17:12 +01:00

Check this one.

Michael Parkin-White commented

2024-03-09 18:48:50 +01:00

First-time contributor

Think this one is good in most cases as dispatch size and texture size are both set to "half_res".

Michael Parkin-White commented

2024-03-09 18:57:09 +01:00

First-time contributor

Nvm actually, also same scenario where sizes can get rounded up to be larger.

source/blender/draw/engines/eevee_next/shaders/eevee_depth_of_field_gather_comp.glsl Outdated

						
				@ -101,1 +99,3 @@

				  imageStore(out_occlusion_img, out_texel, out_occlusion.xyxy);

				  imageStoreFast(out_color_img, out_texel, out_color);

				  imageStoreFast(out_weight_img, out_texel, vec4(out_weight));

				  imageStoreFast(out_occlusion_img, out_texel, out_occlusion.xyxy);

Clément Foucault commented

2024-01-12 22:17:36 +01:00

Check this one.

source/blender/draw/engines/eevee_next/shaders/eevee_depth_of_field_hole_fill_comp.glsl Outdated

						
				@ -70,3 +70,2 @@

				  ivec2 out_texel = ivec2(gl_GlobalInvocationID.xy);

				  imageStore(out_color_img, out_texel, out_color);

				  imageStore(out_weight_img, out_texel, vec4(out_weight));

				  imageStoreFast(out_color_img, out_texel, out_color);

Clément Foucault commented

2024-01-12 22:17:49 +01:00

Check this one.

Michael Parkin-White commented

2024-03-09 18:50:01 +01:00

First-time contributor

Same as above, dispatch size appears to be equal to texture resolution in all cases, so looks like a 1:1 mapping within bounds.

Edit: Actually nvm, I see dispatch_gather_size_ gets rounded up to a multiple of DOF_GATHER_GROUP_SIZE, so could be larger.
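
A small C++ sketch of why the rounding matters. The helper names are hypothetical and the group size used in the note below is only an example value, not the actual DOF_GATHER_GROUP_SIZE.

```cpp
#include <cassert>

// Round `value` up to the next multiple of `multiple`.
inline int round_up_to_multiple(int value, int multiple)
{
  assert(value >= 0 && multiple > 0);
  return ((value + multiple - 1) / multiple) * multiple;
}

// Invocations along one axis whose coordinate falls outside [0, extent)
// when the dispatch is rounded up to whole work groups.
inline int out_of_bounds_invocations(int extent, int group_size)
{
  return round_up_to_multiple(extent, group_size) - extent;
}
```

For example, an extent of 100 texels with a group size of 8 launches 104 invocations along that axis, so 4 of them land outside the texture and an unchecked store there would be invalid.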

source/blender/draw/engines/eevee_next/shaders/eevee_depth_of_field_reduce_comp.glsl

						
				@ -246,2 +246,2 @@

				        imageStore(out_color_lod3_img, texel, color_cache[LOCAL_INDEX]);

				        imageStore(out_coc_lod3_img, texel, vec4(coc_cache[LOCAL_INDEX]));

				        imageStoreFast(out_color_lod3_img, texel, color_cache[LOCAL_INDEX]);

				        imageStoreFast(out_coc_lod3_img, texel, vec4(coc_cache[LOCAL_INDEX]));

Clément Foucault commented

2024-01-12 22:18:28 +01:00

Check this one.

Michael Parkin-White commented

2024-03-09 18:53:26 +01:00

First-time contributor

int2 reduce_size = math::ceil_to_multiple(half_res, int2(DOF_REDUCE_GROUP_SIZE));

reduced_color_tx_ >= half_size

dispatch_reduce_size_ = int3(math::divide_ceil(half_res, int2(DOF_REDUCE_GROUP_SIZE)), 1);

So think this looks good, should also be a 1:1 mapping for each mip cascade.
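
The reasoning can be checked with a short C++ sketch along one axis. `divide_ceil` and `ceil_to_multiple` here are simple stand-ins modeled on the calls quoted above, not Blender's implementations, and the sketch only covers the base level, not the per-mip dispatch.

```cpp
#include <cassert>

// Stand-in for the quoted math::divide_ceil: integer division rounding up.
inline int divide_ceil(int a, int b)
{
  assert(a >= 0 && b > 0);
  return (a + b - 1) / b;
}

// Stand-in for the quoted math::ceil_to_multiple.
inline int ceil_to_multiple(int a, int b)
{
  return divide_ceil(a, b) * b;
}

// True when the dispatched grid along one axis exactly covers a texture
// whose extent was padded to a multiple of the group size.
inline bool dispatch_matches_padded_texture(int half_res, int group_size)
{
  const int texture_extent = ceil_to_multiple(half_res, group_size);
  const int dispatched_invocations = divide_ceil(half_res, group_size) * group_size;
  return dispatched_invocations == texture_extent;
}
```

Because the texture is padded with the same rounding as the dispatch, every invocation maps to a valid texel, which is what makes the unchecked store acceptable in this case.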

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/shaders/eevee_depth_of_field_setup_comp.glsl

						
				@ -44,3 +44,3 @@

				  ivec2 out_texel = ivec2(gl_GlobalInvocationID.xy);

				  vec4 out_color = weighted_sum_array(colors, weights);

				  imageStore(out_color_img, out_texel, out_color);

				  imageStoreFast(out_color_img, out_texel, out_color);

Clément Foucault commented

2024-01-12 22:18:42 +01:00

Check this one.

source/blender/draw/engines/eevee_next/shaders/eevee_depth_of_field_stabilize_comp.glsl Outdated

						
				@ -368,3 +368,3 @@

				  /* Save history for next iteration. Still in YCoCg space with CoC in alpha. */

				  imageStore(out_history_img, src_texel, result.color);

				  imageStoreFast(out_history_img, src_texel, result.color);

Clément Foucault commented

2024-01-12 22:19:11 +01:00

Check this one.

source/blender/draw/engines/eevee_next/shaders/eevee_film_cryptomatte_post_comp.glsl

						
				@ -72,3 +72,3 @@

				    cryptomatte_sort_samples(samples);

				    /* Repeat texture coordinates as the weight can be optimized to a small portion of the film. */

				    float weight = imageLoad(

				    float weight = imageLoadFast(

Clément Foucault commented

2024-01-12 22:20:05 +01:00

Check this one.

source/blender/draw/engines/eevee_next/shaders/eevee_film_lib.glsl

						
				@ -672,3 +672,3 @@

				          uniform_buf.film.display_id == uniform_buf.film.normal_id)

				      {

				        out_color = imageLoad(color_accum_img, ivec3(texel_film, uniform_buf.film.display_id));

				        out_color = imageLoadFast(color_accum_img, ivec3(texel_film, uniform_buf.film.display_id));

Clément Foucault commented

2024-01-12 22:23:11 +01:00

Check this one.

source/blender/draw/engines/eevee_next/shaders/eevee_ray_denoise_bilateral_comp.glsl Outdated

						
				@ -91,2 +91,2 @@

				  float variance = imageLoad(in_variance_img, texel_fullres).r;

				  vec3 in_radiance = imageLoad(in_radiance_img, texel_fullres).rgb;

				  float variance = imageLoadFast(in_variance_img, texel_fullres).r;

				  vec3 in_radiance = imageLoadFast(in_radiance_img, texel_fullres).rgb;

Clément Foucault commented

2024-01-12 22:30:42 +01:00

These are fine since out of screen pixels will have no valid closures.

source/blender/draw/engines/eevee_next/shaders/eevee_ray_denoise_bilateral_comp.glsl Outdated

						
				@ -123,3 +123,3 @@

				    ivec3 sample_tile = ivec3(sample_texel / RAYTRACE_GROUP_SIZE, closure_index);

				    /* Make sure the sample has been processed and do not contain garbage data. */

				    if (imageLoad(tile_mask_img, sample_tile).r == 0u) {

				    if (imageLoadFast(tile_mask_img, sample_tile).r == 0u) {

Clément Foucault commented

2024-01-12 22:31:20 +01:00

This one is not safe.

source/blender/draw/engines/eevee_next/shaders/eevee_ray_denoise_bilateral_comp.glsl Outdated

						
				@ -136,3 +136,3 @@

				    }

				    vec3 radiance = imageLoad(in_radiance_img, sample_texel).rgb;

				    vec3 radiance = imageLoadFast(in_radiance_img, sample_texel).rgb;

Clément Foucault commented

2024-01-12 22:33:38 +01:00

This one is safe if we consider the previous check to work.

But looking at it, I think it should be sample_depth == 0.0 || sample_depth == 1.0 to discard out of view and background pixels.

source/blender/draw/engines/eevee_next/shaders/eevee_ray_denoise_bilateral_comp.glsl Outdated

						
				@ -164,3 +164,3 @@

				  out_radiance = from_accumulation_space(out_radiance);

				  imageStore(out_radiance_img, texel_fullres, vec4(out_radiance, 0.0));

				  imageStoreFast(out_radiance_img, texel_fullres, vec4(out_radiance, 0.0));

Clément Foucault commented

2024-01-12 22:34:01 +01:00

Safe

source/blender/draw/engines/eevee_next/shaders/eevee_ray_denoise_spatial_comp.glsl Outdated

						
				@ -76,3 +76,3 @@

				#endif

				  if (do_skip_denoise) {

				    imageStore(out_radiance_img, texel_fullres, imageLoad(ray_radiance_img, texel));

				    imageStoreFast(out_radiance_img, texel_fullres, imageLoadFast(ray_radiance_img, texel));

Clément Foucault commented

2024-01-12 22:35:27 +01:00

This isn't safe. Given this is a fast path, I think it doesn't really matter to use fast variant here.

source/blender/draw/engines/eevee_next/shaders/eevee_ray_denoise_spatial_comp.glsl Outdated

						
				@ -99,3 +99,3 @@

				      ivec3 sample_tile = ivec3(tile_coord_neighbor, closure_index);

				      uint tile_mask = imageLoad(tile_mask_img, sample_tile).r;

				      uint tile_mask = imageLoadFast(tile_mask_img, sample_tile).r;

Clément Foucault commented

2024-01-12 22:36:46 +01:00

Safe given the above check.

source/blender/draw/engines/eevee_next/shaders/eevee_ray_denoise_spatial_comp.glsl Outdated

						
				@ -193,2 +192,2 @@

				  imageStore(out_variance_img, texel_fullres, vec4(hit_variance));

				  imageStore(out_hit_depth_img, texel_fullres, vec4(hit_depth));

				  imageStoreFast(out_radiance_img, texel_fullres, vec4(radiance_accum, 0.0));

				  imageStoreFast(out_variance_img, texel_fullres, vec4(hit_variance));

Clément Foucault commented

2024-01-12 22:37:12 +01:00

Safe.

source/blender/draw/engines/eevee_next/shaders/eevee_ray_denoise_temporal_comp.glsl

						
				@ -177,3 +177,2 @@

				  float in_variance = imageLoad(in_variance_img, texel_fullres).r;

				  vec3 in_radiance = imageLoad(in_radiance_img, texel_fullres).rgb;

				  float in_variance = imageLoadFast(in_variance_img, texel_fullres).r;

Clément Foucault commented

2024-01-12 22:38:11 +01:00

All these are unsafe. Add a check on texel_fullres at the top of the function.

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/shaders/eevee_ray_generate_comp.glsl Outdated

						
				@ -27,3 +27,3 @@

				  bool valid_pixel = closure_index < gbuf.closure_count;

				  if (!valid_pixel) {

				    imageStore(out_ray_data_img, texel, vec4(0.0));

				    imageStoreFast(out_ray_data_img, texel, vec4(0.0));

Clément Foucault commented

2024-01-12 22:39:31 +01:00

Unsafe.

source/blender/draw/engines/eevee_next/shaders/eevee_ray_generate_comp.glsl Outdated

						
				@ -43,3 +43,3 @@

				   * Strangely it does not correspond to the IEEE spec. */

				  float inv_pdf = (samp.pdf == 0.0) ? 0.0 : max(6e-8, 1.0 / samp.pdf);

				  imageStore(out_ray_data_img, texel, vec4(samp.direction, inv_pdf));

				  imageStoreFast(out_ray_data_img, texel, vec4(samp.direction, inv_pdf));

Clément Foucault commented

2024-01-12 22:39:38 +01:00

Safe.

source/blender/draw/engines/eevee_next/shaders/eevee_ray_trace_fallback_comp.glsl Outdated

						
				@ -26,3 +26,2 @@

				  vec4 ray_data = imageLoad(ray_data_img, texel);

				  float ray_pdf_inv = ray_data.w;

				  vec4 ray_data_im = imageLoadFast(ray_data_img, texel);

Clément Foucault commented

2024-01-12 22:42:22 +01:00

Unsafe. Check texel_fullres and texel, and early exit.
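
As an illustration of the requested pattern, here is a hypothetical one-dimensional CPU model in C++. `run_with_early_exit` is not shader code from this PR; it assumes the dispatch may launch more invocations than there are texels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU model of a compute dispatch that writes one value per invocation.
// `extent` is the image size along one axis; `invocations` may exceed it after rounding up.
inline std::vector<int> run_with_early_exit(int extent, int invocations)
{
  assert(extent >= 0 && invocations >= 0);
  std::vector<int> image(static_cast<std::size_t>(extent), 0);
  for (int texel = 0; texel < invocations; ++texel) {
    // Early exit: the equivalent of returning from main() when the texel is out of range.
    if (texel >= extent) {
      continue;
    }
    // Past the guard, an unchecked ("fast") access is safe.
    image[static_cast<std::size_t>(texel)] = texel + 1;
  }
  return image;
}
```

A single guard at the top replaces the per-access range checks, so every later load and store in the invocation can use the fast variants.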

source/blender/draw/engines/eevee_next/shaders/eevee_ray_trace_planar_comp.glsl Outdated

						
				@ -22,13 +22,13 @@ void main()

				  uvec2 tile_coord = unpackUvec2x16(tiles_coord_buf[gl_WorkGroupID.x]);

				  ivec2 texel = ivec2(gl_LocalInvocationID.xy + tile_coord * tile_size);

				  vec4 ray_data = imageLoad(ray_data_img, texel);

Clément Foucault commented

2024-01-12 22:44:43 +01:00

Unsafe. Add check on texel and early exit.

source/blender/draw/engines/eevee_next/shaders/eevee_ray_trace_planar_comp.glsl Outdated

						
				@ -24,3 +24,2 @@

				  vec4 ray_data = imageLoad(ray_data_img, texel);

				  float ray_pdf_inv = ray_data.w;

				  vec4 ray_data_im = imageLoadFast(ray_data_img, texel);

Clément Foucault commented

2024-01-12 22:43:38 +01:00

Why the rename?

Michael Parkin-White commented

2024-03-09 19:10:25 +01:00

First-time contributor

Meant to reply to this before, ray_data is a reserved keyword in Metal for MetalRT. This triggers an error if compiling with Metal 3.0 which occurs if the language features are available.

source/blender/draw/engines/eevee_next/shaders/eevee_ray_trace_screen_comp.glsl Outdated

						
				@ -19,8 +19,8 @@ void main()

				  uvec2 tile_coord = unpackUvec2x16(tiles_coord_buf[gl_WorkGroupID.x]);

				  ivec2 texel = ivec2(gl_LocalInvocationID.xy + tile_coord * tile_size);

				  vec4 ray_data = imageLoad(ray_data_img, texel);

Clément Foucault commented

2024-01-12 22:45:21 +01:00

Unsafe. Add check on texel and early exit.

source/blender/draw/engines/eevee_next/shaders/eevee_ray_trace_screen_comp.glsl Outdated

						
				@ -21,3 +21,2 @@

				  vec4 ray_data = imageLoad(ray_data_img, texel);

				  float ray_pdf_inv = ray_data.w;

				  vec4 ray_data_im = imageLoadFast(ray_data_img, texel);

Clément Foucault commented

2024-01-12 22:45:17 +01:00

Same rename.

source/blender/draw/engines/eevee_next/shaders/eevee_renderpass_lib.glsl

						
				@ -8,3 +8,3 @@

				  if (id >= 0) {

				    ivec2 texel = ivec2(gl_FragCoord.xy);

				    imageStore(rp_color_img, ivec3(texel, id), color);

				    imageStoreFast(rp_color_img, ivec3(texel, id), color);

Clément Foucault commented

2024-01-12 22:46:07 +01:00

Should be fine if this is only used for fragment shader.

source/blender/draw/engines/eevee_next/shaders/eevee_shadow_tilemap_finalize_comp.glsl Outdated

						
				@ -135,2 +135,4 @@

				          view_infos_buf[view_index].winmat = winmat;

				          view_infos_buf[view_index].wininv = inverse(winmat);

				          /* NOTE: We may not end up using this due to potential imprecision. */

				          view_infos_buf[view_index].persmat_sh = winmat * tilemap_data.viewmat;

Clément Foucault commented

2024-01-12 22:47:35 +01:00

What is that?

source/blender/draw/engines/eevee_next/shaders/eevee_shadow_tilemap_finalize_comp.glsl Outdated

						
				@ -182,3 +184,3 @@

				  /* Store the highest LOD valid page for rendering. */

				  uint tile_packed = (valid_tile_index != -1) ? tiles_buf[valid_tile_index] : SHADOW_NO_DATA;

				  imageStore(tilemaps_img, atlas_texel, uvec4(tile_packed));

				  imageStoreFast(tilemaps_img, atlas_texel, uvec4(tile_packed));

Clément Foucault commented

2024-01-12 22:48:01 +01:00

Safe.

source/blender/draw/engines/eevee_next/shaders/eevee_subsurface_setup_comp.glsl

						
				@ -34,3 +34,2 @@

				    imageStore(radiance_img, texel, vec4(radiance, 0.0));

				    imageStore(object_id_img, texel, uvec4(gbuf.object_id));

				    imageStoreFast(radiance_img, texel, vec4(radiance, 0.0));

Clément Foucault commented

2024-01-12 22:49:15 +01:00

This branch path is Safe.

source/blender/draw/engines/eevee_next/shaders/eevee_subsurface_setup_comp.glsl Outdated

						
				@ -50,3 +50,3 @@

				  else {

				    /* No need to write radiance_img since the radiance won't be used at all. */

				    imageStore(object_id_img, texel, uvec4(0));

				    imageStoreFast(object_id_img, texel, uvec4(0));

Clément Foucault commented

2024-01-12 22:48:51 +01:00

This is unsafe.

Clément Foucault added the
