Metal: EEVEE Next: Optimize Virtual shadow maps for Apple Silicon #111283

Jason Fielder · 2023-08-18T22:29:42+02:00

Jason Fielder commented

2023-08-18 22:29:42 +02:00

WIP PR for optimization of EEVEE Next's Virtual Shadow Maps for
TBDRs. The core of these optimizations lie in eliminating use of
atomic shadow atlas writes and instead utilise tile memory to
perform depth accumulation as a secondary pass once all
geometry updates for a given shadow view have been updated.

This also allows use of fast on-tile depth testing/sorting, reducing
overdraw and redundant fragment operations, while also allowing
for tile indirection calculations to be offloaded into the vertex
shader to increase fragment storage efficiency and throughput.

Authored by Apple: Michael Parkin-White

Further work required:

Optimization: Consider replacing U32 shadow atlas with F32.
Add pass before shadow geometry raster to ensure only tiles with active geometry can raster.
- Initially clear all other tiles to zero instead.
- Use same quad generation shader to clear active update regions to 1.0 allowing depth contents to be rendered.
Determine if there is a way to reduce layer usage in framebuffer or accumulation quad geometry dispatch in order to reduce intermediate memory pressure of large virtual tile bin. (To be resolved in follow-up patch).
Benchmark other phases in shadow gen. Optimize workgroup sizes for each sub-phase where appropriate.
Remove redundant bindings for surface shadow shader (e.g. texture atlas, tile info bufs etc) as subtle static resource cost.
Remove SHADOW_PAGE_CLEAR pass in favour of storing cleared values within accumulation pass.

Other potential work:

Replace accumulation phase quad generation with GPU-driven solution (likely unnecessary as the vertex phase of the accumulation pass is not very expensive).

WIP PR for optimization of EEVEE Next's Virtual Shadow Maps for TBDRs. The core of these optimizations lie in eliminating use of atomic shadow atlas writes and instead utilise tile memory to perform depth accumulation as a secondary pass once all geometry updates for a given shadow view have been updated. This also allows use of fast on-tile depth testing/sorting, reducing overdraw and redundant fragment operations, while also allowing for tile indirection calculations to be offloaded into the vertex shader to increase fragment storage efficiency and throughput. Authored by Apple: Michael Parkin-White ---------------------- Further work required: - [x] Optimization: Consider replacing U32 shadow atlas with F32. - [x] Add pass before shadow geometry raster to ensure only tiles with active geometry can raster. - Initially clear all other tiles to zero instead. - Use same quad generation shader to clear active update regions to 1.0 allowing depth contents to be rendered. - [x] Determine if there is a way to reduce layer usage in framebuffer or accumulation quad geometry dispatch in order to reduce intermediate memory pressure of large virtual tile bin. (To be resolved in follow-up patch). - [x] Benchmark other phases in shadow gen. Optimize workgroup sizes for each sub-phase where appropriate. - [x] Remove redundant bindings for surface shadow shader (e.g. texture atlas, tile info bufs etc) as subtle static resource cost. - [x] Remove SHADOW_PAGE_CLEAR pass in favour of storing cleared values within accumulation pass. Other potential work: - [x] Replace accumulation phase quad generation with GPU-driven solution (likely unnecessary as the vertex phase of the accumulation pass is not very expensive). -----------------------

Jason Fielder added 1 commit 2023-08-18 22:29:55 +02:00

f4827c8ff2 Metal: EEVEE Next: Optimize Virtual shadow maps for Apple Silicon

Opening WIP PR for optimization and implementation of EEVEE Next's
Virtual Shadow Maps for TBDRs. The core of these optimizations lie in
eliminating use of atomic shadow atlas writes and instead utilise tile
memory to perform depth accumulation as a secondary pass once
all geometry updates for a given shadow view have been updated.

This also allows use of fast on-tile depth testing/sorting, reducing
overdraw and redundant fragment operations, while also allowing
for tile indirection calculations to be offloaded into the vertex
shader to increase fragment storage efficiency and throughput.

Authored by Apple: Michael Parkin-White

Jason Fielder requested review from Clément Foucault 2023-08-18 22:30:46 +02:00

Clément Foucault requested changes 2023-08-19 16:13:26 +02:00

Clément Foucault left a comment

I would say it is better to split this patch into :

Raster order group.
Memoryless texture.
Load Store changes.
Out of bounds fixes for the shadow pipeline.
The actual shadow implementation.

This way it is easier to discuss about each separate topic. Some are easier to accept than others.

But I left already a few comment about code organization.

Seems there is some formatting differences on mtl_shader_defines.msl and on some other files.

You can format using make format in the blender source folder (the git repo root folder) so that it uses the clang-format from the lib folder.

I would say it is better to split this patch into : - Raster order group. - Memoryless texture. - Load Store changes. - Out of bounds fixes for the shadow pipeline. - The actual shadow implementation. This way it is easier to discuss about each separate topic. Some are easier to accept than others. But I left already a few comment about code organization. Seems there is some formatting differences on `mtl_shader_defines.msl` and on some other files. You can format using `make format` in the `blender` source folder (the git repo root folder) so that it uses the clang-format from the lib folder.

source/blender/draw/CMakeLists.txt Outdated

						
				@ -573,6 +573,8 @@ set(GLSL_SRC

				  engines/eevee_next/shaders/eevee_surf_forward_frag.glsl

				  engines/eevee_next/shaders/eevee_surf_lib.glsl

				  engines/eevee_next/shaders/eevee_surf_shadow_frag.glsl

				  engines/eevee_next/shaders/eevee_surf_shadow_accum_vert.glsl

Clément Foucault commented

2023-08-19 15:34:19 +02:00

This should be eevee_shadow_page_write_frag/vert.glsl.
The eevee_surf_* are fragment shaders that can match the eevee_geom_* vertex shaders.

This should be `eevee_shadow_page_write_frag/vert.glsl`. The `eevee_surf_*` are fragment shaders that can match the `eevee_geom_*` vertex shaders.

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/eevee_pipeline.cc Outdated

						
				@ -160,0 +165,4 @@

				#ifdef WITH_METAL_BACKEND

				  /* Perform accumulation step for storing final tile depth results back to shadow atlas.  */

				  accum_ps_.init();

Clément Foucault commented

2023-08-19 15:35:39 +02:00

This is not an accumulation pass. It just copies the result to the atlas. It should be renamed page_write_ps and moved to the shadow module instead of being in the pipeline.

Also no need for the Camera UBO, the utility texture etc... verify all resources are needed.

This is not an accumulation pass. It just copies the result to the atlas. It should be renamed `page_write_ps` and moved to the shadow module instead of being in the pipeline. Also no need for the Camera UBO, the utility texture etc... verify all resources are needed.

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/eevee_pipeline.hh Outdated

						
				@ -102,6 +102,9 @@ class ShadowPipeline {

				  Instance &inst_;

				  PassMain surface_ps_ = {"Shadow.Surface"};

				#ifdef WITH_METAL_BACKEND

Clément Foucault commented

2023-08-19 15:41:44 +02:00

I wouldn't guard any of this behind the WITH_METAL_BACKEND as it would be easy to break.
I would even go to the extend to support an emulated path for other platform for checking this feature works.
Maybe the only thing that need ifdef is the shader for now.

Making it clear it is a separate technique in a subsection of the ShadowModule class would be good.

I wouldn't guard any of this behind the `WITH_METAL_BACKEND` as it would be easy to break. I would even go to the extend to support an emulated path for other platform for checking this feature works. Maybe the only thing that need `ifdef` is the shader for now. Making it clear it is a separate technique in a subsection of the ShadowModule class would be good.

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/eevee_shadow.cc Outdated

						
				@ -1175,2 +1201,4 @@

				      shadow_multi_view_.compute_procedural_bounds();

				#ifdef WITH_METAL_BACKEND

				      /* Specify load-store config right before render to ensure flags are retained. */

Clément Foucault commented

2023-08-19 15:22:45 +02:00

From an API perspective, wouldn't it be better to have this stored as a state of the framebuffer?

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/eevee_shadow.hh Outdated

						
				@ -259,2 +259,3 @@

				  /** Framebuffer with the atlas_tx attached. */

				  Framebuffer render_fb_;

				#ifdef WITH_METAL_BACKEND

				  Framebuffer render_fb_ = {"shadow_write_framebuffer"};

Clément Foucault commented

2023-08-19 15:26:22 +02:00

This shouldn't be in the #ifdef block.

This shouldn't be in the `#ifdef` block.

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/shaders/eevee_shadow_tilemap_init_comp.glsl Outdated

						
				@ -12,3 +12,3 @@

				#pragma BLENDER_REQUIRE(eevee_shadow_tilemap_lib.glsl)

				shared bool directional_range_changed;

				shared uint directional_range_changed;

Clément Foucault commented

2023-08-19 15:32:30 +02:00

I did this change a few days ago. I did not remember this was forbidden. I now have ammended our GLSL style guide to include this.

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/shaders/eevee_shadow_tilemap_init_comp.glsl Outdated

						
				@ -69,3 +69,3 @@

				  ivec2 tile_shifted = tile_co + tilemap.grid_shift;

				  /* Ensure value is shifted into positive range to avoid modulo on negative. */

				  ivec2 tile_wrapped = ivec2((ivec2(SHADOW_TILEMAP_RES) + tile_shifted) % SHADOW_TILEMAP_RES);

				  ivec2 tile_wrapped = ivec2((ivec2(SHADOW_TILEMAP_RES * SHADOW_TILEMAP_PER_ROW) + tile_shifted) %

Clément Foucault commented

2023-08-19 14:00:39 +02:00

This is unlikely to be correct and misleading. It might be prefereable to clamp tilemap.grid_shift to [-SHADOW_TILEMAP_RES, SHADOW_TILEMAP_RES] range to compute tile_shifted. Higher values don't matter.

This is unlikely to be correct and misleading. It might be prefereable to clamp `tilemap.grid_shift` to `[-SHADOW_TILEMAP_RES, SHADOW_TILEMAP_RES]` range to compute `tile_shifted`. Higher values don't matter.

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/shaders/eevee_surf_shadow_accum_vert.glsl Outdated

						
				@ -0,0 +74,4 @@

				  /** Interpolate output texel  */

				  /* Using bitwise ops is way faster than integer ops. */

				  const int page_shift = SHADOW_PAGE_LOD;

Clément Foucault commented

2023-08-19 16:08:33 +02:00

Unused

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/shaders/eevee_surf_shadow_accum_vert.glsl Outdated

						
				@ -0,0 +94,4 @@

				  /* NOTE: To avoid redundant writes, we still enable the depth test and configure the accumulation

				   * pass quad depth such that only the fragments updated during the surface depth pass will run.

				   *

				   * We use `DRW_STATE_DEPTH_GREATER_EQUAL` such that we only run the fragment shader if the

Clément Foucault commented

2023-08-19 16:08:23 +02:00

I guess you can do that because you kept the clear dispatch. I'm pretty sure it's more efficient to just copy the whole tile and not clear the atlas.

Michael Parkin-White commented

2023-08-24 13:56:40 +02:00

First-time contributor

Yeah, following the later update, clearing the full tile and writing this out in full and removing the compute-based clear pass is indeed quite a bit faster.

In the updated version, I've added a comment distinguishing the two options, as it may be important to have contextual information if ever anything needs changing, i.e. knowing that if the compute clear pass ever needs to be used for whatever reason, then the depth flag can be changed back to skip writes that are not needed.

Yeah, following the later update, clearing the full tile and writing this out in full and removing the compute-based clear pass is indeed quite a bit faster. In the updated version, I've added a comment distinguishing the two options, as it may be important to have contextual information if ever anything needs changing, i.e. knowing that if the compute clear pass ever needs to be used for whatever reason, then the depth flag can be changed back to skip writes that are not needed.

fclem marked this conversation as resolved

Jason Fielder added 2 commits 2023-08-21 20:42:58 +02:00

ffea964761 Ensure createinfo's using gpu_Layer are correctly tagged

8bdee8bdab Additional VSM optimizations for Apple Silicon:

- Prepass to clear depth to zero for pages which will not be updated. Further reducing fragment cost.

- Removal of SHADOW_PAGE_CLEAR compute pass for Metal, as this can be handled more efficiently within the geometry update pass.

- Addition of support for F32 shadow atlas for Metal instead of U32, subtly improving shadow update perf.

These changes are not yet productised, still some further work to do on updating descriptions and cleaning up approaches for some of these passes.

Jason Fielder added 1 commit 2023-08-21 23:05:18 +02:00

02ad47d133 Add static compilation

Jason Fielder added 1 commit 2023-08-21 23:06:01 +02:00

32fbb050d6 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

Jason Fielder added 1 commit 2023-08-22 15:30:56 +02:00

a8be1aaf58 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

Jason Fielder added 1 commit 2023-08-24 13:22:28 +02:00

6b6586a1f0 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

Jason Fielder added 2 commits 2023-08-24 14:04:10 +02:00

8c8ca84396 Remove accidentally added layer usage bits.

d98dbddd46 Formatting pass.

Jason Fielder added 2 commits 2023-08-29 13:07:31 +02:00

a398fe8477 Update shader names to be more representative.

6598277ca0 Update decsriptions in shader

Jason Fielder added 1 commit 2023-08-29 14:46:10 +02:00

3a09e1a4d9 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

Jason Fielder added 2 commits 2023-08-30 14:16:05 +02:00

38241a3ae4 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

748d55707a First-pass for refactor generalizing shadow update technique and removing Metal specific guards.

Still need to update guards inside shaders e.g. shadow surface shader and consider permutations
based on technique used instead. This is to ensure toggling the technique works as expected, rather than hitting different branches depending on system.

Jason Fielder added 1 commit 2023-08-30 19:43:18 +02:00

7ef8a4d604 Switching shader material shader and other features depending on chosen shadow update technique.

Jason Fielder added 1 commit 2023-08-30 23:22:54 +02:00

636af6fe1d further refinement of patch with decoupling of technique and atlas format usage.

Although it is not trivial to replace the atlas read routine texture binding depending on format, as the inclusion of the binding is nested fairly deeply.

Jason Fielder added 1 commit 2023-08-31 00:25:52 +02:00

773bdd9227 Further refactoring to separate out write modes for F32 vs U32 atlas for erroneous cases. Also addressing other small bits of feedback, including grid shift OOB/negative modulo issue.

Also add empty pasthrough for imageAtomicMin which performs a store-op to allow successful compilation of the shader

Jason Fielder added 1 commit 2023-08-31 00:33:34 +02:00

dbe3d2b353 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

Jason Fielder added 1 commit 2023-08-31 11:46:33 +02:00

5f3c692b08 Enable the atomic pass to output results on macOS. This display is incorrect due to passthrough texture atomics, however, some form of shadow is visible and testable, and this can be used for the purpose of benchmarking

Jason Fielder added 1 commit 2023-08-31 11:48:41 +02:00

06a5111e58 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

Jason Fielder added 1 commit 2023-08-31 16:23:37 +02:00

9589be295a Move atlas format to a macro. This is the most reasonable way of allowing toggling while ensuring the eevee_shadow_data shader create info is valid for all uses. Also added a safety check to limit usage of the floating point atlas to Metal.

Jason Fielder added 1 commit 2023-08-31 16:24:16 +02:00

6a1446fc59 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

Jason Fielder added 1 commit 2023-08-31 16:32:33 +02:00

8ca4be5fea Lightprobe only needs a single use of BuiltinBits::LAYER

Jason Fielder added 1 commit 2023-08-31 16:32:55 +02:00

e97f9a9072 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

Clément Foucault added 2 commits 2023-09-04 20:10:54 +02:00

cf5e4ea9a8 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

# Conflicts:
#	source/blender/draw/engines/eevee_next/shaders/eevee_shadow_tag_update_comp.glsl
#	source/blender/draw/engines/eevee_next/shaders/eevee_shadow_tilemap_bounds_comp.glsl
#	source/blender/draw/engines/eevee_next/shaders/eevee_surf_lib.glsl
#	source/blender/draw/engines/eevee_next/shaders/infos/eevee_material_info.hh
#	source/blender/gpu/GPU_texture.h
#	source/blender/gpu/intern/gpu_framebuffer.cc
#	source/blender/gpu/intern/gpu_shader_create_info.cc
#	source/blender/gpu/intern/gpu_shader_create_info.hh
#	source/blender/gpu/metal/mtl_framebuffer.mm
#	source/blender/gpu/shaders/metal/mtl_shader_defines.msl
#	source/blender/python/gpu/gpu_py_shader_create_info.cc

1ff7d33fd2 Make format

Clément Foucault added 1 commit 2023-09-05 10:53:05 +02:00

2b3b86e7d4 Cleanup warning and some code duplication

Jason Fielder added 2 commits 2023-09-18 15:25:08 +02:00

c224dcb8ee Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

c9d1642325 Resolve merge issues.

Jason Fielder added 2 commits 2023-09-19 12:38:42 +02:00

637b97440f GPU: Add platform parameter for GPU architecture

Authored by Apple: Michael Parkin-White

e748f41afb Modify shadow technique selection based on GPU architecture

Michael Parkin-White referenced this pull request

2023-09-20 15:38:18 +02:00

GPU: Add platform parameter for GPU architecture #112566

Jason Fielder added 1 commit 2023-09-20 19:31:42 +02:00

5191342ae5 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

Clément Foucault requested changes 2023-09-21 00:46:01 +02:00

source/blender/draw/engines/eevee_next/eevee_pipeline.cc Outdated

						
				@ -159,0 +214,4 @@

				    inst_.sampling.bind_resources(&tbdr_page_store_ps_);

				    inst_.bind_uniform_data(&tbdr_page_store_ps_);

				    tbdr_page_store_ps_.draw_procedural(

				        GPU_PRIM_TRIS, (SHADOW_TILEMAP_RES) * (SHADOW_TILEMAP_RES)*SHADOW_VIEW_MAX, 6);

Clément Foucault commented

2023-09-21 00:44:46 +02:00

SQUARE(SHADOW_TILEMAP_RES). That should avoid confusion of clang format.

`SQUARE(SHADOW_TILEMAP_RES)`. That should avoid confusion of clang format.

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/eevee_shadow.hh Outdated

						
				@ -53,2 +55,4 @@

				                                                               {0, 0, 0.5, 1}};

				/* Technique used for updating the virtual shadow map contents. */

				enum eShadowUpdateTechnique {

Clément Foucault commented

2023-09-21 00:13:28 +02:00

Remove prefix e : ShadowUpdateTechnique

Remove prefix `e` : `ShadowUpdateTechnique`

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/shaders/eevee_shadow_lib.glsl Outdated

						
				@ -148,2 +158,4 @@

				}

				#ifdef GPU_METAL

				/* NOTE: Metal can support multiple shadow atlas formats. E.g. U32, F32. */

Clément Foucault commented

2023-09-21 00:16:00 +02:00

Create a define for the type ShadowAtlasSamplerType which changes type. This avoid duplicating all the functions declarations.

Create a define for the type `ShadowAtlasSamplerType` which changes type. This avoid duplicating all the functions declarations.

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/shaders/eevee_shadow_page_metal_tbdr_frag.glsl Outdated

				`@ -0,0 +1,64 @@`

Clément Foucault commented

2023-09-21 00:19:17 +02:00

Rename file to eevee_shadow_page_tile_frag/vert. This path will eventually work on all hardware even if not efficient.

Rename file to `eevee_shadow_page_tile_frag/vert`. This path will eventually work on all hardware even if not efficient.

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/shaders/eevee_shadow_page_metal_tbdr_frag.glsl Outdated

						
				@ -0,0 +3,4 @@

				 * Virtual Shadow map accumulation storage.

				 *

				 * On Apple silicon, we can use a three-pass method to perform virtual shadow map updates,

				 * leveraging efficient use of tile-based GPUs.Shadow updates rasterize geometry for each view in

Clément Foucault commented

2023-09-21 00:19:42 +02:00

Typo GPUs.Shadow

Typo `GPUs.Shadow`

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/shaders/eevee_shadow_page_metal_tbdr_frag.glsl Outdated

						
				@ -0,0 +57,4 @@

				  shadow_atlas_img.texture->write(u_depth, ushort2(out_texel_xy), out_page_z);

				#  endif

				#  ifdef SHADOW_ATLAS_F32

Clément Foucault commented

2023-09-21 00:23:24 +02:00

About this F32 option, I'm wondering if it wouldn't be good to have a U32 view and a F32 view of the atlas and always use the float one for sampling the shadow map. For rendering we could use the best format depending on the update technique.

Would that have any impact on performance because of texture layout?

About this F32 option, I'm wondering if it wouldn't be good to have a U32 view and a F32 view of the atlas and always use the float one for sampling the shadow map. For rendering we could use the best format depending on the update technique. Would that have any impact on performance because of texture layout?

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/shaders/eevee_shadow_tilemap_bounds_comp.glsl Outdated

						
				@ -21,6 +21,12 @@ shared int global_max;

				void main()

				{

				  uint index = gl_GlobalInvocationID.x;

Clément Foucault commented

2023-09-21 00:04:23 +02:00

Same block of code is inside the resource_len > 0 block.

Same block of code is inside the `resource_len > 0` block.

fclem marked this conversation as resolved

source/blender/draw/engines/eevee_next/shaders/eevee_shadow_tilemap_finalize_comp.glsl Outdated

						
				@ -99,3 +99,3 @@

				    /* Issue one view if there is an update in the LOD. */

				    if (all(equal(gl_LocalInvocationID, uvec3(0)))) {

				      bool lod_has_update = rect_min.x < rect_max.x;

				      bool lod_has_update = rect_min_x < rect_max_x;

Clément Foucault commented

2023-09-21 00:02:14 +02:00

Why is this needed? rect_min and rect_max are defined above.

Why is this needed? `rect_min` and `rect_max` are defined above.

fclem marked this conversation as resolved

Jason Fielder added 1 commit 2023-09-25 18:29:11 +02:00

2a0ad15dfe Add texture atomic support.

Jason Fielder added 1 commit 2023-09-25 18:29:49 +02:00

0b29f5e9c3 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

Jason Fielder added 1 commit 2023-09-25 18:31:52 +02:00

d7cc19a639 Undo erroneous platform change.

Jason Fielder added 1 commit 2023-09-25 18:35:54 +02:00

87bff15f5e Add texture usage flags for texture atomics.

Jason Fielder added 1 commit 2023-09-25 18:44:40 +02:00

f896113d67 Guard atomic availability behind OS API support.

Jason Fielder added 1 commit 2023-09-26 00:03:47 +02:00

a625fb3538 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

Jason Fielder added 4 commits 2023-09-26 20:58:53 +02:00

f260db24f1 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

36ef965154 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

6530d9ea61 Address additional PR feedback and remove F32 atlas format switch for now

b494127cf4 Remove template definitions for shadow atlas format as no longer required.

Jason Fielder added 1 commit 2023-09-26 21:14:17 +02:00

2f25bfde46 Remove shader debug name change as not relevant for this patch.

Jason Fielder added 1 commit 2023-09-26 21:25:45 +02:00

4ab1809d56 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

Jason Fielder added 2 commits 2023-09-28 12:11:14 +02:00

37b9fe85a3 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

90f5e7a5e4 GPUArchitectureType change

Michael Parkin-White requested changes 2023-09-28 12:11:32 +02:00

source/blender/draw/intern/draw_command.cc Outdated

						
				@ -665,6 +665,64 @@ void DrawMultiBuf::bind(RecordingState &state,

				  GPU_debug_group_end();

				}

				void DrawMultiBuf::bind_no_comp(RecordingState &state,

Michael Parkin-White commented

2023-09-26 21:12:11 +02:00

First-time contributor

@fclem I had added this function (and its use by the shadows) as a placeholder, although wanted to get your thoughts on how best to handle this scenario.

// Original
DrawMultiBuf::bind

// Version added in this PR to just setup resource bindings without invoking any compute commands
DrawMultiBuf::bind_no_comp

The intent of bind_no_comp is to setup the resource bindings for the subsequent draws, without re-invoking the visibility or draw_command_generate_* compute phase, such that we can execute the three separate passes of the TBDR optimization separately.

Same goes for the other additions in this sub-folder:

Manager::submit_prepare_visibility(PassMain &pass, View &view, command::RecordingState *state)
Manager::submit_pass_only(PassMain &pass, View &view, command::RecordingState *state)

vs

Manager::submit(PassMain &pass, View &view)

and the actual implementation in the shadow engine:

ShadowPipeline::render_prepare_visibility
ShadowPipeline::render_tile_clear
ShadowPipeline::render_main_pass
ShadowPipeline::render_tile_store

vs

ShadowPipeline::render

It feels like there is probably a cleaner way of doing this without introducing any risks from code duplication. Although I had not invested time into this before wanting to get an opinion. One option could be to add a function parameter or template DrawMultiBuf::bind with a boolean switch on generate draw commands.

Then for other functions, could be reasonable to refactor this to eliminate duplication. The one issue I was not super happy with was that the split functions required passing of the command::RecordingState outside of the function, as the latter submission required visibility on this.

Thanks for any notes!

@fclem I had added this function (and its use by the shadows) as a placeholder, although wanted to get your thoughts on how best to handle this scenario. ``` // Original DrawMultiBuf::bind // Version added in this PR to just setup resource bindings without invoking any compute commands DrawMultiBuf::bind_no_comp ``` The intent of `bind_no_comp` is to setup the resource bindings for the subsequent draws, without re-invoking the visibility or draw_command_generate_* compute phase, such that we can execute the three separate passes of the TBDR optimization separately. Same goes for the other additions in this sub-folder: ``` Manager::submit_prepare_visibility(PassMain &pass, View &view, command::RecordingState *state) Manager::submit_pass_only(PassMain &pass, View &view, command::RecordingState *state) vs Manager::submit(PassMain &pass, View &view) ``` and the actual implementation in the shadow engine: ``` ShadowPipeline::render_prepare_visibility ShadowPipeline::render_tile_clear ShadowPipeline::render_main_pass ShadowPipeline::render_tile_store vs ShadowPipeline::render ``` It feels like there is probably a cleaner way of doing this without introducing any risks from code duplication. Although I had not invested time into this before wanting to get an opinion. One option could be to add a function parameter or template `DrawMultiBuf::bind` with a boolean switch on generate draw commands. Then for other functions, could be reasonable to refactor this to eliminate duplication. The one issue I was not super happy with was that the split functions required passing of the `command::RecordingState` outside of the function, as the latter submission required visibility on this. Thanks for any notes!

Clément Foucault commented

2023-09-28 12:44:44 +02:00

For some reason I thought I had written about this in my previous review but my comment got lost somehow.

So I think this is a wrong approach. All of these 3 passes should be inside the same PassMain and you should just use a different draw::PassMain::Sub for each of the clear/render/store passes. That way there shouldn't be a need for any of this duplicated logic.

For some reason I thought I had written about this in my previous review but my comment got lost somehow. So I think this is a wrong approach. All of these 3 passes should be inside the same `PassMain` and you should just use a different `draw::PassMain::Sub` for each of the clear/render/store passes. That way there shouldn't be a need for any of this duplicated logic.

fclem marked this conversation as resolved

Jason Fielder changed title from ~~WIP: Metal: EEVEE Next: Optimize Virtual shadow maps for Apple Silicon~~ to Metal: EEVEE Next: Optimize Virtual shadow maps for Apple Silicon

2023-09-28 12:13:05 +02:00

Clément Foucault requested changes 2023-09-28 12:56:08 +02:00

Clément Foucault left a comment

We are getting closer! 😄

I just merged the GPU_ARCHITECTURE changes to main. Please merge these changes first before we can merge this patch.

We are getting closer! 😄 I just merged the `GPU_ARCHITECTURE` changes to main. Please merge these changes first before we can merge this patch.

source/blender/draw/engines/eevee_next/shaders/eevee_shadow_page_clear_comp.glsl Outdated

						
				@ -18,2 +18,3 @@

				  /* Clear to FLT_MAX instead of 1 so the far plane doesn't cast shadows onto farther objects. */

				/* Clear to FLT_MAX instead of 1 so the far plane doesn't cast shadows onto farther objects. */

				#ifdef GPU_METAL

Clément Foucault commented

2023-09-28 12:48:38 +02:00

No need for the define since it is aliased for GLSL.

fclem marked this conversation as resolved

source/blender/gpu/intern/gpu_shader_create_info.hh Outdated

						
				@ -479,2 +479,4 @@

				  Vector<FragOut> fragment_outputs_;

				  using FragTileIn = FragOut;

				  Vector<FragTileIn> fragment_tile_inputs_;

Clément Foucault commented

2023-09-28 12:55:28 +02:00

these have been replaced by SubpassIn which is the same but with the vulkan denomination. They should just be implemented as ROG on apple silicon. So remove fragment_tile_inputs_.

these have been replaced by `SubpassIn` which is the same but with the vulkan denomination. They should just be implemented as ROG on apple silicon. So remove `fragment_tile_inputs_`.

fclem marked this conversation as resolved

source/blender/gpu/metal/mtl_framebuffer.mm Outdated

						
				@ -130,6 +130,8 @@ void MTLFrameBuffer::bind(bool enabled_srgb)

				    Shader::set_framebuffer_srgb_target(enabled_srgb && srgb_);

				  }

				  this->mark_dirty();

Clément Foucault commented

2023-09-28 12:52:48 +02:00

I guess this was only for testing. Shouldn't it be removed?

fclem marked this conversation as resolved

source/blender/python/gpu/gpu_py_shader_create_info.cc Outdated

						
				@ -22,3 +22,3 @@

				#include "gpu_py_shader.h" /* own include */

				//#define USE_PYGPU_SHADER_INFO_IMAGE_METHOD

				// #define USE_PYGPU_SHADER_INFO_IMAGE_METHOD

Clément Foucault commented

2023-09-28 12:50:33 +02:00

Unrelated change

fclem marked this conversation as resolved

Jason Fielder added 7 commits 2023-09-29 20:59:43 +02:00

d9b70b8ea9 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

f03154c74b Remove old FragTileIn definition

2270efd904 Remove framebuffer dirty flag for temporarily splitting passes.

Note this should be resolved at the high-level.

4494dd0f85 Remove unrelated change

30c724d615 Remove ifdef in shadow_page_clear_comp

e3ce8115ed Remove fragment_tile_in as this was replaced with subpass_in.

ed180f0e2a Refactor TBDR clear and storage passes into subpasses of main shadow surface pass.

For now, shadow tile passes are invoked via an indirect call with the upper bound for tile updates, which are culled by the vertex stage.

Jason Fielder added 1 commit 2023-09-29 21:03:12 +02:00

4f73279050 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

Jason Fielder added 1 commit 2023-09-29 21:37:45 +02:00

6c0348349a Further cleanup to remove redundant changes

Jason Fielder added 1 commit 2023-09-29 21:51:22 +02:00

f6230baca6 Remove formatting change for gpu_shader_Create_info

Clément Foucault added 3 commits 2023-10-01 02:10:20 +02:00

9df7c9dc87 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

13b02a0b77 Make comments and codepath platform agnostic.

6a8db18cd1 Cleanup and use indirect draw buffer with indirection buf

Clément Foucault added 1 commit 2023-10-01 13:37:18 +02:00

964da14627 Cleanup and use depth buffer for atomic raster if present

Clément Foucault added 4 commits 2023-10-01 18:12:53 +02:00

5657d94988 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

8786f83adb More cleanup and fixes

3a928c7254 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

b407681ce0 Last fixes

Clément Foucault approved these changes 2023-10-01 18:51:17 +02:00

Clément Foucault left a comment

I went the extra mile to cleanup and implement all the TODOs.

Now both implementation are supported on both OpenGL and Metal. To avoid too much confusion, I tried to make the names more platform agnostic.

The last quirk that I'm not sure how deal with is the static ShadowModule::shadow_technique which prevents runtime switching between techniques (which could be beneficial in some cases).

Also reducing the number of layer in the framebuffer doesn't seem to work correctly. Some shadow tiles will fail to update and never update. This is something happening in both method and on all platform and not related to this patch.

I'm really happy with the result. Both in terms of performance and in code quality.

I would just like for Michael to glance over my changes and to see if they all make sense.

Otherwise I think it's good to merge.

I went the extra mile to cleanup and implement all the TODOs. Now both implementation are supported on both OpenGL and Metal. To avoid too much confusion, I tried to make the names more platform agnostic. The last quirk that I'm not sure how deal with is the static `ShadowModule::shadow_technique` which prevents runtime switching between techniques (which could be beneficial in some cases). Also reducing the number of layer in the framebuffer doesn't seem to work correctly. Some shadow tiles will fail to update and never update. This is something happening in both method and on all platform and not related to this patch. I'm really happy with the result. Both in terms of performance and in code quality. I would just like for Michael to glance over my changes and to see if they all make sense. Otherwise I think it's good to merge.

Clément Foucault added 11 commits 2023-10-05 16:35:45 +02:00

05492f4d66 Add more debugging defines

33a8248f84 Add subpass transition at the start of the shadow pass

and update documentation of subpasses

6c989a8e07 Fix issue with subpass attachment not clearing

23cfcacf5f Fix shadow_page_pack functions and add asserts

dac3075c74 Change tile indirect call to not be instance based for better debug ability

f994cb5b4a Change excessive check

569371ce04 Final cleanup

4fb0e20739 Revert to using another packing function for `src_coord_buf`

63921e0607 Fix gl subpass again

dedee835d5 Fix shadow test

498a1eed5e Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr

# Conflicts:
#	source/blender/draw/engines/eevee_next/shaders/infos/eevee_material_info.hh

Clément Foucault added 2 commits 2023-10-05 17:04:31 +02:00

2732dc05a8 Fix broken tests

buildbot/vexp-code-patch-coordinator Build done. Details

474285b96f Unset debug resultion

Clément Foucault commented

2023-10-05 17:14:29 +02:00

@blender-bot build

Michael Parkin-White approved these changes 2023-10-05 19:01:04 +02:00

Michael Parkin-White left a comment

First-time contributor

Have run the latest changes through a number of scenes and both paths appear to be functioning correctly. Performance is about where it should be, with some improvements due to the cleanup for indirect tile dispatch.

I think this is now in a good state for the initial merge. The implementation features discussed such as compute perf tuning and floating point shadow atlas are things we can address in later patches.

LGTM.

Have run the latest changes through a number of scenes and both paths appear to be functioning correctly. Performance is about where it should be, with some improvements due to the cleanup for indirect tile dispatch. I think this is now in a good state for the initial merge. The implementation features discussed such as compute perf tuning and floating point shadow atlas are things we can address in later patches. LGTM.

Clément Foucault merged commit 57a3ab29cc into main

2023-10-05 19:02:47 +02:00

Clément Foucault referenced this issue from a commit

2023-10-05 19:02:49 +02:00

Metal: EEVEE Next: Optimize Virtual shadow maps for Apple Silicon

Sign in to join this conversation.

No reviewers