Metal: EEVEE Next: Optimize Virtual shadow maps for Apple Silicon #111283

Merged
Clément Foucault merged 77 commits from Jason-Fielder/blender:eevee_next_metal_shadow_opti_tbdr into main 2023-10-05 19:02:47 +02:00
Member

WIP PR for optimization of EEVEE Next's Virtual Shadow Maps for
TBDRs. The core of these optimizations lies in eliminating
atomic shadow atlas writes and instead utilising tile memory to
perform depth accumulation as a secondary pass once all
geometry updates for a given shadow view have completed.

This also allows use of fast on-tile depth testing/sorting, reducing
overdraw and redundant fragment operations, while also allowing
for tile indirection calculations to be offloaded into the vertex
shader to increase fragment storage efficiency and throughput.

Authored by Apple: Michael Parkin-White


Further work required:

  - [x] Optimization: Consider replacing U32 shadow atlas with F32.

  - [x] Add pass before shadow geometry raster to ensure only tiles with active geometry can raster.
    - Initially clear all other tiles to zero instead.
    - Use same quad generation shader to clear active update regions to 1.0, allowing depth contents to be rendered.

  - [x] Determine if there is a way to reduce layer usage in the framebuffer or accumulation quad geometry dispatch in order to reduce intermediate memory pressure of the large virtual tile bin. (To be resolved in a follow-up patch.)

  - [x] Benchmark other phases in shadow gen. Optimize workgroup sizes for each sub-phase where appropriate.

  - [x] Remove redundant bindings for the surface shadow shader (e.g. texture atlas, tile info buffers etc.), as these incur a subtle static resource cost.

  - [x] Remove SHADOW_PAGE_CLEAR pass in favour of storing cleared values within the accumulation pass.

Other potential work:

  - [x] Replace accumulation phase quad generation with a GPU-driven solution (likely unnecessary, as the vertex phase of the accumulation pass is not very expensive).
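The tile-memory technique described above can be modelled in host code: instead of issuing one atomic min per fragment into the shared atlas, each tile accumulates its own depth locally and performs a single write per texel once all geometry for the view has been processed. A minimal CPU-side sketch (names, sizes, and the `Fragment` struct are illustrative, not Blender's API):

```cpp
#include <algorithm>
#include <vector>

// Illustrative tile size; the real page resolution lives in EEVEE's defines.
constexpr int TILE_RES = 4;

struct Fragment {
  int x, y;     // texel within the tile
  float depth;  // fragment depth
};

// Approach A: one atomic-style min per fragment, straight into the atlas.
std::vector<float> raster_atomic(const std::vector<Fragment> &frags)
{
  std::vector<float> atlas(TILE_RES * TILE_RES, 1.0f);
  for (const Fragment &f : frags) {
    float &texel = atlas[f.y * TILE_RES + f.x];
    texel = std::min(texel, f.depth);  // models an atomic min on the atlas
  }
  return atlas;
}

// Approach B: accumulate in fast tile-local memory, then one write per texel
// to the atlas after all geometry for the shadow view has been processed.
std::vector<float> raster_tile_memory(const std::vector<Fragment> &frags)
{
  float tile[TILE_RES * TILE_RES];
  std::fill(std::begin(tile), std::end(tile), 1.0f);
  for (const Fragment &f : frags) {
    float &texel = tile[f.y * TILE_RES + f.x];
    texel = std::min(texel, f.depth);  // on-tile depth test, no atomics
  }
  // Secondary "store" pass: copy tile contents out in one go.
  return std::vector<float>(std::begin(tile), std::end(tile));
}
```

Both paths produce the same atlas contents; the second trades per-fragment atlas traffic for a single coherent store, which is what makes it attractive on TBDRs.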

Jason Fielder added 1 commit 2023-08-18 22:29:55 +02:00
f4827c8ff2 Metal: EEVEE Next: Optimize Virtual shadow maps for Apple Silicon
Opening WIP PR for optimization and implementation of EEVEE Next's
Virtual Shadow Maps for TBDRs. The core of these optimizations lies in
eliminating atomic shadow atlas writes and instead utilising tile
memory to perform depth accumulation as a secondary pass once
all geometry updates for a given shadow view have completed.

This also allows use of fast on-tile depth testing/sorting, reducing
overdraw and redundant fragment operations, while also allowing
for tile indirection calculations to be offloaded into the vertex
shader to increase fragment storage efficiency and throughput.

Authored by Apple: Michael Parkin-White
Jason Fielder requested review from Clément Foucault 2023-08-18 22:30:46 +02:00
Clément Foucault requested changes 2023-08-19 16:13:26 +02:00
Clément Foucault left a comment
Member

I would say it is better to split this patch into :

  • Raster order group.
  • Memoryless texture.
  • Load Store changes.
  • Out of bounds fixes for the shadow pipeline.
  • The actual shadow implementation.

This way it is easier to discuss each topic separately. Some are easier to accept than others.

I have already left a few comments about code organization.

There seem to be some formatting differences in `mtl_shader_defines.msl` and some other files.

You can format using `make format` in the `blender` source folder (the git repo root folder) so that it uses the clang-format from the `lib` folder.

@ -573,6 +573,8 @@ set(GLSL_SRC
engines/eevee_next/shaders/eevee_surf_forward_frag.glsl
engines/eevee_next/shaders/eevee_surf_lib.glsl
engines/eevee_next/shaders/eevee_surf_shadow_frag.glsl
engines/eevee_next/shaders/eevee_surf_shadow_accum_vert.glsl

This should be `eevee_shadow_page_write_frag/vert.glsl`.
The `eevee_surf_*` are fragment shaders that can match the `eevee_geom_*` vertex shaders.

fclem marked this conversation as resolved
@ -160,0 +165,4 @@
#ifdef WITH_METAL_BACKEND
/* Perform accumulation step for storing final tile depth results back to shadow atlas. */
accum_ps_.init();

This is not an accumulation pass. It just copies the result to the atlas. It should be renamed `page_write_ps` and moved to the shadow module instead of being in the pipeline.

Also no need for the Camera UBO, the utility texture etc... verify all resources are needed.

fclem marked this conversation as resolved
@ -102,6 +102,9 @@ class ShadowPipeline {
Instance &inst_;
PassMain surface_ps_ = {"Shadow.Surface"};
#ifdef WITH_METAL_BACKEND

I wouldn't guard any of this behind `WITH_METAL_BACKEND`, as it would be easy to break.
I would even go to the extent of supporting an emulated path on other platforms to check that this feature works.
Maybe the only thing that needs an `ifdef` is the shader for now.

Making it clear it is a separate technique in a subsection of the ShadowModule class would be good.

fclem marked this conversation as resolved
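One way to act on this suggestion is to select the technique at runtime so the emulated path can be exercised on any backend, with only the shader behind an `ifdef`. A hypothetical sketch (these names and the selection function are illustrative; the merged code later exposes a `ShadowModule::shadow_technique`):

```cpp
// Hypothetical technique enum; mirrors the idea of a runtime switch rather
// than a compile-time WITH_METAL_BACKEND guard.
enum class ShadowUpdateTechnique {
  AtomicRaster,  // atomic writes into the shadow atlas (works everywhere)
  TileCopy,      // tile-memory accumulation + copy (fast on TBDRs)
};

// Runtime selection: the tile path can still be forced on other backends
// for testing, instead of being compiled out entirely.
ShadowUpdateTechnique select_technique(bool backend_is_tbdr, bool force_tile_path)
{
  if (backend_is_tbdr || force_tile_path) {
    return ShadowUpdateTechnique::TileCopy;
  }
  return ShadowUpdateTechnique::AtomicRaster;
}
```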
@ -1175,2 +1201,4 @@
shadow_multi_view_.compute_procedural_bounds();
#ifdef WITH_METAL_BACKEND
/* Specify load-store config right before render to ensure flags are retained. */

From an API perspective, wouldn't it be better to have this stored as a state of the framebuffer?

fclem marked this conversation as resolved
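Storing the load-store configuration as framebuffer state, as suggested, might look like the following sketch (a hypothetical minimal API, not the actual GPUFrameBuffer interface): the config is recorded once on the framebuffer and re-applied on every bind, so the flags cannot be lost between renders.

```cpp
enum class LoadOp { Load, Clear, DontCare };
enum class StoreOp { Store, DontCare };

struct AttachmentState {
  LoadOp load = LoadOp::DontCare;
  StoreOp store = StoreOp::Store;
};

// Hypothetical framebuffer that retains its load-store config as state,
// instead of requiring it to be specified right before each render.
class Framebuffer {
 public:
  void set_depth_load_store(LoadOp load, StoreOp store)
  {
    depth_ = {load, store};
  }
  // bind() would re-apply depth_ every time, so the flags survive across
  // multiple renders of the same framebuffer.
  AttachmentState depth_state() const { return depth_; }

 private:
  AttachmentState depth_;
};
```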
@ -259,2 +259,3 @@
/** Framebuffer with the atlas_tx attached. */
Framebuffer render_fb_;
#ifdef WITH_METAL_BACKEND
Framebuffer render_fb_ = {"shadow_write_framebuffer"};

This shouldn't be in the `#ifdef` block.

fclem marked this conversation as resolved
@ -12,3 +12,3 @@
#pragma BLENDER_REQUIRE(eevee_shadow_tilemap_lib.glsl)
shared bool directional_range_changed;
shared uint directional_range_changed;

I did this change a few days ago. I did not remember this was forbidden. I have now amended our GLSL style guide to include this.

fclem marked this conversation as resolved
@ -69,3 +69,3 @@
ivec2 tile_shifted = tile_co + tilemap.grid_shift;
/* Ensure value is shifted into positive range to avoid modulo on negative. */
ivec2 tile_wrapped = ivec2((ivec2(SHADOW_TILEMAP_RES) + tile_shifted) % SHADOW_TILEMAP_RES);
ivec2 tile_wrapped = ivec2((ivec2(SHADOW_TILEMAP_RES * SHADOW_TILEMAP_PER_ROW) + tile_shifted) %

This is unlikely to be correct, and misleading. It might be preferable to clamp `tilemap.grid_shift` to the `[-SHADOW_TILEMAP_RES, SHADOW_TILEMAP_RES]` range to compute `tile_shifted`. Higher values don't matter.

fclem marked this conversation as resolved
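The suggested clamp works because C-style `%` truncates toward zero, so a negative left operand would yield a negative remainder; clamping the shift guarantees the `SHADOW_TILEMAP_RES` bias term keeps the operand non-negative. A host-side sketch (the constant's value is illustrative):

```cpp
#include <algorithm>

constexpr int SHADOW_TILEMAP_RES = 32;  // illustrative value

int wrap_tile_coord(int tile_co, int grid_shift)
{
  // Per the review: shift values beyond the tilemap size don't matter, so
  // clamping loses nothing, and it keeps (RES + tile_shifted) non-negative
  // for any input, avoiding modulo on a negative operand.
  int shift = std::clamp(grid_shift, -SHADOW_TILEMAP_RES, SHADOW_TILEMAP_RES);
  int tile_shifted = tile_co + shift;
  return (SHADOW_TILEMAP_RES + tile_shifted) % SHADOW_TILEMAP_RES;
}
```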
@ -0,0 +74,4 @@
/** Interpolate output texel */
/* Using bitwise ops is way faster than integer ops. */
const int page_shift = SHADOW_PAGE_LOD;

Unused

fclem marked this conversation as resolved
@ -0,0 +94,4 @@
/* NOTE: To avoid redundant writes, we still enable the depth test and configure the accumulation
* pass quad depth such that only the fragments updated during the surface depth pass will run.
*
* We use `DRW_STATE_DEPTH_GREATER_EQUAL` such that we only run the fragment shader if the

I guess you can do that because you kept the clear dispatch. I'm pretty sure it's more efficient to just copy the whole tile and not clear the atlas.

First-time contributor

Yeah, following the later update, clearing the full tile, writing it out in full, and removing the compute-based clear pass is indeed quite a bit faster.

In the updated version I've added a comment distinguishing the two options, as the contextual information may be important if anything ever needs changing: if the compute clear pass ever needs to be used again for whatever reason, the depth flag can be changed back to skip writes that are not needed.

fclem marked this conversation as resolved
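The depth trick being discussed can be modelled on the CPU (a toy sketch, not the actual pass setup): with a GREATER_EQUAL depth test and the store quad drawn just below the clear depth, only texels the surface pass actually wrote trigger the store fragment shader.

```cpp
#include <vector>

constexpr float CLEAR_DEPTH = 1.0f;

// GREATER_EQUAL test: the store-pass fragment runs only when the quad's
// depth is >= the depth currently stored for that texel.
bool store_fragment_runs(float quad_depth, float surface_depth)
{
  return quad_depth >= surface_depth;
}

// Count how many store-pass fragments run over a tile when the quad is
// drawn just below the clear depth: untouched texels (still at CLEAR_DEPTH)
// fail the test and are skipped, touched texels pass.
int store_invocations(const std::vector<float> &tile_depths)
{
  const float quad_depth = CLEAR_DEPTH - 1e-6f;
  int count = 0;
  for (float d : tile_depths) {
    if (store_fragment_runs(quad_depth, d)) {
      count++;
    }
  }
  return count;
}
```

As the discussion concludes, skipping untouched texels only pays off while a separate clear pass exists; once the store pass writes the full tile, the compute clear can be dropped entirely.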
Jason Fielder added 2 commits 2023-08-21 20:42:58 +02:00
8bdee8bdab Additional VSM optimizations for Apple Silicon:
- Prepass to clear depth to zero for pages which will not be updated. Further reducing fragment cost.

- Removal of SHADOW_PAGE_CLEAR compute pass for Metal, as this can be handled more efficiently within the geometry update pass.

- Addition of support for F32 shadow atlas for Metal instead of U32, subtly improving shadow update perf.

These changes are not yet productised, still some further work to do on updating descriptions and cleaning up approaches for some of these passes.
Jason Fielder added 1 commit 2023-08-21 23:05:18 +02:00
Jason Fielder added 1 commit 2023-08-21 23:06:01 +02:00
Jason Fielder added 1 commit 2023-08-22 15:30:56 +02:00
Jason Fielder added 1 commit 2023-08-24 13:22:28 +02:00
Jason Fielder added 2 commits 2023-08-24 14:04:10 +02:00
Jason Fielder added 2 commits 2023-08-29 13:07:31 +02:00
Jason Fielder added 1 commit 2023-08-29 14:46:10 +02:00
Jason Fielder added 2 commits 2023-08-30 14:16:05 +02:00
748d55707a First-pass for refactor generalizing shadow update technique and removing Metal specific guards.
Still need to update guards inside shaders e.g. shadow surface shader and consider permutations
based on technique used instead. This is to ensure toggling the technique works as expected, rather than hitting different branches depending on system.
Jason Fielder added 1 commit 2023-08-30 19:43:18 +02:00
Jason Fielder added 1 commit 2023-08-30 23:22:54 +02:00
636af6fe1d further refinement of patch with decoupling of technique and atlas format usage.
Although it is not trivial to replace the atlas read routine texture binding depending on format, as the inclusion of the binding is nested fairly deeply.
Jason Fielder added 1 commit 2023-08-31 00:25:52 +02:00
Jason Fielder added 1 commit 2023-08-31 00:33:34 +02:00
Jason Fielder added 1 commit 2023-08-31 11:46:33 +02:00
Jason Fielder added 1 commit 2023-08-31 11:48:41 +02:00
Jason Fielder added 1 commit 2023-08-31 16:23:37 +02:00
Jason Fielder added 1 commit 2023-08-31 16:24:16 +02:00
Jason Fielder added 1 commit 2023-08-31 16:32:33 +02:00
Jason Fielder added 1 commit 2023-08-31 16:32:55 +02:00
Clément Foucault added 2 commits 2023-09-04 20:10:54 +02:00
cf5e4ea9a8 Merge branch 'main' into eevee_next_metal_shadow_opti_tbdr
# Conflicts:
#	source/blender/draw/engines/eevee_next/shaders/eevee_shadow_tag_update_comp.glsl
#	source/blender/draw/engines/eevee_next/shaders/eevee_shadow_tilemap_bounds_comp.glsl
#	source/blender/draw/engines/eevee_next/shaders/eevee_surf_lib.glsl
#	source/blender/draw/engines/eevee_next/shaders/infos/eevee_material_info.hh
#	source/blender/gpu/GPU_texture.h
#	source/blender/gpu/intern/gpu_framebuffer.cc
#	source/blender/gpu/intern/gpu_shader_create_info.cc
#	source/blender/gpu/intern/gpu_shader_create_info.hh
#	source/blender/gpu/metal/mtl_framebuffer.mm
#	source/blender/gpu/shaders/metal/mtl_shader_defines.msl
#	source/blender/python/gpu/gpu_py_shader_create_info.cc
Clément Foucault added 1 commit 2023-09-05 10:53:05 +02:00
Jason Fielder added 2 commits 2023-09-18 15:25:08 +02:00
Jason Fielder added 2 commits 2023-09-19 12:38:42 +02:00
Jason Fielder added 1 commit 2023-09-20 19:31:42 +02:00
Clément Foucault requested changes 2023-09-21 00:46:01 +02:00
@ -159,0 +214,4 @@
inst_.sampling.bind_resources(&tbdr_page_store_ps_);
inst_.bind_uniform_data(&tbdr_page_store_ps_);
tbdr_page_store_ps_.draw_procedural(
GPU_PRIM_TRIS, (SHADOW_TILEMAP_RES) * (SHADOW_TILEMAP_RES)*SHADOW_VIEW_MAX, 6);

`SQUARE(SHADOW_TILEMAP_RES)`. That should avoid confusing clang-format.

fclem marked this conversation as resolved
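A macro like the suggested `SQUARE` keeps the intent readable and sidesteps the `(SHADOW_TILEMAP_RES) * (SHADOW_TILEMAP_RES)*SHADOW_VIEW_MAX` formatting mishap. A sketch with a local definition and illustrative constant values (the real constants live in EEVEE's defines):

```cpp
// Fully parenthesized square macro, as suggested in the review.
#define SQUARE(x) ((x) * (x))

// Illustrative values only.
constexpr int SHADOW_TILEMAP_RES = 32;
constexpr int SHADOW_VIEW_MAX = 64;

// Instance count for the procedural tile-store draw: one quad per tile,
// per view. SQUARE() reads as intent and formats cleanly.
int page_store_instance_count()
{
  return SQUARE(SHADOW_TILEMAP_RES) * SHADOW_VIEW_MAX;
}
```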
@ -53,2 +55,4 @@
{0, 0, 0.5, 1}};
/* Technique used for updating the virtual shadow map contents. */
enum eShadowUpdateTechnique {

Remove prefix `e`: `ShadowUpdateTechnique`.

fclem marked this conversation as resolved
@ -148,2 +158,4 @@
}
#ifdef GPU_METAL
/* NOTE: Metal can support multiple shadow atlas formats. E.g. U32, F32. */

Create a define for the type, `ShadowAtlasSamplerType`, which changes the underlying type. This avoids duplicating all the function declarations.

fclem marked this conversation as resolved
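The suggested define might look like this sketch (a C++ stand-in for the GLSL/MSL idiom; everything except the `ShadowAtlasSamplerType` name is illustrative): a single alias switches the sampler type, so every function that takes the atlas is declared once against the alias instead of once per format.

```cpp
#include <cstdint>

// Toy stand-ins for the two atlas sampler flavours.
struct Usampler2DArray { using TexelT = uint32_t; };
struct Sampler2DArray { using TexelT = float; };

// One alias selects the atlas type; declarations below exist once for
// either format instead of being duplicated per format.
#ifdef SHADOW_ATLAS_F32
using ShadowAtlasSamplerType = Sampler2DArray;
#else
using ShadowAtlasSamplerType = Usampler2DArray;
#endif

// Declared a single time, valid for whichever format is selected.
ShadowAtlasSamplerType::TexelT shadow_read_stub(ShadowAtlasSamplerType /*atlas*/)
{
  return ShadowAtlasSamplerType::TexelT(0);
}
```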
@ -0,0 +1,64 @@

Rename file to `eevee_shadow_page_tile_frag/vert`. This path will eventually work on all hardware even if not efficient.

fclem marked this conversation as resolved
@ -0,0 +3,4 @@
* Virtual Shadow map accumulation storage.
*
* On Apple silicon, we can use a three-pass method to perform virtual shadow map updates,
* leveraging efficient use of tile-based GPUs.Shadow updates rasterize geometry for each view in

Typo: `GPUs.Shadow`

fclem marked this conversation as resolved
@ -0,0 +57,4 @@
shadow_atlas_img.texture->write(u_depth, ushort2(out_texel_xy), out_page_z);
# endif
# ifdef SHADOW_ATLAS_F32

About this F32 option, I'm wondering if it wouldn't be good to have a U32 view and an F32 view of the atlas, and always use the float one for sampling the shadow map. For rendering we could use the best format depending on the update technique.

Would that have any impact on performance because of texture layout?

fclem marked this conversation as resolved
@ -21,6 +21,12 @@ shared int global_max;
void main()
{
uint index = gl_GlobalInvocationID.x;

The same block of code is inside the `resource_len > 0` block.

fclem marked this conversation as resolved
@ -99,3 +99,3 @@
/* Issue one view if there is an update in the LOD. */
if (all(equal(gl_LocalInvocationID, uvec3(0)))) {
bool lod_has_update = rect_min.x < rect_max.x;
bool lod_has_update = rect_min_x < rect_max_x;

Why is this needed? `rect_min` and `rect_max` are defined above.

fclem marked this conversation as resolved
Jason Fielder added 1 commit 2023-09-25 18:29:11 +02:00
Jason Fielder added 1 commit 2023-09-25 18:29:49 +02:00
Jason Fielder added 1 commit 2023-09-25 18:31:52 +02:00
Jason Fielder added 1 commit 2023-09-25 18:35:54 +02:00
Jason Fielder added 1 commit 2023-09-25 18:44:40 +02:00
Jason Fielder added 1 commit 2023-09-26 00:03:47 +02:00
Jason Fielder added 4 commits 2023-09-26 20:58:53 +02:00
Jason Fielder added 1 commit 2023-09-26 21:14:17 +02:00
Jason Fielder added 1 commit 2023-09-26 21:25:45 +02:00
Jason Fielder added 2 commits 2023-09-28 12:11:14 +02:00
Michael Parkin-White requested changes 2023-09-28 12:11:32 +02:00
@ -665,6 +665,64 @@ void DrawMultiBuf::bind(RecordingState &state,
GPU_debug_group_end();
}
void DrawMultiBuf::bind_no_comp(RecordingState &state,
First-time contributor

@fclem I had added this function (and its use by the shadows) as a placeholder, although I wanted to get your thoughts on how best to handle this scenario.

```
// Original
DrawMultiBuf::bind

// Version added in this PR to just set up resource bindings without invoking any compute commands
DrawMultiBuf::bind_no_comp
```

The intent of `bind_no_comp` is to set up the resource bindings for the subsequent draws without re-invoking the visibility or `draw_command_generate_*` compute phase, such that we can execute the three separate passes of the TBDR optimization separately.

The same goes for the other additions in this sub-folder:

```
Manager::submit_prepare_visibility(PassMain &pass, View &view, command::RecordingState *state)
Manager::submit_pass_only(PassMain &pass, View &view, command::RecordingState *state)

vs

Manager::submit(PassMain &pass, View &view)
```

and the actual implementation in the shadow engine:

```
ShadowPipeline::render_prepare_visibility
ShadowPipeline::render_tile_clear
ShadowPipeline::render_main_pass
ShadowPipeline::render_tile_store

vs

ShadowPipeline::render
```

It feels like there is probably a cleaner way of doing this without introducing any risks from code duplication, although I had not invested time into it before getting an opinion. One option could be to add a function parameter, or to template `DrawMultiBuf::bind` with a boolean switch on generating draw commands.

For the other functions, it could be reasonable to refactor to eliminate duplication. The one issue I was not super happy with was that the split functions required passing the `command::RecordingState` outside of the function, as the later submission required visibility on it.

Thanks for any notes!


For some reason I thought I had written about this in my previous review, but my comment got lost somehow.

So I think this is the wrong approach. All three of these passes should be inside the same `PassMain`, and you should just use a different `draw::PassMain::Sub` for each of the clear/render/store passes. That way there shouldn't be a need for any of this duplicated logic.

fclem marked this conversation as resolved
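The structure Clément suggests can be modelled with hypothetical minimal classes (not the real `draw::PassMain` API): one parent pass owns the clear/render/store sub-passes, so a single submit replays them in recording order without duplicated bind/submit entry points or a `RecordingState` handed around.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for a pass with sub-passes.
class Pass {
 public:
  class Sub {
   public:
    explicit Sub(std::string name) : name_(std::move(name)) {}
    const std::string &name() const { return name_; }

   private:
    std::string name_;
  };

  // Record a named sub-pass; replayed in creation order.
  void sub(std::string name)
  {
    subs_.emplace_back(std::move(name));
  }

  // One submit replays every sub-pass in order: no bind_no_comp, no split
  // submit_* functions, no recording state leaking out of the manager.
  std::vector<std::string> submit() const
  {
    std::vector<std::string> order;
    for (const Sub &s : subs_) {
      order.push_back(s.name());
    }
    return order;
  }

 private:
  std::vector<Sub> subs_;
};
```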
Jason Fielder changed title from WIP: Metal: EEVEE Next: Optimize Virtual shadow maps for Apple Silicon to Metal: EEVEE Next: Optimize Virtual shadow maps for Apple Silicon 2023-09-28 12:13:05 +02:00
Clément Foucault requested changes 2023-09-28 12:56:08 +02:00
Clément Foucault left a comment
Member

We are getting closer! 😄

I just merged the `GPU_ARCHITECTURE` changes to main. Please merge these changes first before we can merge this patch.

@ -18,2 +18,3 @@
/* Clear to FLT_MAX instead of 1 so the far plane doesn't cast shadows onto farther objects. */
/* Clear to FLT_MAX instead of 1 so the far plane doesn't cast shadows onto farther objects. */
#ifdef GPU_METAL

No need for the define since it is aliased for GLSL.

fclem marked this conversation as resolved
@ -479,2 +479,4 @@
Vector<FragOut> fragment_outputs_;
using FragTileIn = FragOut;
Vector<FragTileIn> fragment_tile_inputs_;

These have been replaced by `SubpassIn`, which is the same but with the Vulkan denomination. They should just be implemented as ROGs on Apple silicon. So remove `fragment_tile_inputs_`.

fclem marked this conversation as resolved
@ -130,6 +130,8 @@ void MTLFrameBuffer::bind(bool enabled_srgb)
Shader::set_framebuffer_srgb_target(enabled_srgb && srgb_);
}
this->mark_dirty();

I guess this was only for testing. Shouldn't it be removed?

fclem marked this conversation as resolved
@ -22,3 +22,3 @@
#include "gpu_py_shader.h" /* own include */
//#define USE_PYGPU_SHADER_INFO_IMAGE_METHOD
// #define USE_PYGPU_SHADER_INFO_IMAGE_METHOD

Unrelated change

fclem marked this conversation as resolved
Jason Fielder added 7 commits 2023-09-29 20:59:43 +02:00
Jason Fielder added 1 commit 2023-09-29 21:03:12 +02:00
Jason Fielder added 1 commit 2023-09-29 21:37:45 +02:00
Jason Fielder added 1 commit 2023-09-29 21:51:22 +02:00
Clément Foucault added 3 commits 2023-10-01 02:10:20 +02:00
Clément Foucault added 1 commit 2023-10-01 13:37:18 +02:00
Clément Foucault added 4 commits 2023-10-01 18:12:53 +02:00
Clément Foucault approved these changes 2023-10-01 18:51:17 +02:00
Clément Foucault left a comment
Member

I went the extra mile to cleanup and implement all the TODOs.

Now both implementations are supported on both OpenGL and Metal. To avoid too much confusion, I tried to make the names more platform agnostic.

The last quirk that I'm not sure how to deal with is the static `ShadowModule::shadow_technique`, which prevents runtime switching between techniques (which could be beneficial in some cases).

Also, reducing the number of layers in the framebuffer doesn't seem to work correctly. Some shadow tiles will fail to update and never update. This happens in both methods and on all platforms, and is not related to this patch.

I'm really happy with the result. Both in terms of performance and in code quality.

I would just like for Michael to glance over my changes and to see if they all make sense.

Otherwise I think it's good to merge.

Clément Foucault added 11 commits 2023-10-05 16:35:45 +02:00
Clément Foucault added 2 commits 2023-10-05 17:04:31 +02:00
buildbot/vexp-code-patch-coordinator Build done. Details
474285b96f
Unset debug resolution

@blender-bot build

Michael Parkin-White approved these changes 2023-10-05 19:01:04 +02:00
Michael Parkin-White left a comment
First-time contributor

Have run the latest changes through a number of scenes and both paths appear to be functioning correctly. Performance is about where it should be, with some improvements due to the cleanup for indirect tile dispatch.

I think this is now in a good state for the initial merge. The implementation features discussed such as compute perf tuning and floating point shadow atlas are things we can address in later patches.

LGTM.

Clément Foucault merged commit 57a3ab29cc into main 2023-10-05 19:02:47 +02:00
Reference: blender/blender#111283