Eevee: GPU Material node graph optimization. #104536

Jason Fielder · 2023-02-09T19:20:53+01:00

Jason Fielder commented

2023-02-09 19:20:53 +01:00

Certain material node graphs can be very expensive to run. This feature aims to produce secondary GPUPass shaders within a GPUMaterial that provide optimal runtime performance. Such optimizations include baking constant data directly into the shader source, allowing the compiler to propagate constants and perform aggressive optimization upfront.
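As a rough illustration of constant baking (not the patch's actual code generator), the sketch below emits GLSL-like source for a list of scalar inputs, either as reads from a uniform array or as literals. `NodeInput`, `generate_source` and `u_inputs` are hypothetical names chosen for this example only.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

/* Hypothetical stand-in for one scalar input of a material node graph. */
struct NodeInput {
  std::string name;
  float value;
};

/* Emit GLSL-like source for the given inputs.
 * bake_constants == false: values are read from a uniform array, so they can
 *   change without recompiling (the interactive variant).
 * bake_constants == true: values are written as literals, so the shader
 *   compiler can fold and propagate them (the optimized variant). */
std::string generate_source(const std::vector<NodeInput> &inputs, bool bake_constants)
{
  std::ostringstream src;
  src << std::fixed << std::setprecision(6);
  if (!bake_constants && !inputs.empty()) {
    src << "uniform float u_inputs[" << inputs.size() << "];\n";
  }
  for (std::size_t i = 0; i < inputs.size(); i++) {
    if (bake_constants) {
      src << "const float " << inputs[i].name << " = " << inputs[i].value << ";\n";
    }
    else {
      src << "float " << inputs[i].name << " = u_inputs[" << i << "];\n";
    }
  }
  return src.str();
}
```

The baked variant has to be regenerated whenever a value changes, which is why the description keeps the original, uniform-driven variant around for editing.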

Because optimization can reduce shader editor and animation interactivity, optimized pass generation and compilation are deferred until all outstanding compilations have completed. Optimization is also delayed until a material has remained unmodified for a set period of time, to avoid excessive recompilation. The original variant of the material shader is kept to maintain interactivity.
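The deferral policy described above can be sketched as a simple predicate. This is a minimal illustration under stated assumptions: the idle period is measured in frames, the threshold of 30 is arbitrary, and `MaterialState`, `kIdleFramesBeforeOptimize` and `should_queue_optimization` are hypothetical names, not identifiers from the patch.

```cpp
#include <cstdint>

/* Hypothetical per-material bookkeeping for this sketch. */
struct MaterialState {
  std::uint64_t last_modified_frame; /* Frame on which the material was last edited. */
  bool optimization_queued;          /* True once the optimized pass was requested. */
};

/* Assumed idle period; the real value and unit are not given in the description. */
constexpr std::uint64_t kIdleFramesBeforeOptimize = 30;

/* Returns true when an optimized pass should be queued for compilation. */
bool should_queue_optimization(const MaterialState &mat,
                               std::uint64_t current_frame,
                               int pending_compilations)
{
  if (mat.optimization_queued) {
    return false; /* Already requested. */
  }
  if (pending_compilations > 0) {
    return false; /* Regular compilations take priority. */
  }
  if (current_frame < mat.last_modified_frame) {
    return false; /* Defensive: avoid unsigned underflow. */
  }
  return (current_frame - mat.last_modified_frame) >= kIdleFramesBeforeOptimize;
}
```

Resetting `last_modified_frame` on every edit restarts the wait, so a material that is being tweaked continuously is never sent for optimization.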

This patch also adds a new concept to gpu::Shader: a shader can be assigned a parent shader from which it pulls PSO descriptors and any required metadata for asynchronous shader cache warming. This enables fully asynchronous shader optimization without runtime hitching. It also reduces runtime hitching for standard materials by warming the cache ahead of rendering with PSO descriptors from default materials.
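A minimal sketch of the parent-shader idea follows. It is not the gpu::Shader implementation: the `Shader` class, `PSODescriptor` fields and `record_pso` are hypothetical, "compiling" a pipeline state is reduced to copying a descriptor, and treating a non-positive `limit` as "warm nothing" is an assumption of this example.

```cpp
#include <cstddef>
#include <vector>

/* Hypothetical, heavily reduced pipeline state description. */
struct PSODescriptor {
  int vertex_layout_id;
  int color_format_id;
  bool blend_enabled;
};

class Shader {
 public:
  /* Assign the shader whose PSO descriptors may be reused for warming.
   * Self-parenting is ignored in this sketch. */
  void set_parent(Shader *parent)
  {
    if (parent != this) {
      parent_ = parent;
    }
  }

  /* Remember a descriptor this shader was actually drawn with. */
  void record_pso(const PSODescriptor &desc) { used_.push_back(desc); }

  /* Pre-build up to `limit` pipeline states from the parent's descriptors.
   * Returns the number of states warmed; 0 when there is no parent. */
  std::size_t warm_cache(int limit)
  {
    if (parent_ == nullptr || limit <= 0) {
      return 0;
    }
    std::size_t count = 0;
    for (const PSODescriptor &desc : parent_->used_) {
      if (count >= static_cast<std::size_t>(limit)) {
        break;
      }
      warmed_.push_back(desc); /* Stand-in for an actual PSO compilation. */
      count++;
    }
    return count;
  }

  std::size_t warmed_count() const { return warmed_.size(); }

 private:
  Shader *parent_ = nullptr;
  std::vector<PSODescriptor> used_;
  std::vector<PSODescriptor> warmed_;
};
```

Because warming only reads descriptors that already exist on the parent, it can run on a compilation thread before the child shader has ever been used for drawing.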

Further shader graph optimizations are likely possible with this architecture. Certain scenes, such as Wanderer, benefit significantly: viewport performance for this scene is 2-3x faster on Apple Silicon GPUs.

Authored by Apple: Michael Parkin-White

Ref T96261


❤️ 1 🚀 2

Jason Fielder added 1 commit 2023-02-09 19:20:54 +01:00

Eevee: GPU Material node graph optimization. 2e9a015986


Jason Fielder requested review from Clément Foucault 2023-02-09 19:22:16 +01:00

Brecht Van Lommel added this to the Viewport & EEVEE project 2023-02-13 09:13:00 +01:00

Clément Foucault reviewed 2023-02-13 09:13:44 +01:00

source/blender/draw/engines/eevee/eevee_shaders.cc

						
				@ -1394,2 +1393,2 @@

				      break;

				    case GPU_MAT_QUEUED:

				    case GPU_MAT_SUCCESS: {

				      /* Detemrine optimization status. */

Clément Foucault commented

2023-02-13 09:13:43 +01:00

typo

fclem marked this conversation as resolved

Clément Foucault approved these changes 2023-02-13 22:01:37 +01:00

Clément Foucault left a comment

I think the patch is clear and usage is all good.

I just would like to avoid printing stuff like `Async compilation complete. Begin PSO warm USING DEFAULT MATERIAL.` even for OpenGL, when it is clearly disabled.


source/blender/draw/engines/eevee/eevee_shaders.cc

						
				@ -1397,2 +1402,2 @@

				      mat = EEVEE_material_default_get(scene, ma, options);

				      break;

				      GPUMaterial *default_mat = EEVEE_material_default_get(scene, ma, options);

				      /* Mark pending material with its default material for future cache warming.*/

Clément Foucault commented

2023-02-13 10:04:32 +01:00

How does that work? Materials can have different resources based on their flags (see `EEVEE_material_bind_resources()`). Unless this is only for the vertex inputs and fragment outputs, these should be the same.


Michael Parkin-White commented

2023-02-14 12:06:53 +01:00

First-time contributor

The match rate of PSOs from the default materials to the regular materials is not 100%. However, from outputting the PSO descriptor hashes, there were matches with the corresponding default material more often than not.

Though yes, when resource inputs change their structure (e.g. differing geometry layout), these matches do not always line up.

But as you mention, for these cases, the PSO descriptor only requires vertex input data structure and fragment output format, along with a few other state properties such as blending, colour channel masking etc; however these usually appear to be consistent for a given material type in EEVEE, as those are generally dependent on the type of pass, more so than the material being rendered(?).

For Metal specifically, most state is dynamic, e.g. resource bindings, viewport/scissor, depth-stencil etc; so these parameters do not need to be part of the PSO descriptors.

I limited the cache warming for EEVEE materials from default materials to only one PSO, rather than all, as these additional ones may not end up being used, so this seemed the best bang-for-buck for reducing runtime stuttering, while keeping material compilation fast.

It's likely possible to determine up front which PSOs will and won't be useful, and perhaps this code can evolve over time, using contextual data from the materials to determine how effective cache warming will be. So perhaps, based on certain flags, this step can be skipped. It still appears to be a net benefit for a good portion of materials, however.
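To make the "match rate" above concrete, here is a sketch of how a descriptor built from a default material could be compared with the one a real material ends up using. The fields follow the state listed in this comment (vertex input layout, fragment output format, blending, colour channel mask), but `PSOKey`, `pso_hash` and `warm_would_hit` are hypothetical names and the integer ids are placeholders, not the Metal backend's actual descriptor.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

/* Hypothetical subset of the state that identifies a pipeline state object. */
struct PSOKey {
  std::uint32_t vertex_layout_id;
  std::uint32_t color_format_id;
  bool blend_enabled;
  std::uint8_t color_write_mask;
};

inline bool operator==(const PSOKey &a, const PSOKey &b)
{
  return a.vertex_layout_id == b.vertex_layout_id &&
         a.color_format_id == b.color_format_id &&
         a.blend_enabled == b.blend_enabled &&
         a.color_write_mask == b.color_write_mask;
}

/* Common hash-combine step; the constant is the usual golden-ratio value. */
inline std::size_t hash_combine(std::size_t seed, std::size_t value)
{
  return seed ^ (value + 0x9e3779b9u + (seed << 6) + (seed >> 2));
}

inline std::size_t pso_hash(const PSOKey &key)
{
  std::size_t h = 0;
  h = hash_combine(h, std::hash<std::uint32_t>{}(key.vertex_layout_id));
  h = hash_combine(h, std::hash<std::uint32_t>{}(key.color_format_id));
  h = hash_combine(h, std::hash<bool>{}(key.blend_enabled));
  h = hash_combine(h, std::hash<std::uint8_t>{}(key.color_write_mask));
  return h;
}

/* A warmed PSO is only useful if the real draw requests an identical key.
 * The hash is a cheap first check; equality guards against collisions. */
inline bool warm_would_hit(const PSOKey &from_default, const PSOKey &actual)
{
  return pso_hash(from_default) == pso_hash(actual) && from_default == actual;
}
```

Dynamic state such as resource bindings or viewport is deliberately left out of the key, so two materials with different textures can still share a warmed pipeline state.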


Clément Foucault commented

2023-02-14 12:39:33 +01:00

Thanks, I understand better now. But then the documentation about this is rather lacking. It is not stated that the parent material is just a template and that the actual final shader can actually differ from it (in terms of interface / pipeline state).

Michael Parkin-White commented

2023-02-14 12:41:48 +01:00

First-time contributor

Will improve the documentation.
Where would be the most appropriate location for this? In the header, or alongside the usage here?


Clément Foucault commented

2023-02-14 12:49:34 +01:00

In the header.

source/blender/gpu/GPU_material.h

						
				@ -275,2 +286,4 @@

				bool GPU_material_optimization_ready(GPUMaterial *mat);

				/**

				 * Store reference to default material for async PSO cache warming.

Clément Foucault commented

2023-02-13 11:13:20 +01:00

Maybe note what the expected status of both parameters is. I guess `material` can be in any state, whereas `default_material` should be compiled?


Michael Parkin-White commented

2023-02-14 12:11:24 +01:00

First-time contributor

Yep, can add this clarification.
The function won't fail if `default_material` is not compiled; async warming will just be skipped. Based on the control flow, though, there should never be a situation where this is not true.


fclem marked this conversation as resolved

source/blender/gpu/GPU_shader.h

						
				@ -109,0 +109,4 @@

				/* Shader cache warming. Cache can be warmed using PSO descriptors

				 * from a specified parent shader. */

				void GPU_shader_set_parent(GPUShader *shader, GPUShader *parent);

				void GPU_shader_warm_cache(GPUShader *shader, int limit);

Clément Foucault commented

2023-02-13 11:14:13 +01:00

Document this function.

fclem marked this conversation as resolved

Clément Foucault commented

2023-02-13 22:02:33 +01:00

As a side note, I think the patch is in a mergeable state, which is why I approved it. But I would like to see the small fixes done first.

Michael Parkin-White commented

2023-02-14 12:11:58 +01:00

First-time contributor

Will address all feedback and resubmit, thanks!

Jason Fielder added 1 commit 2023-02-14 18:42:40 +01:00

Eevee: GPU Material node graph optimization.


b14c5cda89


PR Feedback Addressed.

Authored by Apple: Michael Parkin-White

Related to #96261

Clément Foucault approved these changes 2023-02-14 19:29:08 +01:00

Clément Foucault commented

2023-02-14 19:31:41 +01:00

@blender-bot build

Clément Foucault added 1 commit 2023-02-14 19:32:15 +01:00

Merge branch 'main' into NodeGraphOptimization_v3 ef102d1d64

Clément Foucault added 1 commit 2023-02-14 19:58:15 +01:00

Merge branch 'main' into NodeGraphOptimization_v3 3d27f252c6

Clément Foucault merged commit 7b9d1cb51f into main

2023-02-14 21:51:14 +01:00

Clément Foucault referenced this issue from a commit

2023-02-14 21:51:15 +01:00

Eevee: GPU Material node graph optimization.

Martijn Versteegh commented

2023-02-15 02:36:43 +01:00

Since this was committed, the assert

BLI_assert(material != default_material);

in the function `void GPU_material_set_default(GPUMaterial *material, GPUMaterial *default_material)` (gpu_material.c) fails for me.


Jeroen Bakker added this to the 3.5 milestone 2023-02-15 08:23:51 +01:00

Michael Parkin-White commented

2023-02-15 10:46:38 +01:00

First-time contributor

> Since this was committed, the assert
>
> BLI_assert(material != default_material);
>
> in the function `void GPU_material_set_default(GPUMaterial *material, GPUMaterial *default_material)` (gpu_material.c) fails for me.

Will see if I can repro and submit a fix. Thanks for raising.

Clément Foucault commented

2023-02-15 11:09:57 +01:00

Here is a test file for the assert.

test.blend

879 KiB

Martijn Versteegh commented

2023-02-17 19:23:49 +01:00

This still bugs quite hard for me. I don't even need a test file.

Just opening the default scene and switching to the shader workspace in a debug build will quit with:

BLI_assert failed: source/blender/gpu/intern/gpu_shader.cc:510, GPU_shader_set_parent(), at 'shader != parent'


Martijn Versteegh referenced this pull request

2023-02-18 13:19:53 +01:00

Assert fails when switching to the shading tab #104918
