EEVEE-Next: Reduce longer compilation time #120100
It seems that Nvidia drivers have a harder time with our new shaders for some reason (~3x slower, from what I read).
The first cold start of EEVEE-Next also takes quite a bit longer than I would like (a few seconds without any feedback). We could display a black frame with a waiting message instead, but ideally it should be less than 3 seconds.
So I think we have no choice but to optimize it at least a little for the first release.
The first idea that comes to mind is to look at which part of the code is causing the most slowdown on the affected drivers and fix it.
My guess is that it is likely caused by aggressive loop unrolling, like it was on Metal + M1 before the recent fix. However, working around it is quite tricky, as there are no clear preprocessor directives for loop unrolling in GLSL and the extension that adds them looks largely unsupported.
The other approach would just be to reduce code size as much as possible. We could try to preprocess the GLSL string using our own obfuscator at compile time, but that looks unrealistic and the benefits are not quite clear.
Instead we should leverage SPIR-V. This can give several solutions:
- Use the `ARB_gl_spirv` extension. This avoids having the driver do the parsing and most of the conversion.
- Compile to SPIR-V (`shaderc`), then convert back to GLSL (`spirvcross`) to feed the driver GLSL that is already optimized, with dead code and comments removed. This can work on older implementations that do not support the `ARB_gl_spirv` extension.
- Use `shaderc` (and optionally `spirvcross`) to compile the shader; then it becomes easier to precompile the shaders in many threads without needing GL contexts.

The GLSL interface might need a bit of tweaking to be able to be injected into `shaderc`, but I am convinced this is worth the cost.

Note that all of these options should be profiled beforehand on a set of typical EEVEE-Next shaders to check what is the best way forward.
Note that this task is not proposing to ship precompiled SPIR-V shader sources.
The Vulkan backend would give us all of this, but the timeline for it to become the default does not align with the initial release nor the second release of EEVEE-Next.
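To make the `shaderc`/`spirvcross` round-trip idea above more concrete, here is a minimal sketch (not Blender code) of compiling GLSL to SPIR-V and converting it back to plain GLSL; the file name, shader stage, and GLSL version are placeholder assumptions:

```cpp
/* Sketch only: round-trip a fragment shader through shaderc and SPIRV-Cross so
 * the GL driver receives GLSL that has already been optimized, with dead code
 * and comments stripped. Assumes libshaderc and SPIRV-Cross are available. */
#include <shaderc/shaderc.hpp>
#include <spirv_glsl.hpp>
#include <iostream>
#include <string>
#include <vector>

static std::string optimize_glsl(const std::string &source)
{
  shaderc::Compiler compiler;
  shaderc::CompileOptions options;
  /* Let shaderc run its optimizer (dead-code elimination, etc.). */
  options.SetOptimizationLevel(shaderc_optimization_level_performance);

  shaderc::SpvCompilationResult spv = compiler.CompileGlslToSpv(
      source, shaderc_fragment_shader, "eevee_shader.frag", options);
  if (spv.GetCompilationStatus() != shaderc_compilation_status_success) {
    std::cerr << spv.GetErrorMessage() << "\n";
    return source; /* Fall back to the original source on error. */
  }

  /* Convert the SPIR-V back to GLSL that the GL driver can consume. */
  spirv_cross::CompilerGLSL glsl(std::vector<uint32_t>(spv.cbegin(), spv.cend()));
  spirv_cross::CompilerGLSL::Options gl_options;
  gl_options.version = 430; /* Placeholder target version. */
  gl_options.es = false;
  glsl.set_common_options(gl_options);
  return glsl.compile();
}
```

The appeal of this path is that the source handed to the driver has already been through a real optimizer, so the driver's own front end has less work to do.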
Multithreaded compilation in OpenGL
We have to change our compilation model to accommodate this.
The goal is to use the parallel shader compile extension, which does not require a separate context to work. However, we need to rework the interface with the GPU module for it to work.
https://forums.developer.nvidia.com/t/bugs-with-gl-arb-parallel-shader-compile/43715/8
https://www.reddit.com/r/opengl/comments/121j3q1/seeking_clarifications_on_multithreaded_shader/
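For reference, a minimal sketch of what the extension provides, assuming a loader such as GLEW exposes `KHR_parallel_shader_compile` (this is not Blender's GPU module code):

```cpp
/* Sketch: the extension adds only two things, a hint for how many driver
 * threads to use and a non-blocking completion query. Assumes a current GL
 * context and GLEW. */
#include <GL/glew.h>
#include <vector>

void hint_compiler_threads()
{
  if (GLEW_KHR_parallel_shader_compile) {
    /* Let the driver spin up its own worker threads for compile/link jobs. */
    glMaxShaderCompilerThreadsKHR(8);
  }
}

/* Returns the programs that are finished. Querying GL_LINK_STATUS on an
 * unfinished program would force a synchronous wait, so only the completion
 * status is checked here; the rest is left for a later poll. */
std::vector<GLuint> poll_finished(const std::vector<GLuint> &in_flight)
{
  std::vector<GLuint> done;
  for (GLuint program : in_flight) {
    GLint completed = GL_FALSE;
    glGetProgramiv(program, GL_COMPLETION_STATUS_KHR, &completed);
    if (completed == GL_TRUE) {
      done.push_back(program);
    }
  }
  return done;
}
```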
I'm leaving here my `GPU_shader_create_from_info` times for reference:

It looks like `light_eval` might be the worst offender.

After some googling, I've found some Nvidia directives that help quite a bit:
I haven't found any documentation related to these, though.
And I haven't checked runtime performance either.
Regardless, I agree that using SPIRV may be the best option moving forward.
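For illustration only: NVIDIA's GLSL front end accepts vendor-specific `#pragma optionNV(...)` hints, which may or may not be the directives referred to above; the exact spellings below are assumptions, the pragmas are undocumented, and other vendors' compilers ignore them:

```cpp
/* Illustrative sketch: prepend NVIDIA-specific compile hints to a GLSL string.
 * Whether these match the directives mentioned in the comment above is an
 * assumption. */
#include <string>

static std::string prepend_nv_compile_hints(const std::string &glsl_body)
{
  const char *hints =
      /* Ask the compiler not to unroll loops (the usual suspect for
       * pathological compile times). */
      "#pragma optionNV(unroll none)\n"
      /* Limiting inlining is reportedly possible too; the exact spelling below
       * is an assumption and should be verified against the driver in use. */
      "#pragma optionNV(inline 0)\n";
  /* In real code the pragmas would be spliced in after the #version line
   * rather than naively prepended, since #version must come first. */
  return std::string(hints) + glsl_body;
}
```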
@pragma37 This is only for the cold startup of the engine. Even if this is good to optimize (and is needed), the main friction point is the cost of material compilation.
Add that to your note!
Even with an RTX 4070 8GB I have the same issue 🤷♂️. It takes 4-8 seconds to run the first time! Less the second time, but it is still a long wait before anything shows up on screen.
Doesn't seem to be the case for all materials:
https://devtalk.blender.org/t/blender-4-2-eevee-next-feedback/31813/416
My guess is that purely deferred materials are probably OK, but forward and ShaderToRGB materials must be getting hit by the `light_eval` overhead as well.
I'm wondering if parts of that could be done without going to SPIR-V and friends. My impression (which I have not validated/checked myself) is that even in ye olde OpenGL it is possible to do "multi-threaded" shader creation/compilation without resorting to multiple OpenGL contexts. The pattern of function calls just has to be along the lines of "kick off the compilation and linking of every shader first, then query their results later",
instead of the current code flow, which is "for each shader: fully compile and link it before moving on to the next".
The current way of doing things (which is "for each shader: fully compile said shader") does not allow multi-threading even for other APIs (like Metal or Vulkan) that could do it. So maybe something like a "create many shaders" function would need to get added to the GPU backend, and the backend could decide how to best deal with it.
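A minimal sketch of that call ordering, using a hypothetical `GPU_shader_batch_create()` entry point (not an existing Blender function):

```cpp
/* Sketch of a batched "create many shaders" flow: submit every compile and
 * link first, only then query statuses/logs, so a driver that can compile on
 * worker threads is never forced to finish one program before the next one is
 * submitted. Assumes a current GL context and GLEW. */
#include <GL/glew.h>
#include <string>
#include <vector>

struct ShaderSource {
  std::string vert;
  std::string frag;
};

std::vector<GLuint> GPU_shader_batch_create(const std::vector<ShaderSource> &sources)
{
  std::vector<GLuint> programs;
  programs.reserve(sources.size());

  /* Pass 1: submit all work, query nothing. */
  for (const ShaderSource &src : sources) {
    GLuint vert = glCreateShader(GL_VERTEX_SHADER);
    const char *vs = src.vert.c_str();
    glShaderSource(vert, 1, &vs, nullptr);
    glCompileShader(vert);

    GLuint frag = glCreateShader(GL_FRAGMENT_SHADER);
    const char *fs = src.frag.c_str();
    glShaderSource(frag, 1, &fs, nullptr);
    glCompileShader(frag);

    GLuint program = glCreateProgram();
    glAttachShader(program, vert);
    glAttachShader(program, frag);
    glLinkProgram(program);
    programs.push_back(program);
  }

  /* Pass 2: now that everything is in flight, check results. Each query may
   * block, but the driver has had the whole batch to work on by this point. */
  for (GLuint program : programs) {
    GLint linked = GL_FALSE;
    glGetProgramiv(program, GL_LINK_STATUS, &linked);
    if (linked == GL_FALSE) {
      char log[1024];
      glGetProgramInfoLog(program, sizeof(log), nullptr, log);
      /* Real code would report `log` through the GPU module's logging. */
    }
  }
  return programs;
}
```

Combined with `KHR_parallel_shader_compile`, pass 2 could poll `GL_COMPLETION_STATUS_KHR` instead of blocking on the first unfinished program.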
I think it's worth giving @aras_p's suggestion a try, especially since we will have to do something along those lines regardless of the compilation method.
That said, that's not going to solve the issue with single shaders taking seconds to compile, which is pretty bad for the material editing UX.
For Nvidia, we could use the `GPU_material_optimization` system for disabling loop unrolling and inlining in the first compilation, but I'm not sure about the other GPU vendors (or if there's a problem with those in the first place).

I've just checked, and it looks like compile times took a big hit after #119713:
With loop unrolling and inlining disabled the difference is not that bad, though:
That compile times regression has been fixed by 2d3368f5bf.

There's been a pretty bad regression in compile times recently:
We may need to start testing compile times regularly, or test automatically if possible.
We can add `--debug-gpu-compile-shader` as a benchmark task so it will be tracked.

I've been taking a look at ways to improve shader compilation times.
As a recap, there are 3 main issues on the user side (mainly on Nvidia):
I don't think there's a silver bullet to fix all of these at once, and we may require multiple strategies with different levels of complexity:
Find the cause of the compile times slow-downs
AFAIK, there's no tooling for figuring out where the compiler is spending its time. We can infer it by checking differences between shaders, and sometimes we can find regressions, but fully fixing compile times this way would be extremely time consuming, and maybe not even fully possible.
The currently known main cause of compile times slow-downs is light and shadow evaluation: `eevee_deferred_light_frag.glsl`. (This only affects the internal static shader.)
Test results done after the last detected regression (#120329) (Cumulative):
2-step compilation
As mentioned before, disabling loop unrolling can heavily improve compile times, but it can also heavily degrade render performance.
A possible solution could be to use the Material optimization system to compile materials without loop unrolling first, deferring compilation of the optimized (unrolled) version.
The main issue with this approach is that even shader compilation on a separate OpenGL context freezes the Blender UI, so it makes things even worse in practice.
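For clarity, the intent of the 2-step scheme is roughly the following (hypothetical names; this mirrors the idea, not the actual `GPU_material_optimization` implementation):

```cpp
/* Sketch: draw with a quick-to-compile shader variant first, and swap in the
 * fully optimized variant whenever it becomes available. All names here
 * (MaterialShader, compile_program, ...) are placeholders. */
#include <atomic>
#include <string>

/* Placeholder standing in for the real compiler entry point. */
static unsigned int compile_program(const std::string & /*glsl*/, bool /*allow_unrolling*/)
{
  return 1; /* Would return a GPU program handle. */
}

struct MaterialShader {
  unsigned int fast_program = 0;      /* Compiled with unrolling/inlining off. */
  unsigned int optimized_program = 0; /* Fully optimized; may arrive much later. */
  std::atomic<bool> optimized_ready{false};

  unsigned int program_for_drawing() const
  {
    /* Never stall the viewport waiting for the optimized variant. */
    return optimized_ready.load() ? optimized_program : fast_program;
  }
};

void create_material_shader(MaterialShader &shader, const std::string &glsl)
{
  /* Step 1: cheap variant, so the material shows up quickly. */
  shader.fast_program = compile_program(glsl, /*allow_unrolling=*/false);

  /* Step 2: the expensive variant. In Blender this would go through a deferred
   * compilation queue rather than being compiled inline like here. */
  shader.optimized_program = compile_program(glsl, /*allow_unrolling=*/true);
  shader.optimized_ready.store(true);
}
```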
SPIRV
AFAIK, SPIRV compilation would be the equivalent of `glCompileShader`, while the main bottleneck comes from `glLinkProgram`, so optimizing shader compilation instead of program linking could yield minor improvements at best.

It's also worth noting that regular GLSL and SPIRV GLSL aren't fully compatible, so getting this to work, even just for testing, is far from trivial.
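One rough way to check that compile-vs-link split on a given driver is to time the two stages separately, keeping in mind that drivers are free to defer the real work until link (or even first draw). A sketch, assuming a current GL context:

```cpp
/* Sketch: measure glCompileShader and glLinkProgram separately. The split is
 * indicative only, since drivers may defer work between stages. */
#include <GL/glew.h>
#include <chrono>
#include <cstdio>

static double seconds_since(std::chrono::steady_clock::time_point t0)
{
  return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

void time_compile_vs_link(const char *vert_src, const char *frag_src)
{
  GLuint vert = glCreateShader(GL_VERTEX_SHADER);
  GLuint frag = glCreateShader(GL_FRAGMENT_SHADER);
  glShaderSource(vert, 1, &vert_src, nullptr);
  glShaderSource(frag, 1, &frag_src, nullptr);

  auto t0 = std::chrono::steady_clock::now();
  glCompileShader(vert);
  glCompileShader(frag);
  /* Querying the status forces the compile to actually finish. */
  GLint status;
  glGetShaderiv(vert, GL_COMPILE_STATUS, &status);
  glGetShaderiv(frag, GL_COMPILE_STATUS, &status);
  double compile_time = seconds_since(t0);

  GLuint program = glCreateProgram();
  glAttachShader(program, vert);
  glAttachShader(program, frag);
  t0 = std::chrono::steady_clock::now();
  glLinkProgram(program);
  glGetProgramiv(program, GL_LINK_STATUS, &status);
  double link_time = seconds_since(t0);

  printf("compile: %.3f s, link: %.3f s\n", compile_time, link_time);
}
```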
Multithreaded Compilation
I've made a branch with `GL_ARB/KHR_parallel_shader_compile` support (#121093), and while compile times become around twice as fast, I wouldn't consider this good enough to mark the problem as solved.

It also has the problem of making the viewport even more unresponsive during material compilation.
I've made a standalone test (https://projects.blender.org/pragma37/test-parallel-shader-compilation) to verify I wasn't doing anything wrong on the Blender side; the performance difference was mostly on par.
I've managed to make it faster (from 2x up to 4x) after some tweaks, but I wasn't able to get the same results on Blender.
So it may be possible that the Blender implementation could be improved, but it may also be just a case of the standalone app being a much simpler context that the driver can manage more easily.
In any case, even a 4x improvement on a 24-thread CPU is far from optimal.
So far I got the best results (10x) by spawning a new process for each program compilation (the `spawner.py` script in the repo).

This actually manages to put my CPU at 100% usage, and it would also have the advantage of not blocking the Blender UI.
However, this would require:
Or maybe just use subprocesses and pipes?
While I think this might be the best way to go, it opens several cans of worms and it seems too risky to implement at the end of BCON1.
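One way a per-process compile worker could hand its result back without the parent recompiling anything is `ARB_get_program_binary`; whether the `spawner.py` test works this way is an assumption, and program binaries are only valid for the exact same driver/GPU, so this is a same-machine IPC trick rather than a cache format. A sketch:

```cpp
/* Sketch: child process compiles + links in its own GL context and streams the
 * driver-specific program binary over a pipe; the parent reloads it with
 * glProgramBinary(). Error handling and the actual pipe setup are omitted. */
#include <GL/glew.h>
#include <cstdio>
#include <vector>

/* Child side: writes <format><size><blob> to stdout (the pipe). */
bool emit_program_binary(GLuint program)
{
  GLint size = 0;
  glGetProgramiv(program, GL_PROGRAM_BINARY_LENGTH, &size);
  if (size <= 0) {
    return false;
  }
  std::vector<unsigned char> blob(size);
  GLenum format = 0;
  glGetProgramBinary(program, size, nullptr, &format, blob.data());

  fwrite(&format, sizeof(format), 1, stdout);
  fwrite(&size, sizeof(size), 1, stdout);
  fwrite(blob.data(), 1, blob.size(), stdout);
  return true;
}

/* Parent side: rebuilds a program object from the received bytes. */
GLuint load_program_binary(GLenum format, const std::vector<unsigned char> &blob)
{
  GLuint program = glCreateProgram();
  glProgramBinary(program, format, blob.data(), (GLsizei)blob.size());
  GLint linked = GL_FALSE;
  glGetProgramiv(program, GL_LINK_STATUS, &linked);
  return linked == GL_TRUE ? program : 0;
}
```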
Your conclusion for SPIR-V is in line with our previous conclusion. So yeah.
About process spawning: we used to do that for Cycles OpenCL in the past. It is far from ideal, as it is also bound by the amount of RAM the user has, and error logging is tricky and very error prone.
Just to mention: a thing I am considering for Vulkan is to add support for `VK_EXT_graphics_pipeline_library`. This allows compiling smaller shaders (vertex input, vertex, fragment, attachment out) and picking and choosing between them when creating a pipeline.
I've left this out of my previous report since I was getting inconsistent readings, but today I've measured again and I'm getting pretty consistent results.
Skip unnecessary material passes
EEVEE-Next requires more pass types per material, which results in compiling up to 3x more shaders than EEVEE Legacy.
However, many of these shaders are functionally equivalent to the default one (when vertex displacement and transparency are disabled), so we can detect these cases and skip the compilation. (#121137)
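A minimal sketch of the deduplication idea, with hypothetical names (this mirrors the intent behind #121137, not its actual implementation): reuse an already-compiled program whenever a pass variant generates the same code as one compiled before.

```cpp
/* Sketch: cache compiled programs keyed by the generated GLSL, so functionally
 * equivalent material pass variants are compiled only once. All names are
 * placeholders. */
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

struct MaterialPassKey {
  std::string codegen_src; /* Generated GLSL; real code would store a hash. */
  bool operator==(const MaterialPassKey &other) const
  {
    return codegen_src == other.codegen_src;
  }
};

struct MaterialPassKeyHash {
  size_t operator()(const MaterialPassKey &key) const
  {
    return std::hash<std::string>{}(key.codegen_src);
  }
};

class MaterialPassCache {
 public:
  /* Returns a previously compiled program when the generated source matches,
   * otherwise compiles and stores a new one. */
  uint32_t get_or_compile(const std::string &generated_glsl,
                          const std::function<uint32_t(const std::string &)> &compile_fn)
  {
    MaterialPassKey key{generated_glsl};
    auto it = cache_.find(key);
    if (it != cache_.end()) {
      return it->second;
    }
    uint32_t program = compile_fn(generated_glsl);
    cache_.emplace(std::move(key), program);
    return program;
  }

 private:
  std::unordered_map<MaterialPassKey, uint32_t, MaterialPassKeyHash> cache_;
};
```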
With the new (optional) parallel and non-blocking compilation, I consider this issue solved.