EEVEE: Improved deferred lighting efficiency #129268
Motivations
The current deferred light evaluation is implemented in a single shader that evaluates all closures for all lights affecting a pixel.
The shadowing is evaluated inline inside the light loop and increases the complexity of the shader drastically (since we now use shadow map ray-tracing).
This makes the shader very heavy (over 6000 ALU, which is bad for occupancy) and very slow to compile (over 2 seconds each), which slows down startup time.
The entry point of the shader can be found in eevee_deferred_light_frag.glsl.
This task is about how to improve the efficiency of this shader and reduce its compile time.
Current state
Here is the pseudo code of the current shader:
Observations
An initial investigation showed that removing the shadow evaluation, bypassing the acceleration structure, only loading one closure, and removing the Rectangular light reduced the ALU count to around 1000 and improved compilation time by an order of magnitude (exact timings still need to be checked). The effect on FPS is still an open question.
Moving light data to UBOs might also be beneficial in terms of memory access speed, especially on low-end hardware.
Proposal
Here is the proposed split evaluation approach:
Splitting closure evaluation seems the easiest, but relies on split shadow to avoid recomputing shadowing per closure.
Splitting shadow computation is quite involved and needs some design as it has many requirements.
Splitting Rectangular light evaluation seems quite easy but needs adjustments to the optimization structure.
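The dependency between the closure split and the shadow split can be illustrated with a small CPU-side model of a single pixel. This is only a sketch under stated assumptions: the dictionaries, `make_counting_shadow`, `closure_pass` and the two `split_*` functions are hypothetical names made up for illustration and are not part of EEVEE. It counts how often the (expensive) shadow function runs when closures are split into separate passes with and without a separate shadow pass.

```python
# Hypothetical CPU-side model of one pixel; none of these names come from EEVEE.

def make_counting_shadow():
    """Return a stand-in shadow function and a counter of how often it ran."""
    calls = {"count": 0}

    def shadow(light):
        calls["count"] += 1
        # Pretend this lookup is the expensive shadow map ray-tracing.
        return light["visibility"]

    return shadow, calls


def closure_pass(closure, lights, visibility_of):
    """One small 'shader': a single closure accumulated over all lights."""
    return sum(closure["weight"] * light["power"] * visibility_of(light)
               for light in lights)


def split_closures_only(closures, lights, shadow):
    """Closures split, shadows inline: shadows are recomputed in every pass."""
    return [closure_pass(c, lights, shadow) for c in closures]


def split_closures_and_shadows(closures, lights, shadow):
    """Shadow pass runs once, closure passes only read the stored result."""
    stored = {light["id"]: shadow(light) for light in lights}
    return [closure_pass(c, lights, lambda light: stored[light["id"]])
            for c in closures]
```

With 2 closures and 3 lights, the first variant runs the shadow function 6 times and the second only 3 times, while both produce the same per-closure radiance.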
Open Problems
How do we fall back when the optimized pipeline overflows with too many lights?
Dependencies
One note about the 32 lights limit.
If instead of having one global index for each shadow we do something like:
Then we wouldn't be limited to a fixed number of lights in the scene, but to a fixed number of lights overlapping in any single pixel, which sounds like a much more reasonable limitation.
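One possible reading of the per-pixel indexing idea, as a minimal sketch: the bit position of a shadow inside the mask is the rank of the light among the lights overlapping that pixel, not a scene-wide index. `MASK_BITS`, `pack_pixel_shadows` and `is_shadowed` are hypothetical names chosen for this illustration, and the 32-bit width is taken from the limit discussed above.

```python
MASK_BITS = 32  # assumed width of the per-pixel shadow mask


def pack_pixel_shadows(overlapping_lights, is_shadowed):
    """Pack shadow results for the lights touching ONE pixel into a bitmask.

    The bit index is the light's position in `overlapping_lights`,
    so scene-wide light ids can be arbitrarily large.
    Returns (mask, overflowed).
    """
    mask = 0
    overflowed = False
    for local_index, light_id in enumerate(overlapping_lights):
        if local_index >= MASK_BITS:
            # More lights overlap this pixel than the mask can hold.
            overflowed = True
            break
        if is_shadowed(light_id):
            mask |= 1 << local_index
    return mask, overflowed


def read_pixel_shadow(mask, local_index):
    """Read back the shadow bit for the light at `local_index` in this pixel."""
    return bool((mask >> local_index) & 1)
```

In this model a pixel overlapped by lights 7, 512 and 999 only uses bits 0 to 2, regardless of how many lights the scene contains. The overflow flag still has to be handled somewhere, which is the fallback question raised in Open Problems.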
The goal is to reduce the complexity of individual shaders as much as possible. If all shadows are evaluated in the same pixel invocation, then we don't gain much compared to the current situation. Also, you get very bad data coherence if the set of lights differs per pixel.
Also, even with this method we need to have a fallback for more than 32 lights per pixel. 32 is not acceptable in a production environment. Leaving flickering (of shadows? of lights?) when the limit is exceeded would be a major letdown compared to now.
So we need a solution that can be dispatched N times or doesn't have the limitation (stochastic light?).
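The "dispatched N times" option can be sketched as plain batching: the light list is cut into groups no larger than the mask width and one pass is issued per group, accumulating into the same result. This is an illustration only; `light_batches`, `shade_in_batches` and `contribution` are hypothetical names, and the batch size of 32 simply mirrors the limit mentioned above.

```python
def light_batches(light_ids, batch_size=32):
    """Split the light list into consecutive batches of at most `batch_size`."""
    return [light_ids[i:i + batch_size]
            for i in range(0, len(light_ids), batch_size)]


def shade_in_batches(light_ids, contribution, batch_size=32):
    """Accumulate lighting over several passes, one per batch of lights.

    Returns (total, dispatches), where `dispatches` is the number of passes.
    """
    total = 0.0
    dispatches = 0
    for batch in light_batches(light_ids, batch_size):
        dispatches += 1  # stands in for one GPU dispatch
        total += sum(contribution(light_id) for light_id in batch)
    return total, dispatches
```

The number of passes grows as the light count divided by the batch size, rounded up, so the count has to be known before dispatching.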
Regardless of the method used to render the shadows, I don't think we have to be limited to any fixed number.
At 1 bit per shadow we could store a lot more than 32 without too much memory consumption/bandwidth.
We could use a screen space virtual texture (or store the tiles per light cluster), with CPU readback to allocate/deallocate tiles as needed.
I would prefer a single method that scales over a split between fast and fallback paths.
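To put rough numbers on the "1 bit per shadow" argument, the following sketch computes the size of a full-resolution mask buffer. The helper name `shadow_mask_bytes` and the 1920x1080 resolution are assumptions for illustration; a virtual texture that only allocates used tiles would need less than this upper bound.

```python
def shadow_mask_bytes(width, height, bits_per_pixel):
    """Bytes for a full-resolution buffer storing one bit per shadow per pixel."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Example: 32 versus 128 shadows per pixel at an assumed 1920x1080 resolution.
bytes_32 = shadow_mask_bytes(1920, 1080, 32)    # 8294400 bytes, about 7.9 MiB
bytes_128 = shadow_mask_bytes(1920, 1080, 128)  # 33177600 bytes, about 31.6 MiB
```

The cost scales linearly with both pixel count and mask width, so the same 128-bit mask at 3840x2160 would take four times as much.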
About evaluating different light types on separate shaders, we could store the shadow_index inside an image that can be read by the next shader invocation. Although I'm not sure if there's a way to synchronize that without killing performance.
Another alternative could be to use the light index inside the pixel's light cluster. But the less fine grained the shadow selection is, the larger the memory cost of storing the masks.
I would also prefer that. I just listed it here as it is a viable option.
I am very worried about the memory consumption that EEVEE currently has. I would strive to avoid more per-pixel storage as this scales very badly. However, I do acknowledge that this is an option.
This is kind of similar to what I mentioned here:
Needs to be known on the CPU for appropriate allocation of the acceleration structure and/or consecutive dispatches. Empty dispatches have some cost.
But I believe the virtual texture approach would not need a different dispatch. However, indirection also has a huge impact on performance (see Virtual Shadow Maps). Need to profile.