Cycles split kernel optimizations #82583

Closed
opened 2020-11-10 14:45:56 +01:00 by Brecht Van Lommel · 7 comments

This task intends to gather ideas to optimize the Cycles split kernel.

There are a few major things to work on:

  • Reorganize split kernels so that shader evaluation and ray-tracing are not duplicated in multiple kernels, or not duplicated as many times. This could reduce kernel compile times and register pressure, and improve coherence. Alternatively, shader evaluation for background, shadows and volumes could be specialized so that it does not include nodes unused by any such shader (for example, volumes don't need BSDFs).
  • This would likely involve some rethinking of how we handle transparent shadows, and maybe light and background shaders.
  • Replace the use of queues and atomics for scheduling work with sorting between split kernels, as done by some other renderers.
  • Reduce the size of the state that needs to remain in memory between split kernels. This state limits the number of rays that can be active, which can result in low occupancy or not enough memory left for scene data.
    • Render passes could be written directly to memory to make PathRadiance much smaller (#72293).
    • ShaderData: there are several ways to reduce the size of shader globals: computing some members on demand rather than storing them, compression (lossless or lossy), and simpler differentials.
    • It may be possible to structure the kernels so that only one closure needs to be stored at a time, or for a shorter time, though this may come with some noise trade-offs.
  • If shader evaluation is isolated to one or a few kernels, sorting rays by material ID can improve coherence. Similar sorting may help other split kernels.
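A minimal CPU-side sketch of the sorting idea, assuming each active path already knows the shader (material) ID it will evaluate next. The names `shader_of_path`, `SortedPaths` and `sort_paths_by_shader` are hypothetical, not Cycles API. A counting sort groups path indices so that all paths using the same shader occupy one contiguous range; a GPU version would replace the serial loops with a parallel histogram and prefix sum, so the next shading kernel reads a compact, material-coherent index list instead of each thread pushing into a shared queue.

```cpp
#include <cassert>
#include <vector>

// Result of grouping paths by shader ID (hypothetical layout).
struct SortedPaths {
  std::vector<int> indices;  // Path indices, grouped by shader ID.
  std::vector<int> offsets;  // Paths for shader s are indices[offsets[s] .. offsets[s + 1]).
};

// shader_of_path[i] is the shader path i evaluates next; negative means inactive.
SortedPaths sort_paths_by_shader(const std::vector<int> &shader_of_path, int num_shaders)
{
  SortedPaths result;
  result.offsets.assign(num_shaders + 1, 0);

  // Pass 1: histogram of paths per shader (shifted by one for the scan below).
  for (int s : shader_of_path) {
    if (s >= 0) {
      assert(s < num_shaders);
      result.offsets[s + 1]++;
    }
  }

  // Pass 2: prefix sum turns per-shader counts into start offsets.
  for (int s = 0; s < num_shaders; s++) {
    result.offsets[s + 1] += result.offsets[s];
  }

  // Pass 3: stable scatter of each active path into its shader's range.
  result.indices.resize(result.offsets[num_shaders]);
  std::vector<int> cursor(result.offsets.begin(), result.offsets.end() - 1);
  for (int i = 0; i < static_cast<int>(shader_of_path.size()); i++) {
    const int s = shader_of_path[i];
    if (s >= 0) {
      result.indices[cursor[s]++] = i;
    }
  }
  return result;
}
```

For example, with shader IDs `{2, 0, -1, 2, 1, 0}` and three shaders, `offsets` becomes `{0, 2, 3, 5}` and `indices` becomes `{1, 5, 4, 0, 3}`; the inactive path (ID -1) is dropped from the work list.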

I would consider removing branched path tracing for GPU rendering entirely (#52725), since this is not particularly suitable for GPUs and complicates the code. It would be easier to refactor without this.

This task intends to gather ideas to optimize the Cycles split kernel. There are a few major things to work on: * Reorganize split kernels so that shader evaluation and ray-tracing are not duplicated in multiple kernels, or not duplicated as many times. This could reduce kernel compile times, register pressure, and improve coherence. Alternatively, shader evaluation for background, shadows, volumes, could be specialized so that it does not include nodes not used in any such shader (for example volumes don't need BSDFs). * This would likely involve some rethinking of how we handle transparent shadows, and maybe light and background shaders. * Replace usage of queues and atomics for scheduling work, and replace with sorting between split kernels as done by some other renderers. * Reduce the size of the state that needs to remain in memory between split kernels. The number of rays that can be active is limited by this. It can result in low occupancy or not enough memory leaft for scene data. * Render passes could be written directly to memory to make `PathRadiance` much smaller (#72293) * `ShaderData`: there are ways to reduce the size of shader globals. Computing some members on demand rather than storing them, compression (lossless or lossy), simpler differentials. * It may be possible to structure the kernels so that only one closure needs to be stored at a time or for a shorter time, however this may come with some noise trade-offs. * If shader evaluation is isolated to one or fewer kernels, ray sorting by material ID can improve coherence. Similar sorting may help other split kernels. I would consider removing branched path tracing for GPU rendering entirely (#52725), since this is not particularly suitable for GPUs and complicates the code. It would be easier to refactor without this.
Author
Owner

Changed status from 'Needs Triage' to: 'Confirmed'
Author
Owner

Added subscribers: @brecht, @BrianSavery
Member

Added subscriber: @Jeroen-Bakker
Member

OpenCL 2.0 introduced pipes, a means of communicating between simultaneously running kernels.
The benefit is that a pipe has a fixed size in global memory and is more likely to be cached by hardware caches. The pipe size could also be tuned to fine-tune performance.

It would be worth researching this as an alternative to queueing and sorting.
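OpenCL pipes are device-side objects, but their semantics can be sketched on the CPU to reason about sizing. The following is a minimal single-producer/single-consumer ring buffer under that assumption; `FixedPipe` is a hypothetical name, and this is not how an OpenCL runtime implements pipes. Like a pipe, it has a capacity fixed at creation, and a write fails instead of growing the buffer when it is full (in OpenCL C, `write_pipe` and `read_pipe` report a full or empty pipe with a negative return value).

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Fixed-capacity FIFO between one producer stage and one consumer stage,
// loosely modeling an OpenCL 2.0 pipe (hypothetical CPU analog).
template<typename T, std::size_t Capacity> class FixedPipe {
 public:
  // Producer side: returns false when the pipe is full instead of growing.
  bool write(const T &value)
  {
    const std::size_t head = head_.load(std::memory_order_relaxed);
    const std::size_t next = (head + 1) % (Capacity + 1);
    if (next == tail_.load(std::memory_order_acquire)) {
      return false;  // Full.
    }
    slots_[head] = value;
    // Release: the slot write becomes visible before the new head.
    head_.store(next, std::memory_order_release);
    return true;
  }

  // Consumer side: returns std::nullopt when the pipe is empty.
  std::optional<T> read()
  {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    if (tail == head_.load(std::memory_order_acquire)) {
      return std::nullopt;  // Empty.
    }
    T value = slots_[tail];
    tail_.store((tail + 1) % (Capacity + 1), std::memory_order_release);
    return value;
  }

 private:
  // One spare slot distinguishes "full" from "empty".
  std::array<T, Capacity + 1> slots_{};
  std::atomic<std::size_t> head_{0};  // Next slot to write.
  std::atomic<std::size_t> tail_{0};  // Next slot to read.
};
```

The `Capacity` parameter stands in for the pipe size mentioned above: a smaller pipe keeps the working set cache-friendly, while a larger one lets the producer run further ahead of the consumer before writes start failing.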

Shader globals could also be split into multiple smaller variants: one optimized for intersection, another for shading, another for integration, etc.
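To illustrate the splitting idea, here is a sketch with hypothetical field sets. The structs below are not the actual Cycles `ShaderData` layout, and `float3_t` is a stand-in for Cycles' own vector type. Each kernel would read only the struct for its stage; for example, an intersection kernel would touch 20 bytes per path instead of 104 (assuming 4-byte `float` and `int32_t` and no padding).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 3-component vector (12 bytes, 4-byte aligned).
struct float3_t {
  float x, y, z;
};

// Hypothetical monolithic state carried through every split kernel.
struct MonolithicShaderData {
  float3_t P, N, Ng, I;  // Position, shading normal, geometric normal, incoming direction.
  float3_t dPdu, dPdv;   // Surface derivatives.
  float u, v, t, time;
  int32_t object, prim, shader, flag;
};

// Intersection stage: only the hit record.
struct IntersectionState {
  float u, v, t;
  int32_t object, prim;
};

// Shading stage: the reconstructed surface frame and the shader to run.
struct ShadingState {
  float3_t P, N, Ng, I;
  float3_t dPdu, dPdv;
  int32_t shader, flag;
};

// Integration stage: only what must survive to the next bounce.
struct IntegrationState {
  float3_t P, I;
  float time;
  int32_t flag;
};

static_assert(sizeof(IntersectionState) < sizeof(MonolithicShaderData),
              "the intersection variant should be smaller than the monolithic state");
```

In this example the stage structs together (20 + 80 + 32 = 132 bytes) are larger than the monolithic one, so any saving would come from each kernel loading less data and from not persisting stage-local fields (such as `dPdu`/`dPdv`) between kernels, rather than from smaller total storage.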
Member

Added subscriber: @Alaska
Author
Owner

Changed status from 'Confirmed' to: 'Archived'
Author
Owner

Task superseded by #87836 (Cycles: GPU Performance).
Reference: blender/blender#82583