GPU: API Redesign (high level) #120174
GPU API Redesign
Management info (TLDR)
Feedback I received on the Vulkan render graph design, my own analysis, and external expertise all indicate that in the future we will require a render graph at the GPU module level which replaces/integrates parts of the draw module. At this time it is unclear what that API will look like, but the requirements of such an API can be described at a high level.
This design task elaborates on how we arrived at these requirements and how they could potentially influence the GPU module and its users as a whole.
The outcome of this task is to continue with a Vulkan-specific render graph, but take into account that in about a year this could lead to a GPU-level render graph API. This new API would initially exist beside the other APIs (GPUBatch, GPUImmediate) due to the large impact of fully replacing them. Eventually the draw manager and the Python GPU module would work fully on top of the GPU render graph API. Prototyping is needed to design the detailed API.
Current state
Currently we have several APIs related to GPU drawing.
GPUImmediate
GPUImmediate is a compatibility layer for the pre-core OpenGL programming model. It was introduced during Blender 2.8 when we switched to the OpenGL core profile. The API should be deprecated and its usages replaced by GPUBatch. However, because this API is understandable to non-GPU developers and its deprecation was never promoted, it is actually the most commonly used API for UI/editor drawing.
The drawback of this API is that geometry/data isn't kept on the GPU and needs to be resent each time.
GPUBatch
GPUBatch uses a shader-based approach where geometry batches are created and uploaded once; when the geometry doesn't change, it can be reused by other shaders or for the next frame. Due to the ability to prepare geometry batches, it is faster than GPUImmediate mode.
DrawManager
DrawManager is an API on top of GPUBatch that adds a programming model for high-performance rendering. It is typically used to draw the 3D viewport, where performance matters. The API applies some best practices for ordering draw commands to reduce context switches.
Limitations
The current APIs have a disconnect with how modern GL-APIs are structured, which leads to non-optimal performance and complex translation code in our backends.
Pipelines
Modern GL-APIs (Vulkan/Metal) provide access to pipelines. A pipeline is a configuration of the GPU for performing draw or compute commands.
Most changes to the pipeline configuration can trigger a recompilation of the pipeline, including a recompilation of the shader. This can already happen by changing the blend mode using GPU_blend, or by using geometry with just a different vertex layout.
Texture layouts
Pixels of textures that are used in a pipeline must be in a specific layout. The layout depends on how the texture is used inside the pipeline.
There are different layouts for
TRANSFER_READ/WRITE
SHADER_READ/WRITE
ATTACHMENT
PRESENT
When transitioning a texture to a new layout, the previous layout needs to be provided as well. Changing a layout is done by adding so-called pipeline barriers. A pipeline barrier can alter the layout of a whole texture, but also of a single layer or LOD level; in that case a texture can have a mixed layout.
More information about pipeline barriers will be provided in the section about resource versions.
Command reordering
Commands that are sent to the GPU stack can be executed in a different order on the GPU. This is done to reduce pipeline recompilations and resource layout transitions.
Depending on the backend, this responsibility lies in a different place. In OpenGL it is a driver responsibility; the application isn't aware of the reordering and cannot influence it directly. In Metal it is also a driver responsibility, but the application can provide hints to influence the reordering. In Vulkan it is solely the application's responsibility.
Resource versions
With command reordering in mind, it is important to track versions of resources: you don't want to reorder commands in a way that makes them use different contents of a resource.
Every time a pipeline (or CPU code) alters a resource, a new version is tracked. It is the same resource; only the commands recorded before the change are scoped to it and must not use the content of the new version.
A pipeline barrier can be added before commands to guard a resource between read and write actions. It is also used to transition the layout of a texture. These pipeline barriers also need to know how resources are going to be used until the next pipeline barrier, so they only stall the GPU when needed.
Backend implementation
Currently our APIs are limited to a single batch, and logic is required to fulfill the requirements of modern GL-APIs.
Before sending the commands to the GL-API, the actual commands are recorded in an intermediate buffer. When the intermediate buffer is sent to the GPU (via a flush/finish, or another event), it is analyzed to reorder commands and to generate the correct pipeline barriers.
The reordering and pipeline barriers may well be the same from frame to frame, but due to the API's granularity the GPU backend doesn't know what it is actually drawing/computing, so it cannot reuse these decisions.
Other software
How do game engines and other GPU frameworks solve this?
WebGPU/wgpu
WebGPU is a standard that provides low-level access to GPU devices on the web. wgpu is a widely used implementation of this standard. The API is designed in such a way that the developer can create a flow between pipelines and point out how resources are used between them. These pipeline flows are called RenderPipelines and ComputePipelines. I use the term pipeline flow so as not to confuse Vulkan and Metal developers with the terms graphics pipeline and compute pipeline.
The implementation can extract and cache pipeline barriers with each pipeline flow. The next time a pipeline flow is used, the resource handles are updated and the already extracted commands are submitted to the GL-API.
Godot
Godot has an implementation similar to ours. They have their own GPU API, which is also accessible to game developers. The API is shader based. After reaching out to them about what they think their target API would be, they responded that they are also inspired by WebGPU; had WebGPU been defined before they developed their API, they might have used it.
AAA game engines.
Since 2017 there have been a lot of presentations at GDC and other conferences about optimization. Most APIs I have seen are based on a similar approach as WebGPU, but with resource tracking across pipeline flows.
This approach has multiple names, but most often it is called a render graph. The render graph contains nodes, and each node has a list of relations with its resources. Depending on the implementation, a single node can contain a single compute pipeline or a flow of compute pipelines; similar for render (graphics) nodes.
Some frameworks (e.g. nicebyte's) try to do something smart so they don't need to add a render graph API. They use something similar to what the Vulkan synchronization validation layer does. However, there are limitations to when this can be used; these limitations include multi-threaded drawing.
One difference between Blender and these frameworks is how drawing and threading are organized. Games typically use one main drawing thread and a small number of helper threads. The helper threads are often used for data transfers and compute passes to update textures or the scene (physics). Blender, however, can have multiple drawing threads, for example when performing background rendering, baking or GPU compositing. Each thread has its own context, but they eventually submit to the same GPU queue.
Target API
Our target should be to move the render graph (currently part of the draw manager) into the GPU module as its only API. Usages of the other APIs should be migrated to the render graph API.
This API should give the API user more clarity about what is actually needed. The GPU backend also gets more context about what the API user is doing and can make better decisions, as well as cache decisions for the next time to reduce CPU cycles.
Although the details of the API aren't clear, it is clear that there are several stages when using the API. The code examples here don't represent the final API and should only be read as a guide; for now I kept them close to the current draw manager API.
All details are open for discussion.
Defining a render graph node.
This defines a template for a render graph node. A render graph node can have multiple passes and multiple draw commands per pass.
Syncing a render graph node.
When syncing the resources, the render_node_info_ can be used to initialize an instance where the render_node_ and the resources are linked.
When first used, the render_node_info_ can be analyzed by the GPU backend, creating a list of commands that need to be sent to the GPU. These commands wouldn't contain any references to the actual resources; names or IDs could be used inside the list of commands.
After the render_node_ is initialized, it contains a prepared render_node_info_. Resources can be added; the added resources are stored beside the render_node_info_. The resources and render_node_info_ are merged later in the drawing process.
Submitting a render graph node
When it is decided to draw a node the node is sent to the GPU backend.
The GPU backend adds the node to the render graph of the current context.
No GPU commands are sent during this phase, hence the name add_node.
Context render graph submission
Eventually the GPU backend sends the commands to the GPU. Just before this happens, the nodes are reordered to reduce pipeline recompilations.
After the order is known, the resources can be merged with the commands, and the pipeline barriers (already part of the command list) can be updated to use the actual state of the resources.
As the submission happens later in the process, we have more information about how a specific resource version is used, which can lead to better pipeline barriers.
The commands can be recorded into a GL-API specific command buffer and submitted to the device queue.
Project phasing
How do we get from the current state to the target state?
Step 1: Vulkan application responsibility
OpenGL and Metal both implement a full render graph, or part of one, as a driver-level responsibility. For Vulkan, the application is fully responsible for providing the correct calls.
Our Vulkan backend doesn't have a render graph yet, and lacks performance and correct resource synchronization. In early 2024, research and experiments were performed on how to solve this. Due to our threading model, a low-level render graph would be a good solution.
This render graph would not be a replacement for the draw manager, but would be able to translate the GPUBatch and GPUImmediate mode APIs into a render graph in order to create the correct list of commands to send to the GPU.
A prototype is available in #118963.
Instead of having only a ComputeNode and a GraphicsNode, it also contains many nodes that are specific to the GPUBatch/GPUImmediate APIs. Implementation and design details are inside the mentioned task.
The reason to prioritize the Vulkan specific render graph before the GPU render graph:
Step 2: GPU RenderGraph API
After phasing out OpenGL, and based on the VKRenderGraph, we can design a GPURenderGraph. Phasing out OpenGL is not a hard requirement, but it would reduce the amount of work.
Using test cases, we can validate the correct working of the Vulkan and Metal render graph implementations.
Step 3: API migration
Step 3 is a migration process where usages of the GPUBatch and GPUImmediate mode APIs are migrated to the GPURenderGraph. The order of this process can be discussed and depends on the needs at the moment we start the migration.
The idea is to keep all APIs working until the whole code base is migrated to the render graph approach.
The order described here is to first migrate GPUBatch calls inside the editors, as these are not as numerous as GPUImmediate calls or as complex as the DrawManager. After gaining some experience, we can plan the other migrations better.
Step 3a: API migration
Migrate GPUBatch in editor code to GPURenderGraph.
Step 3b: API migration
Migrate DrawManager to GPURenderGraph. Most likely the biggest benefits are here, as the draw manager also stores a list of commands. That list would then be joined with the render graph, and the GPU backend wouldn't need to reverse engineer all the information it needs.
Step 3c: API migration
Python API migration. The current Python API design has issues, as it is extracted from the internal API. Users are requesting access to features that don't fit it well. So there is value in implementing this.
Step 3d: API migration
I doubt the benefit versus development effort for this phase is acceptable. There is a lot of code that needs to be refactored. Users would eventually get a more fluent UI, and developers would need to maintain a smaller code base.
Risks
Possibly naive comment from the outside (I might be talking complete nonsense, in which case just tell me to shut up :)):
As in, "render graphs" are not meant for improving CPU performance, resource tracking, etc. etc. Their primary reason for existence (and all the "API user" complexity they bring!) is to have some system for figuring out, which parts of the frame can reuse the same memory that would be used by another part of the frame.
I'm not sure how much (or at all?) is that relevant for Blender's use case.
My impression would be along the lines of:
Your insights are always helpful. Most of the time we can only fall back to papers and presentations to get these insights. When talking to game engine developers some months ago, their reason (from a Vulkan PoV) was synchronization.
In Blender's case we can have multiple CPU threads that use the same resources. These resources are device specific and shared (for this the render graph isn't needed, as the resources need to be guarded by a lock).
When transitions happen, you need to keep track of where the resource was used and how it will be used in the near future. Here it becomes a bit trickier. Versioning is used to generate 'optimal' barriers, and to validate that we didn't make a mistake.
Reordering is mostly needed where the Blender API isn't sufficient: to reduce pipeline switches when data transfer/compute commands are issued during drawing, and to improve clear operations on render pass binding.
So I generally agree that a render graph as implemented by game engines isn't needed. I do believe that having a graph to track resources would lead to generating better barriers. The nodes themselves can still be evaluated with a back-to-front iteration to populate destination usages and a front-to-back iteration to populate source usages. In the future we are planning to track resource usage per pipeline stage and reduce GPU resets when reading back buffers to the CPU. So yes, we call it a render graph, but perhaps the implementation is just a list.
The complexity of this render graph is far less than the render graph you're mentioning. Nodes are evaluated in sequence; selection and barrier extraction are done using the info in the graph. The draw manager already does most of the ordering; making sure the draw manager API fits better onto the GPU backend will reduce CPU cycles, which is the main benefit.
Yeah I think I probably misunderstood most of this since "render graph" term is most commonly used to describe "a system that would allow me to save several hundred MB of video memory in a complex frame pipeline".
It is very likely that the "common wisdom" used in game engines does not apply (or applies very little) to Blender's use case. For example, most or all games do not have a setup where several "windows" can be rendered from different threads, all trying to access the GPU. And not many game engines do multi-threaded draw call submission at all! Especially now that many engines are moving towards GPU-driven rendering pipelines, the CPU is not doing much work anymore, so there's little need for multi-threaded draw submission complexity.
Anyway, the threaded draw call submission in game engines (again, if they bother doing it at all) from what I've seen is much simpler than what you allude Blender would need. So likely "some sort of other way" of achieving that within Blender would be needed. Maybe long high-level locking is indeed the only sensible approach, who knows.