WIP: Rewrite Graph Editor drawing code #118662

Draft
Christoph Lendenfeld wants to merge 18 commits from ChrisLend/blender:ge_gpu_batch into main

When changing the target branch, be careful to rebase the branch in your fork to match. See documentation.

Not meant for merging. This is an exploration of rewriting the GE drawing code to make it scale better.

Issue

The issue I am trying to solve is the massive slowdown of the Graph Editor when displaying large amounts of keys and handles.
Profiling showed this is mainly a CPU bottleneck when sifting through all FCurves and adding those to the IMM buffer.
image

Solution of this PR

Run the processes that prepare data for the GPU in parallel, then upload later.

The root function is graph_draw_curves.
Within that I run a threading::parallel_for over each of the FCurves to generate the data needed to feed the GPU.
After all that data is passed to draw_fcurve_render_data which handles the actual drawing.

Questions

Currently the code in the PR copies data twice, once in the build, then later into the GPU vertex array.
Is there a way to remove the second copy? The data is already exactly in the format that the GPU wants, so I thought we could just give it a pointer and upload directly?

Performance

TODO

Known Issues

  • drawing order isn't as before. Selected verts are not guaranteed to be on top
  • graph_draw_driver_debug is never called
  • curves only draw on the second refresh. no idea why
  • sampled curves will not work
  • curves with modifiers will not work
  • handles points are drawn in the same shape as keys
  • unselected curves are drawn too thick

Test file

https://drive.google.com/file/d/11wQhuYmhCtQgjT4uJWORTwKlWit1irp2/view?usp=sharing
Imported and extended from: https://sites.google.com/a/cgspeed.com/cgspeed/motion-capture/the-motionbuilder-friendly-bvh-conversion-release-of-cmus-motion-capture-database

Not meant for merging. This is an exploration of rewriting the GE drawing code to make it scale better. ## Issue The issue I am trying to solve is the massive slowdown of the Graph Editor when displaying large amounts of keys and handles. Profiling showed this is mainly a CPU bottleneck when sifting through all `FCurves` and adding those to the IMM buffer. ![image](/attachments/2e8a5ecd-dce7-4c15-be05-f7820f9d4525) ## Solution of this PR Run the processes that prepare data for the GPU in parallel, then upload later. The root function is `graph_draw_curves`. Within that I run a `threading::parallel_for` over each of the `FCurves` to generate the data needed to feed the GPU. After all that data is passed to `draw_fcurve_render_data` which handles the actual drawing. ## Questions Currently the code in the PR copies data twice, once in the build, then later into the GPU vertex array. Is there a way to remove the second copy? The data is already exactly in the format that the GPU wants, so I thought we could just give it a pointer and upload directly? ## Performance TODO ## Known Issues * drawing order isn't as before. Selected verts are not guaranteed to be on top * `graph_draw_driver_debug` is never called * curves only draw on the second refresh. no idea why * sampled curves will not work * curves with modifiers will not work * handles points are drawn in the same shape as keys * unselected curves are drawn too thick ## Test file https://drive.google.com/file/d/11wQhuYmhCtQgjT4uJWORTwKlWit1irp2/view?usp=sharing Imported and extended from: https://sites.google.com/a/cgspeed.com/cgspeed/motion-capture/the-motionbuilder-friendly-bvh-conversion-release-of-cmus-motion-capture-database
Christoph Lendenfeld added the
Module
Animation & Rigging
label 2024-02-23 14:21:00 +01:00
Christoph Lendenfeld added 3 commits 2024-02-23 14:21:05 +01:00
Hans Goudey reviewed 2024-02-23 15:15:17 +01:00
Hans Goudey left a comment
Member

Just reading this out of curiosity and left some comments. Sorry if some of them are too preemptive!

I wonder if skipping the temporary arrays and uploading the data directly to the vertex buffer would be faster?

Just reading this out of curiosity and left some comments. Sorry if some of them are too preemptive! I wonder if skipping the temporary arrays and uploading the data directly to the vertex buffer would be faster?
@ -1479,0 +1516,4 @@
ColorTheme4b color_sel = get_fcurve_vertex_color(fcu, true);
ColorTheme4b color_desel = get_fcurve_vertex_color(fcu, false);
const blender::int2 bounding_indices = get_bounding_bezt_indices(fcu, range[0], range[1]);
Member

Might be nicer if this returned an IndexRange, which could then be iterated like for (const int i : bounding_indices)

Might be nicer if this returned an `IndexRange`, which could then be iterated like `for (const int i : bounding_indices)`
@ -1479,0 +1523,4 @@
int array_index = 0;
for (int i = bounding_indices[0]; i <= bounding_indices[1]; i++) {
BezTriple *bezt = &fcu->bezt[i];
Member

const BezTriple &bezt = fcu.bezt[i] is a bit nicer

`const BezTriple &bezt = fcu.bezt[i]` is a bit nicer
@ -1491,3 +1616,1 @@
* draw curve, then handle-lines, and finally vertices in this order so that
* the data will be layered correctly
*/
Array<bAnimListElem *> anim_list_elements = Array<bAnimListElem *>(fcurve_count);
Member

This can be written like

Array<bAnimListElem *> anim_list_elements(fcurve_count);
This can be written like ``` Array<bAnimListElem *> anim_list_elements(fcurve_count);
@ -1494,0 +1624,4 @@
const float2 v2d_range = {v2d->cur.xmin, v2d->cur.xmax};
Array<FCurveRenderData> render_data = Array<FCurveRenderData>(fcurve_count);
for (int i = 0; i < anim_list_elements.size(); i++) {
Member

How about for (const int i : anim_list_elements.index_range()) { :)

How about `for (const int i : anim_list_elements.index_range()) {` :)
Author
Member

Just reading this out of curiosity and left some comments. Sorry if some of them are too preemptive!

I wonder if skipping the temporary arrays and uploading the data directly to the vertex buffer would be faster?

It's a bit preemptive but I am always open to suggestions.
I was thinking I could run the data composition in parallel and then just copy to the vertex buffer after.

What I was actually hoping is that I could give the batch a new vertex buffer pointer to upload instead of copying, but that doesn't seem to be supported.

Anyway, this PR is me learning the GPUBatch thing while at the same time removing IMM from the Graph Editor drawing code

> Just reading this out of curiosity and left some comments. Sorry if some of them are too preemptive! > > I wonder if skipping the temporary arrays and uploading the data directly to the vertex buffer would be faster? It's a bit preemptive but I am always open to suggestions. I was thinking I could run the data composition in parallel and then just copy to the vertex buffer after. What I was actually hoping is that I could give the batch a new vertex buffer pointer to upload instead of copying, but that doesn't seem to be supported. Anyway, this PR is me learning the GPUBatch thing while at the same time removing IMM from the Graph Editor drawing code
Christoph Lendenfeld added 2 commits 2024-02-23 17:03:34 +01:00
Christoph Lendenfeld added 1 commit 2024-02-23 17:14:42 +01:00
Christoph Lendenfeld added 3 commits 2024-02-27 12:49:13 +01:00
Christoph Lendenfeld added 2 commits 2024-02-27 13:31:00 +01:00
Christoph Lendenfeld added 3 commits 2024-02-27 15:35:31 +01:00
Jeroen Bakker reviewed 2024-02-27 16:07:33 +01:00
Jeroen Bakker left a comment
Member

It is good to actually use index buffers and vertex buffers.
It would be even better when you can reuse them between frames and tag them when they need to be updated.
This way the data transfer can be removed when nothing special happens on the screen.

Now you're creating, building, uploading, drawing and freeing every frame. Even when nothing is changed.
imm has some optimizations that the creating and freeing isn't done and the buffers are tagged in such a way that the best memory is chosen to reduce time spent in uploading.

What we do for the overlay engine (and other editors as well) is to use a layered approach. Eg when selected keys should always be on top. Create a different set of batches only containing the selected keys. and those keys will not be present in the other batches. Render the set of batches one at a time.

Curve drawing is a very researched topic. See https://www.youtube.com/watch?v=M-IMQT7yKVI for example. I don't think this PR should aim for that, but there are other ways to look at the problem.

My approach would be to use matrix transformation and static batches. Perhaps a specific fragment shader to draw lines, outlines etc. But that would become closer to an actual overlay engine and we might want to move it then inside the draw module as there more tools exist for optimizing complex drawing code.

It is good to actually use index buffers and vertex buffers. It would be even better when you can reuse them between frames and tag them when they need to be updated. This way the data transfer can be removed when nothing special happens on the screen. Now you're creating, building, uploading, drawing and freeing every frame. Even when nothing is changed. `imm` has some optimizations that the creating and freeing isn't done and the buffers are tagged in such a way that the best memory is chosen to reduce time spent in uploading. What we do for the overlay engine (and other editors as well) is to use a layered approach. Eg when selected keys should always be on top. Create a different set of batches only containing the selected keys. and those keys will not be present in the other batches. Render the set of batches one at a time. Curve drawing is a very researched topic. See https://www.youtube.com/watch?v=M-IMQT7yKVI for example. I don't think this PR should aim for that, but there are other ways to look at the problem. My approach would be to use matrix transformation and static batches. Perhaps a specific fragment shader to draw lines, outlines etc. But that would become closer to an actual overlay engine and we might want to move it then inside the draw module as there more tools exist for optimizing complex drawing code.
Author
Member

What we do for the overlay engine (and other editors as well) is to use a layered approach. Eg when selected keys should always be on top. Create a different set of batches only containing the selected keys. and those keys will not be present in the other batches. Render the set of batches one at a time.

If I understand this right, then I need to rebuild the vertex buffer whenever anything changes. (because I need selected verts in a different buffer)
But as long as nothing changes I can just keep reusing it (probably stored on the SpaceGraph struct)

How do I deal with the optimization of only looking at keys that are within the viewport extents.
That means the vertex buffers would need updating every time the viewport moves. I think this is just not going to work when I keep the vertex buffers around.

My approach would be to use matrix transformation and static batches

That means I would upload the whole data to the GPU and just give it an extra matrix so it moves around accordingly when panning the view?

Perhaps a specific fragment shader to draw lines, outlines

I thought about that as well but wasn't sure if that's the way to go. If you say that's a good thing to do I can give it a try

> What we do for the overlay engine (and other editors as well) is to use a layered approach. Eg when selected keys should always be on top. Create a different set of batches only containing the selected keys. and those keys will not be present in the other batches. Render the set of batches one at a time. If I understand this right, then I need to rebuild the vertex buffer whenever anything changes. (because I need selected verts in a different buffer) But as long as nothing changes I can just keep reusing it (probably stored on the `SpaceGraph` struct) How do I deal with the optimization of only looking at keys that are within the viewport extents. That means the vertex buffers would need updating every time the viewport moves. I think this is just not going to work when I keep the vertex buffers around. > My approach would be to use matrix transformation and static batches That means I would upload the whole data to the GPU and just give it an extra matrix so it moves around accordingly when panning the view? > Perhaps a specific fragment shader to draw lines, outlines I thought about that as well but wasn't sure if that's the way to go. If you say that's a good thing to do I can give it a try
Member

GPU automatically removes any geometry that isn't in the viewport, so would not be my first concern. I rather have a single buffer that contains everything, compared to updating the buffer for every tweak. If a bottleneck appears, we can always make the data set smaller.

Perhaps it is better to first do some prototyping. Some of these tasks are hard to implement, or might need to be implemented in a different area for more efficiency.

Rebuilding when something changes and split batches based on usage would be a first step. You can use a single vertex buffer, but have 3 index buffers to indicate which part of the vertex buffer is selected, active or inactive.

GPU automatically removes any geometry that isn't in the viewport, so would not be my first concern. I rather have a single buffer that contains everything, compared to updating the buffer for every tweak. If a bottleneck appears, we can always make the data set smaller. Perhaps it is better to first do some prototyping. Some of these tasks are hard to implement, or might need to be implemented in a different area for more efficiency. Rebuilding when something changes and split batches based on usage would be a first step. You can use a single vertex buffer, but have 3 index buffers to indicate which part of the vertex buffer is selected, active or inactive.
Member

I think Jeroen has a great point about not re-uploading vertex buffers for panning. Using index buffers for selection and active state makes sense too.

The other intuitive thing I'd suggest is avoiding the intermediate buffers by first counting the number of points that need to be uploaded, allocating vertex buffers of exactly that size, then filling them in directly. There may be smaller portions of memory you can collect serially in the first pass to help multithread the last VBO filling pass.

I think Jeroen has a great point about not re-uploading vertex buffers for panning. Using index buffers for selection and active state makes sense too. The other intuitive thing I'd suggest is avoiding the intermediate buffers by first counting the number of points that need to be uploaded, allocating vertex buffers of exactly that size, then filling them in directly. There may be smaller portions of memory you can collect serially in the first pass to help multithread the last VBO filling pass.
Member

Other solution would be to store these buffers per fcurve in the graph editor as runtime data. Only updating the buffers that are altered. This saves unneeded cycles updating buffers that wasn't touched and uploads smaller sets of data to the GPU. This will lead to more draw calls, but that should be optimized by the GPU backends.

Our GPU module doesn't support uploading partial buffers. It could be added, but the hard part is always on the caller side keeping track of offsets and sizes, so we never added support for it.

Other solution would be to store these buffers per fcurve in the graph editor as runtime data. Only updating the buffers that are altered. This saves unneeded cycles updating buffers that wasn't touched and uploads smaller sets of data to the GPU. This will lead to more draw calls, but that should be optimized by the GPU backends. Our GPU module doesn't support uploading partial buffers. It could be added, but the hard part is always on the caller side keeping track of offsets and sizes, so we never added support for it.
Author
Member

GPU automatically removes any geometry that isn't in the viewport, so would not be my first concern. I rather have a single buffer that contains everything, compared to updating the buffer for every tweak. If a bottleneck appears, we can always make the data set smaller.

Thanks got you. That was an optimization for IMM because the fewer things we add to IMM the faster. If we keep the buffers around this no longer applies.

Perhaps it is better to first do some prototyping. Some of these tasks are hard to implement, or might need to be implemented in a different area for more efficiency.

I treat this PR as a prototype where I can try things out :)

You can use a single vertex buffer, but have 3 index buffers to indicate which part of the vertex buffer is selected, active or inactive.

Thanks that makes sense, I didn't realize you can have different index buffers for the same vertex buffer.

The other intuitive thing I'd suggest is avoiding the intermediate buffers

Yes I will try that next.

Other solution would be to store these buffers per fcurve in the graph editor as runtime data

I will also give that a try. Just need to figure out how to know which data has been modified.

@Jeroen-Bakker on the topic of curve drawing. Do you think it's feasable to move the curve drawing to the GPU completely?
We already upload the keys and handles to the GPU. Instead of calculating the bezier interpolation on the CPU and uploading a line strip it might be possible to do that all on the GPU.
The reason I am asking is because we optimized the curve drawing to be view dependent. So we'd need to rebuild it on every draw call anyway since zooming in and out changes the points

> GPU automatically removes any geometry that isn't in the viewport, so would not be my first concern. I rather have a single buffer that contains everything, compared to updating the buffer for every tweak. If a bottleneck appears, we can always make the data set smaller. Thanks got you. That was an optimization for IMM because the fewer things we add to IMM the faster. If we keep the buffers around this no longer applies. > Perhaps it is better to first do some prototyping. Some of these tasks are hard to implement, or might need to be implemented in a different area for more efficiency. I treat this PR as a prototype where I can try things out :) > You can use a single vertex buffer, but have 3 index buffers to indicate which part of the vertex buffer is selected, active or inactive. Thanks that makes sense, I didn't realize you can have different index buffers for the same vertex buffer. > The other intuitive thing I'd suggest is avoiding the intermediate buffers Yes I will try that next. > Other solution would be to store these buffers per fcurve in the graph editor as runtime data I will also give that a try. Just need to figure out how to know which data has been modified. @Jeroen-Bakker on the topic of curve drawing. Do you think it's feasable to move the curve drawing to the GPU completely? We already upload the keys and handles to the GPU. Instead of calculating the bezier interpolation on the CPU and uploading a line strip it might be possible to do that all on the GPU. The reason I am asking is because we optimized the curve drawing to be view dependent. So we'd need to rebuild it on every draw call anyway since zooming in and out changes the points
Member

GPU based interpolation is straight forward for OpenGL and vulkan. Issue might be metal. Would first start without gpu based interpolation, but keep in mind that this is a good optimization as a future step.

Thinking of it I would use compute shaders to move the interpolation to the GPU in that future goal

GPU based interpolation is straight forward for OpenGL and vulkan. Issue might be metal. Would first start without gpu based interpolation, but keep in mind that this is a good optimization as a future step. Thinking of it I would use compute shaders to move the interpolation to the GPU in that future goal
Christoph Lendenfeld added 2 commits 2024-03-07 14:49:07 +01:00
Christoph Lendenfeld added 2 commits 2024-03-08 16:40:23 +01:00
This pull request has changes conflicting with the target branch.
  • source/blender/editors/space_graph/graph_draw.cc
  • source/blender/editors/space_graph/graph_intern.hh
  • source/blender/editors/space_graph/graph_slider_ops.cc

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u ge_gpu_batch:ChrisLend-ge_gpu_batch
git checkout ChrisLend-ge_gpu_batch
Sign in to join this conversation.
No reviewers
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No project
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#118662
No description provided.