WIP: Rewrite Graph Editor drawing code #118662
No reviewers
Labels
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No project
No Assignees
3 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: blender/blender#118662
Loading…
Reference in New Issue
No description provided.
Delete Branch "ChrisLend/blender:ge_gpu_batch"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Not meant for merging. This is an exploration of rewriting the GE drawing code to make it scale better.
Issue
The issue I am trying to solve is the massive slowdown of the Graph Editor when displaying large amounts of keys and handles.
Profiling showed this is mainly a CPU bottleneck when sifting through all
FCurves
and adding those to the IMM buffer.Solution of this PR
Run the processes that prepare data for the GPU in parallel, then upload later.
The root function is
graph_draw_curves
.Within that I run a
threading::parallel_for
over each of theFCurves
to generate the data needed to feed the GPU.After all that data is passed to
draw_fcurve_render_data
which handles the actual drawing.Questions
Currently the code in the PR copies data twice, once in the build, then later into the GPU vertex array.
Is there a way to remove the second copy? The data is already exactly in the format that the GPU wants, so I thought we could just give it a pointer and upload directly?
Performance
TODO
Known Issues
graph_draw_driver_debug
is never calledTest file
https://drive.google.com/file/d/11wQhuYmhCtQgjT4uJWORTwKlWit1irp2/view?usp=sharing
Imported and extended from: https://sites.google.com/a/cgspeed.com/cgspeed/motion-capture/the-motionbuilder-friendly-bvh-conversion-release-of-cmus-motion-capture-database
Just reading this out of curiosity and left some comments. Sorry if some of them are too preemptive!
I wonder if skipping the temporary arrays and uploading the data directly to the vertex buffer would be faster?
@ -1479,0 +1516,4 @@
ColorTheme4b color_sel = get_fcurve_vertex_color(fcu, true);
ColorTheme4b color_desel = get_fcurve_vertex_color(fcu, false);
const blender::int2 bounding_indices = get_bounding_bezt_indices(fcu, range[0], range[1]);
Might be nicer if this returned an
IndexRange
, which could then be iterated likefor (const int i : bounding_indices)
@ -1479,0 +1523,4 @@
int array_index = 0;
for (int i = bounding_indices[0]; i <= bounding_indices[1]; i++) {
BezTriple *bezt = &fcu->bezt[i];
const BezTriple &bezt = fcu.bezt[i]
is a bit nicer@ -1491,3 +1616,1 @@
* draw curve, then handle-lines, and finally vertices in this order so that
* the data will be layered correctly
*/
Array<bAnimListElem *> anim_list_elements = Array<bAnimListElem *>(fcurve_count);
This can be written like
@ -1494,0 +1624,4 @@
const float2 v2d_range = {v2d->cur.xmin, v2d->cur.xmax};
Array<FCurveRenderData> render_data = Array<FCurveRenderData>(fcurve_count);
for (int i = 0; i < anim_list_elements.size(); i++) {
How about
for (const int i : anim_list_elements.index_range()) {
:)It's a bit preemptive but I am always open to suggestions.
I was thinking I could run the data composition in parallel and then just copy to the vertex buffer after.
What I was actually hoping is that I could give the batch a new vertex buffer pointer to upload instead of copying, but that doesn't seem to be supported.
Anyway, this PR is me learning the GPUBatch thing while at the same time removing IMM from the Graph Editor drawing code
It is good to actually use index buffers and vertex buffers.
It would be even better when you can reuse them between frames and tag them when they need to be updated.
This way the data transfer can be removed when nothing special happens on the screen.
Now you're creating, building, uploading, drawing and freeing every frame. Even when nothing is changed.
imm
has some optimizations that the creating and freeing isn't done and the buffers are tagged in such a way that the best memory is chosen to reduce time spent in uploading.What we do for the overlay engine (and other editors as well) is to use a layered approach. Eg when selected keys should always be on top. Create a different set of batches only containing the selected keys. and those keys will not be present in the other batches. Render the set of batches one at a time.
Curve drawing is a very researched topic. See https://www.youtube.com/watch?v=M-IMQT7yKVI for example. I don't think this PR should aim for that, but there are other ways to look at the problem.
My approach would be to use matrix transformation and static batches. Perhaps a specific fragment shader to draw lines, outlines etc. But that would become closer to an actual overlay engine and we might want to move it then inside the draw module as there more tools exist for optimizing complex drawing code.
If I understand this right, then I need to rebuild the vertex buffer whenever anything changes. (because I need selected verts in a different buffer)
But as long as nothing changes I can just keep reusing it (probably stored on the
SpaceGraph
struct)How do I deal with the optimization of only looking at keys that are within the viewport extents.
That means the vertex buffers would need updating every time the viewport moves. I think this is just not going to work when I keep the vertex buffers around.
That means I would upload the whole data to the GPU and just give it an extra matrix so it moves around accordingly when panning the view?
I thought about that as well but wasn't sure if that's the way to go. If you say that's a good thing to do I can give it a try
GPU automatically removes any geometry that isn't in the viewport, so would not be my first concern. I rather have a single buffer that contains everything, compared to updating the buffer for every tweak. If a bottleneck appears, we can always make the data set smaller.
Perhaps it is better to first do some prototyping. Some of these tasks are hard to implement, or might need to be implemented in a different area for more efficiency.
Rebuilding when something changes and split batches based on usage would be a first step. You can use a single vertex buffer, but have 3 index buffers to indicate which part of the vertex buffer is selected, active or inactive.
I think Jeroen has a great point about not re-uploading vertex buffers for panning. Using index buffers for selection and active state makes sense too.
The other intuitive thing I'd suggest is avoiding the intermediate buffers by first counting the number of points that need to be uploaded, allocating vertex buffers of exactly that size, then filling them in directly. There may be smaller portions of memory you can collect serially in the first pass to help multithread the last VBO filling pass.
Other solution would be to store these buffers per fcurve in the graph editor as runtime data. Only updating the buffers that are altered. This saves unneeded cycles updating buffers that wasn't touched and uploads smaller sets of data to the GPU. This will lead to more draw calls, but that should be optimized by the GPU backends.
Our GPU module doesn't support uploading partial buffers. It could be added, but the hard part is always on the caller side keeping track of offsets and sizes, so we never added support for it.
Thanks got you. That was an optimization for IMM because the fewer things we add to IMM the faster. If we keep the buffers around this no longer applies.
I treat this PR as a prototype where I can try things out :)
Thanks that makes sense, I didn't realize you can have different index buffers for the same vertex buffer.
Yes I will try that next.
I will also give that a try. Just need to figure out how to know which data has been modified.
@Jeroen-Bakker on the topic of curve drawing. Do you think it's feasable to move the curve drawing to the GPU completely?
We already upload the keys and handles to the GPU. Instead of calculating the bezier interpolation on the CPU and uploading a line strip it might be possible to do that all on the GPU.
The reason I am asking is because we optimized the curve drawing to be view dependent. So we'd need to rebuild it on every draw call anyway since zooming in and out changes the points
GPU based interpolation is straight forward for OpenGL and vulkan. Issue might be metal. Would first start without gpu based interpolation, but keep in mind that this is a good optimization as a future step.
Thinking of it I would use compute shaders to move the interpolation to the GPU in that future goal
Checkout
From your project repository, check out a new branch and test the changes.