Sculpt Mode Performance #81180

Pablo Dobarro · 2020-09-25T20:08:35+02:00

Pablo Dobarro commented

2020-09-25 20:08:35 +02:00

Status: Prototype implemented in D8983 with all optimizations included for Meshes, but only for some tools and features. It still needs to be split into multiple patches, and the remaining features need to be implemented.

Team
Commissioner: @DanielBystedt
Project leader: @PabloDobarro
Project members: -

Description
Big picture: Improve the performance of Sculpt Mode brushes and tools.

Sculpt performance demo.mov

Use cases:

Edit geometry surface detail in high poly meshes without performance issues

Design:
Remove all unnecessary calculations and updates from the PBVH when an editing operation is active, making all the Sculpt Mode code work only with the relevant vertices per brush step. The PBVH should only be used to find the area affected by the stroke as fast as possible.
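As a rough illustration of using the tree only to find the area touched by a brush step, here is a minimal sketch in Python. It is not Blender's PBVH code: the `Node` class, `sphere_intersects_box` and `leaves_in_sphere` are hypothetical names, and the brush is assumed to be a simple sphere.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    """Hypothetical BVH node: an axis-aligned box plus children or vertices."""
    bb_min: Vec3
    bb_max: Vec3
    children: List["Node"] = field(default_factory=list)
    vertices: List[int] = field(default_factory=list)  # only filled in leaves


def sphere_intersects_box(center: Vec3, radius: float, bb_min: Vec3, bb_max: Vec3) -> bool:
    # Squared distance from the sphere center to the closest point of the box.
    d2 = 0.0
    for c, lo, hi in zip(center, bb_min, bb_max):
        closest = min(max(c, lo), hi)
        d2 += (c - closest) ** 2
    return d2 <= radius * radius


def leaves_in_sphere(root: Node, center: Vec3, radius: float) -> List[Node]:
    """Return only the leaves whose box overlaps the brush sphere.

    Subtrees whose box misses the sphere are skipped without being visited.
    """
    result = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not sphere_intersects_box(center, radius, node.bb_min, node.bb_max):
            continue
        if node.children:
            stack.extend(node.children)
        else:
            result.append(node)
    return result
```

With smaller leaves, the set returned by such a query hugs the brush radius more tightly, so fewer unaffected vertices are processed per step.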

Engineer plan:

Get the PBVH code ready to have much smaller leaf node sizes
Store draw buffers in intermediate nodes to reduce the size of the leaf nodes of the PBVH.
Make proxies, bounding box and normal updates optional per tool and per symmetry options.
Do fast bounding box recalculations by updating the bounding boxes when deforming and propagating them from child nodes to parent nodes, up to the root of the tree.
Implement optional fast normal updates (single loop) for brushes that require them.
Replace per node undo with a per vertex undo, which will also be used to get the original data during the stroke.
Remove BMesh based dyntopo (to be replaced with a better implementation).
Remove all loops over nodes from the code that runs when a stroke is active.
Configure the optimal settings of the scheduler per tool.
Make special tools (Pose, Boundary, Elastic Deform) not use the PBVH for getting the affected area. Cache the affected area of these tools once their data is initialized.
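The bounding box item in the plan above can be sketched roughly as follows. This is an illustration in Python, not the prototype's code: `Node`, `set_leaf_bounds` and `propagate_bounds_to_root` are hypothetical names, and each node is assumed to keep a pointer to its parent.

```python
import math


class Node:
    """Hypothetical tree node with a parent pointer and an axis-aligned box."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        # Start with an empty (inverted) box so unions work before any update.
        self.bb_min = (math.inf,) * 3
        self.bb_max = (-math.inf,) * 3
        if parent is not None:
            parent.children.append(self)


def set_leaf_bounds(leaf, positions):
    """Recompute a leaf's box from the deformed positions of its vertices."""
    leaf.bb_min = tuple(min(p[i] for p in positions) for i in range(3))
    leaf.bb_max = tuple(max(p[i] for p in positions) for i in range(3))


def propagate_bounds_to_root(leaf):
    """Walk from the leaf's parent up to the root, refitting each box.

    Each ancestor becomes the union of its children's boxes, so only the
    path to the root is touched instead of the whole tree.
    """
    node = leaf.parent
    while node is not None:
        node.bb_min = tuple(min(c.bb_min[i] for c in node.children) for i in range(3))
        node.bb_max = tuple(max(c.bb_max[i] for c in node.children) for i in range(3))
        node = node.parent
```

Refitting each ancestor from its children (rather than only growing its box) also lets boxes shrink again when a deformation pulls vertices inward.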

Work plan

Milestone 1 - optional name
Optimize PBVH queries and drawing
Time estimate: 2 - 3 weeks

Milestone 2 - optional name
Rewrite task scheduling and per vertex updates.
Time estimate: 3 - 4 weeks

Relevant links:

#68873
#73934
D8983 (prototype): https://archive.blender.org/developer/D8983
Sculpt performance demo.mov: https://archive.blender.org/developer/F8920970/Sculpt_performance_demo.mov

Pablo Dobarro self-assigned this 2020-09-25 20:08:35 +02:00

Pablo Dobarro commented

2020-09-25 20:08:35 +02:00

Added subscribers: @PabloDobarro, @brecht, @Sergey, @JulienKaspar, @DanielBystedt

Julian Perez commented

2020-09-25 21:29:15 +02:00

Added subscriber: @JulianPerez

Gilberto Rodrigues commented

2020-09-25 22:52:50 +02:00

Added subscriber: @Gilberto.R

Gilberto Rodrigues commented

2020-09-25 22:52:50 +02:00

wow

Russell commented

2020-09-26 00:16:01 +02:00

Added subscriber: @Russ1642

Roberto Roch Diago commented

2020-09-26 01:26:04 +02:00

Added subscriber: @RobertoRoch

TheRedWaxPolice commented

2020-09-26 03:42:54 +02:00

Added subscriber: @TheRedWaxPolice

Vyacheslav Kobozev commented

2020-09-26 07:55:37 +02:00

Added subscriber: @Vyach

Wassili F commented

2020-09-26 11:32:47 +02:00

Added subscriber: @astroblitz

Bernhard commented

2020-09-26 11:58:16 +02:00

Added subscriber: @Ravenman13

Dejan Pejacki commented

2020-09-26 12:20:00 +02:00

Added subscriber: @dlc17

Tiago Cruz commented

2020-09-26 12:26:44 +02:00

Added subscriber: @tiagoffcruz

Przemyslaw Golab (SirPigeonz) commented

2020-09-26 13:15:36 +02:00

Added subscriber: @SirPigeonz

David commented

2020-09-26 14:17:46 +02:00

Added subscriber: @Basie

Mindinsomnia commented

2020-09-26 14:58:15 +02:00

Added subscriber: @Grady

Mindinsomnia commented

2020-09-26 14:58:15 +02:00

Wow!

That difference is HUGE!

I'm hyped about this, can't wait to see it land in Blender.

Sebastian Villanueva commented

2020-09-26 17:53:55 +02:00

Added subscriber: @vr_sebas

Pablo Dobarro commented

2020-09-26 19:31:21 +02:00

@Sergey @brecht I'm going to start making patches for master to try to get to milestone 1 as soon as possible. These are the easiest changes to make (they don't require refactoring a lot of code), and we can gain quite a bit of performance once they are done.
The idea for Milestone 1 is to reduce the size of the leaf nodes as much as possible. In order to do that, we need to solve two problems:

Move the batches to the middle of the tree (already done in the prototype)
Propagate the update flags (normals, bounding boxes, redraw, proxies...) from the leaf nodes to the root node to have faster partial updates.

We currently have a lot of places where we traverse the entire tree just to get to the leaf nodes, and then we loop over the leaf nodes to check if they have a flag enabled (we do this multiple times per stroke step). When reducing the size of the leaf nodes, this becomes a huge problem, even with the draw buffers in intermediate nodes, as traversing the entire tree just to update the normals of a leaf node with 100 vertices is not great.
The first patches I'm going to make are going to be related to solving this issue. The performance improvement is not going to be noticeable until later (when reducing the size of the leaf nodes), but I think this should be the first step.
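To make the flag propagation idea concrete, here is a minimal Python sketch. It is only an illustration under assumptions, not the actual patch: the flag constants, `Node`, `build_tree`, `mark_update` and `flagged_leaves` are all hypothetical names.

```python
# Hypothetical update flags, stored as plain bit masks.
UPDATE_NORMALS = 1 << 0
UPDATE_BB = 1 << 1
UPDATE_REDRAW = 1 << 2


class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.flags = 0  # set if this node or anything below it needs the update
        if parent is not None:
            parent.children.append(self)


def build_tree(depth):
    """Build a complete binary tree; returns (root, list of leaves)."""
    root = Node()
    level = [root]
    for _ in range(depth):
        level = [Node(parent) for parent in level for _side in range(2)]
    return root, level


def mark_update(leaf, flag):
    """Set the flag on the leaf and on every ancestor up to the root.

    If an ancestor already has the flag, everything above it has it too,
    so the walk can stop early.
    """
    node = leaf
    while node is not None and (node.flags & flag) != flag:
        node.flags |= flag
        node = node.parent


def flagged_leaves(root, flag):
    """Collect flagged leaves, descending only into flagged subtrees.

    Clears the flag along the way and returns (leaves, nodes visited).
    """
    found = []
    visited = 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        if not node.flags & flag:
            continue
        node.flags &= ~flag
        if node.children:
            stack.extend(node.children)
        else:
            found.append(node)
    return found, visited
```

In a complete binary tree with 8 leaves (15 nodes), marking a single leaf and then collecting it visits 7 nodes instead of all 15, and the gap grows as the leaves get smaller and the tree gets deeper.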

WMROssi commented

2020-09-26 23:54:47 +02:00

Added subscriber: @Wesley-Rossi

Francois commented

2020-09-27 17:15:09 +02:00

Added subscriber: @FrancoisBasson

Jun Mizutani commented

2020-09-27 17:39:20 +02:00

Added subscriber: @jmztn

Joseph Eagar commented

2020-10-05 20:36:39 +02:00

Added subscriber: @JosephEagar

Joseph Eagar commented

2020-10-05 20:36:39 +02:00

For the leaf nodes, how much smaller are we talking about? The dyntopo code had a pretty significant performance degradation from ->leaf_limit being left at 100. Like Pablo said, this is due to traversing the entire tree in order to get anything done (there are other performance degradations tied to multiple per-vertex uses of GHash, but leaf_limit being 100 was pretty significant in itself).

BTW, remember that splitting your memory into small chunks can be death to cache coherency. Here are a few relevant papers I found with a quick Google search:

http://datamove.imag.fr/bruno.raffin/papers/ID/tvcg10.pdf

Joseph Eagar commented

2020-10-05 20:40:05 +02:00

I guess 'enter' submits comments if you're not careful. Anyway, here are the other papers:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.108.6906&rep=rep1&type=pdf
https://dcgi.fel.cvut.cz/home/havran/ARTICLES/cgf2011.pdf

Pablo Dobarro commented

2020-10-05 20:58:33 +02:00

@JosephEagar Hi! That is what I'm working on in this patch. https://developer.blender.org/D9029
We still don't know how small we can make the leaf nodes, but in order to do experiments with that (and fix some of the current performance issues), we first need to remove from the code all functions that traverse the entire tree to find a leaf node and do the updates (and there are a lot of those), and be able to store the draw buffers at any level in the tree.

For this project I was only considering meshes and Multires, and I was focusing more on having fast strokes with a small radius (see https://developer.blender.org/D8983#221492) instead of making brushes that displace a large number of vertices faster. I would say that we want leaf nodes to be as small as possible, to be able to discard as many vertices as possible and have more localized updates, but we need to find a good balance.
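A back-of-the-envelope estimate shows why full-tree traversals get more expensive as leaves shrink. The sketch below assumes an idealized, evenly split binary tree where every leaf holds exactly `leaf_limit` vertices; a real PBVH will not be this regular, and `pbvh_size_estimate` is a hypothetical helper, not a Blender function.

```python
def pbvh_size_estimate(total_verts, leaf_limit):
    """Estimate (leaves, total nodes, depth) for an idealized binary tree.

    Assumes every leaf is full and every inner node has exactly two children,
    so a tree with L leaves has 2 * L - 1 nodes and a depth of ceil(log2(L)).
    """
    leaves = -(-total_verts // leaf_limit)  # ceiling division
    nodes = 2 * leaves - 1
    depth = (leaves - 1).bit_length()  # integer ceil(log2(leaves))
    return leaves, nodes, depth
```

Under these assumptions, a mesh with 1,000,000 vertices has about 199 nodes at 10,000 vertices per leaf and about 19,999 nodes at 100 vertices per leaf, while the depth only grows from 7 to 14. Any function that visits every node pays for the 100x node count, whereas a query that only descends into the relevant subtrees pays roughly for the depth.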

kouza commented

2020-10-22 16:16:50 +02:00

Added subscriber: @kouzanagi