Highpoly mesh sculpting performance #68873
Open · opened 2019-08-20 17:03:21 +02:00 by Pablo Dobarro · 31 comments
Reference: blender/blender#68873
Updating smooth normals
BKE_pbvh_update_normals takes up 50% of the time in some cases. The use of atomics in pbvh_update_normals_accum_task_cb to add normals from faces to vertices is problematic. Ideally those should be avoided entirely, but it's not simple to do so. Possibilities:
Smoothing tools may also be able to benefit from this, to work without the overhead of storing adjacency info of the entire mesh.
Depending on whether the brush tool needs normals, we could delay updating normals outside of the viewport.
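To illustrate why the atomics are hard to remove: when faces are processed in parallel and each face adds its normal to its vertices (a scatter), two tasks can write the same vertex, hence the atomics. A gather, where each vertex sums the normals of its adjacent faces, has exactly one writer per vertex, but it needs a vertex-to-face map, which is the adjacency overhead mentioned above. The following is only a minimal sketch of the gather variant, not Blender's code. The CSR-style arrays `vert_face_offsets` and `vert_faces` and the function name are hypothetical, and normalization is left out to keep it short.

```c
typedef struct {
  float x, y, z;
} Vec3;

/* Gather pattern: vertex v reads the normals of the faces listed in
 * vert_faces[vert_face_offsets[v] .. vert_face_offsets[v + 1]) and is the
 * only writer of r_vert_normals[v], so no atomics are needed even when
 * different vertex ranges run on different threads.
 * The adjacency arrays are hypothetical inputs for this sketch. */
void gather_vertex_normals(const Vec3 *face_normals,
                           const int *vert_face_offsets,
                           const int *vert_faces,
                           int vert_start,
                           int vert_end,
                           Vec3 *r_vert_normals)
{
  for (int v = vert_start; v < vert_end; v++) {
    Vec3 sum = {0.0f, 0.0f, 0.0f};
    for (int i = vert_face_offsets[v]; i < vert_face_offsets[v + 1]; i++) {
      const Vec3 *fn = &face_normals[vert_faces[i]];
      sum.x += fn->x;
      sum.y += fn->y;
      sum.z += fn->z;
    }
    /* Unnormalized sum; a real implementation would normalize here. */
    r_vert_normals[v] = sum;
  }
}
```

Each thread would call this on a disjoint `[vert_start, vert_end)` range. Whether the extra memory for the map is acceptable is exactly the open question in this section.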
Coherent memory access
Each PBVH node contains a subset of the mesh vertices and faces. These are not contiguous, so iterating over all vertices in a node leads to incoherent memory access.
Two things we can do here:
For multires this is less of an issue since all vertices within a grid are in one block, though it might help a little bit to not allocate every grid individually and instead have one allocation per node.
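As a rough illustration of grouping vertices per node, the sketch below copies vertex positions into a new array in which each node's vertices are contiguous, and records an old-to-new index map so faces could be remapped afterwards. It assumes every vertex is owned by exactly one node. The flattened `node_offsets` / `node_verts` layout and the function name are hypothetical, not existing PBVH data structures.

```c
typedef struct {
  float x, y, z;
} Vec3;

/* Node n owns the vertices node_verts[node_offsets[n] .. node_offsets[n + 1]).
 * After the call, those vertices sit next to each other in r_co, so a loop
 * over one node reads one contiguous block of memory.
 * Assumption: each vertex index appears exactly once in node_verts. */
void reorder_verts_by_node(const Vec3 *co,
                           const int *node_offsets,
                           const int *node_verts,
                           int totnode,
                           Vec3 *r_co,
                           int *r_old_to_new)
{
  int next = 0;
  for (int n = 0; n < totnode; n++) {
    for (int i = node_offsets[n]; i < node_offsets[n + 1]; i++) {
      const int old_index = node_verts[i];
      r_co[next] = co[old_index];
      r_old_to_new[old_index] = next;
      next++;
    }
  }
}
```

The cost is a one-time permutation plus remapping of everything that refers to vertex indices, which is why it would be worth measuring the benefit before implementing it properly.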
Partial redraw
Draw buffers
GPU_vertbuf_raw_step to reduce overhead of creating buffers in gpu_buffers.c (only used in a few places now)
Masks
Tagging PBVH nodes as fully masked would let us skip iterating over their vertices for sculpt tools. Drawing code could also avoid storing a buffer in this case, though the overhead of allocating/freeing that often may not be worth it.
Tagging PBVH nodes as fully unmasked would let us quickly skip drawing them as part of the overlay.
Masks are currently drawn in a separate pass as part of the overlays. It would be more efficient to draw them along with the original faces, so we can draw faces just once.
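The node tagging described above could be computed with a single pass over a node's mask values. This is a minimal sketch under assumptions: the enum, the function name and the flat `mask` / `node_verts` arrays are hypothetical, and mask values are taken to be in the range 0 (unmasked) to 1 (fully masked).

```c
#include <stdbool.h>

typedef enum {
  NODE_MASK_MIXED = 0,
  NODE_FULLY_MASKED = 1,
  NODE_FULLY_UNMASKED = 2,
} NodeMaskState;

/* Classify a node from the mask values of its vertices.
 * Exits early as soon as both a masked and an unmasked value were seen.
 * An empty node is reported as fully masked, so it is simply skipped. */
NodeMaskState node_mask_state(const float *mask, const int *node_verts, int totvert)
{
  bool all_masked = true;
  bool all_unmasked = true;
  for (int i = 0; i < totvert; i++) {
    const float m = mask[node_verts[i]];
    if (m < 1.0f) {
      all_masked = false;
    }
    if (m > 0.0f) {
      all_unmasked = false;
    }
    if (!all_masked && !all_unmasked) {
      return NODE_MASK_MIXED;
    }
  }
  return all_masked ? NODE_FULLY_MASKED : NODE_FULLY_UNMASKED;
}
```

Sculpt tools could then skip nodes in the `NODE_FULLY_MASKED` state, and the overlay could skip nodes in the `NODE_FULLY_UNMASKED` state. The tag would need to be refreshed whenever a mask tool touches the node.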
Consolidate vertex loops
There are various operations that loop over all vertices or faces. The sculpt brush operation, merging results for symmetry, bounding box updates, normal updates, draw buffer updates, etc.
Some of these may be possible to merge together, to reduce the overhead of threading and the cost of memory access and cache misses.
Bounding box frustum tests
Sculpt tools that take into account the frustum only use 4 clipping planes; we should add another plane to clip nodes behind the camera. But unlike drawing, don't use clip end, and always have clip start equal to 0.
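A sketch of how the extra plane fits into a generic plane versus bounding box test. The types and the function name are hypothetical, and the convention is an assumption: plane normals point into the frustum, so a point is inside when dot(normal, p) + d >= 0. With clip start equal to 0, the fifth plane passes through the camera position; in a view space where the camera looks down -Z it would be normal (0, 0, -1) with d = 0.

```c
#include <stdbool.h>

typedef struct {
  float normal[3];
  float d; /* Inside when dot(normal, p) + d >= 0. */
} Plane;

typedef struct {
  float min[3];
  float max[3];
} AABB;

/* Returns true when the box lies completely outside at least one plane.
 * For each plane only the box corner furthest along the plane normal is
 * tested: if even that corner is outside, the whole box is. */
bool aabb_outside_planes(const AABB *bb, const Plane *planes, int num_planes)
{
  for (int i = 0; i < num_planes; i++) {
    const Plane *pl = &planes[i];
    float dist = pl->d;
    for (int axis = 0; axis < 3; axis++) {
      const float c = (pl->normal[axis] >= 0.0f) ? bb->max[axis] : bb->min[axis];
      dist += pl->normal[axis] * c;
    }
    if (dist < 0.0f) {
      return true;
    }
  }
  return false;
}
```

Passing five planes instead of four is then the only change needed on the caller side; no far plane is added, matching the note about not using clip end.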
Frustum - AABB intersection tests do not appear to be a bottleneck currently. But some possible optimizations here:
Threading
BLI_parallel_range_settings_defaults are still optimal. Maybe the node limit can be removed, chunk size could be reduced or increased, or scheduling could be dynamic instead of static.
This has now been changed to remove the node limit and use dynamic scheduling with chunk size 1, which gave about a 10% performance improvement. For a high number of nodes it may be worth increasing the chunk size.
Symmetry
For X symmetry we currently do 2 loops over all vertices, and then do another loop to merge them. These 3 could perhaps be merged into one loop, though code might become significantly more complicated as every brush tool may need code to handle symmetry.
Low level optimizations
Overall, this kind of optimization requires carefully analyzing code that runs per mesh element, and trying to make it faster.
Sculpt tools support many settings, and the number of function calls, conditionals and pointer dereferences adds up. It can be worth testing what happens when most of the code is removed, to see what kind of overhead there is.
It can help to copy some commonly used variables onto the stack in functions, ensuring that they can stay in registers and avoiding pointer aliasing. Tests that check multiple variables could be precomputed and the result stored in a bitflag.
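As a sketch of the bitflag idea: settings that do not change during a stroke are folded into one integer before the vertex loop, and the per-vertex code tests bits of a local value instead of following pointers. The `BrushSettings` struct, the flag names and both functions are hypothetical stand-ins, not Blender's brush API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of brush settings, for illustration only. */
typedef struct {
  bool use_front_faces_only;
  bool use_mask;
  bool use_texture;
  float strength;
} BrushSettings;

enum {
  STROKE_FRONT_FACES = 1 << 0,
  STROKE_MASK = 1 << 1,
  STROKE_TEXTURE = 1 << 2,
};

/* Run once per stroke step, outside the per-vertex loop. */
uint32_t stroke_flags_precompute(const BrushSettings *settings)
{
  uint32_t flags = 0;
  if (settings->use_front_faces_only) {
    flags |= STROKE_FRONT_FACES;
  }
  if (settings->use_mask) {
    flags |= STROKE_MASK;
  }
  if (settings->use_texture) {
    flags |= STROKE_TEXTURE;
  }
  return flags;
}

/* Per-vertex code: everything it needs is passed by value, so the compiler
 * can keep it in registers and does not have to assume pointer aliasing. */
float vertex_factor(uint32_t flags, float strength, float mask, float facing, float tex)
{
  if ((flags & STROKE_FRONT_FACES) && facing <= 0.0f) {
    return 0.0f;
  }
  float factor = strength;
  if (flags & STROKE_MASK) {
    factor *= 1.0f - mask;
  }
  if (flags & STROKE_TEXTURE) {
    factor *= tex;
  }
  return factor;
}
```

How much this helps in practice would have to be measured; the compiler may already hoist some of these loads on its own.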
More functions can be inlined in some cases. For example bmesh iterators used for dyntopo go through function pointers and function calls, while they really can be a simple double loop over chunks and the elements within the chunks.
PBVH building
Building the PBVH is not the most performance critical since it only happens when entering sculpt mode, but there is room for optimization anyway. The most obvious one is multithreading.
Brush radius bounds
Culling of nodes outside the brush radius is disabled for 2D Falloff:
Elastic Deform has no bounds, but it may be possible to compute some even if they are bigger than the brush radius.
Memory allocations for all vertices
Some sculpt tools allocate arrays the size of all vertices for temporary data. For operations that are local, it would be better to allocate arrays per PBVH node when possible.
In some cases this might make little difference, since virtual memory pages may be mapped on demand until there are actual reads/writes (though this is not obviously guaranteed for all allocators and operating systems?).
Also regarding coherent memory access, this could improve performance, if vertices are grouped per node as described above.
Undo
Undo pushes all nodes whose bounding boxes are within the brush radius. However, that doesn't mean any vertices in that node are actually affected by the brush. In a simple test painting on a sphere, it pushed e.g. 18 nodes but only actually modified 7.
We can reduce undo memory by delaying the undo push until we know any vertices within the node are about to be modified, though this may have a small performance impact. Ideally this would take into account both the brush radius test and masking/textures.
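The delayed push could look roughly like the sketch below: the node is only saved at the moment the first vertex in it passes the brush test. Everything here is a simplified stand-in: positions are one-dimensional, `UndoStack` merely counts pushes, and `Node`, `undo_push_node` and `brush_apply_node` are hypothetical names rather than the sculpt undo API.

```c
#include <stdbool.h>

typedef struct {
  const int *verts;
  int totvert;
  bool undo_pushed;
} Node;

/* Stand-in for the undo system: only counts how many nodes were saved. */
typedef struct {
  int num_pushed;
} UndoStack;

static void undo_push_node(UndoStack *undo, Node *node)
{
  undo->num_pushed++;
  node->undo_pushed = true;
}

/* Offsets every vertex of the node within `radius` of `center`.
 * The undo push happens lazily, right before the first actual write, so
 * nodes whose bounding box overlaps the brush but whose vertices do not
 * are never stored. Returns the number of modified vertices. */
int brush_apply_node(
    UndoStack *undo, Node *node, float *co, float center, float radius, float offset)
{
  int modified = 0;
  for (int i = 0; i < node->totvert; i++) {
    const int v = node->verts[i];
    float dist = co[v] - center;
    if (dist < 0.0f) {
      dist = -dist;
    }
    if (dist >= radius) {
      continue;
    }
    if (!node->undo_pushed) {
      undo_push_node(undo, node);
    }
    co[v] += offset;
    modified++;
  }
  return modified;
}
```

The returned count could also decide whether the node is marked for redraw or normal updates. The per-vertex `undo_pushed` check is the small performance cost mentioned above.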
Similarly, we also sometimes call BKE_pbvh_node_mark_redraw or BKE_pbvh_node_mark_normals_update for nodes without checking if any vertices within have actually been modified.
Added subscriber: @PabloDobarro
Added subscriber: @brecht
The point of the PBVH is to be able to do partial updates quickly. If doing many partial updates is somehow significantly slower than updating the mesh as a whole, that is something to be fixed. There is no good reason for it to be slower.
The solution should not be to take some separate code path that updates the mesh as a whole, but rather fixing the bottleneck in the partial updates.
Added subscriber: @ErickNyanduKabongo
Added subscriber: @item412
Added subscriber: @CMC
Added subscriber: @tiagoffcruz
This issue was referenced by c931a0057f
Added subscriber: @ReguzaEi
Some profiles from a 3 million poly mesh after the latest optimizations.
Running single threaded with -t 1. The multithreaded one is not as readable as a screenshot, but the hotspots are similar.
Large draw brush. Bottleneck is mainly the sculpting itself, with symmetry here.
Mesh filter. Clearly normal update is the problem here. Not using atomics there makes it 2x faster overall, but can also give wrong results.
The impact of incoherent memory access is not possible to see in profiles like this, but it's probably worth trying to hack together some code for that and evaluate how much it helps, and then see if it's worth implementing properly.
Added subscriber: @AlbertoVelazquez
Added subscriber: @Josephbburg
Added subscriber: @s12a
Added subscriber: @PawelLyczkowski-1
Added subscriber: @ClinToch
Added subscriber: @TomMusgrove
@PabloDobarro - another performance suggestion for sculpt/paint is to maintain a lower resolution version of what you are working on that is updated and rendered immediately as the stroke occurs; the stroke is then applied to the higher resolution version of the mesh/image in separate threads, and the results are rendered and replace the low res rendering as they are completed. This can reduce the amount of mesh and image data kept in memory, or allow meshes/images that would greatly exceed memory, and allow compression of the parts of the mesh/image not in use.
The lower res object and image data can use about 1/4 to 1/8 the memory of the full object and images (or even drastically less for large images that are zoomed out), and then only the chunks of mesh data and image data that are actively being changed need to be kept in memory. Which chunks are needed is fairly predictable based on stroke direction, so loading and unloading them shouldn't introduce lag.
Added subscriber: @ArtemBataev
Added subscriber: @SamGreen
Added subscriber: @FedericoExposito
Added subscriber: @DirSurya
Added subscriber: @pauanyu_blender
Removed subscriber: @DirSurya
Added subscriber: @Skleembof
Added subscriber: @ZackMercury-2
Added subscriber: @Francis_J
Added subscriber: @HaroldRiverolEchemendia
Added subscriber: @Wesley-Rossi
Added subscriber: @Canucklesandwich
Added subscriber: @DARRINALDER
Added subscriber: @E.Meurat