GPU: Mesh Drawing Performance #87835
Reference: blender/blender#87835
Blender performance when working with huge meshes can be improved.
Here are some ideas and reasoning.
Research Topics
Technical tasks
Move the display normals to draw module:
After reevaluating the modifier stack, the display normals are updated (depsgraph update). The drawing code could make better decisions if this were done as part of the draw module. To calculate the display normals, a reverse lookup structure is built. This structure isn't kept around. Performance could be improved by reusing it when the geometry doesn't change between recalculations.
Other buffers can also use this data (the adjacency IBO, for example). Does Cycles use the display normals? If not, we could eliminate them from DNA/RNA.
Use data structures optimized for data streaming in MeshRenderData (do not look up polys inside a loop). Reduce cache misses by storing data in arrays and only allowing sequential access.
Normals are precalculated, but this uses additional memory, which can reduce performance (L2 cache pressure). Check whether calculating them in the inner loop is faster.
Split the edit mode/object mode cache: currently the edit mode cache and the object mode cache reuse the same memory location. When constructing the VBO/IBO, the logic branches off. Would splitting improve the code quality and the inner loops? A tiny speedup is expected from less branching between Mesh and BMesh evaluation.
Migrate to C++ and reduce branching by using template functions and classes.
Can we use compute shaders to convert the MeshRenderData? This way we wouldn't need to upload all the data from RAM to the GPU.
We need to research the data transfer before and after such a change. The hair IBO is actually a simple formula, so there is no need to compute it on the CPU.
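To make the "reverse lookup structure" idea concrete, here is a minimal sketch in plain Python of a vertex-to-face map stored as two flat arrays, plus a cache that keeps it until the topology changes. This is not Blender's implementation: `build_vert_to_face_map`, `ReverseLookupCache` and the `topology_version` counter are hypothetical names used only for illustration.

```python
from typing import List, Optional, Sequence, Tuple

VertFaceMap = Tuple[List[int], List[int]]


def build_vert_to_face_map(num_verts: int, faces: Sequence[Sequence[int]]) -> VertFaceMap:
    """Return (offsets, indices); faces using vertex v are indices[offsets[v]:offsets[v + 1]]."""
    offsets = [0] * (num_verts + 1)
    # Pass 1: count how many face corners reference each vertex.
    for face in faces:
        for v in face:
            offsets[v + 1] += 1
    # A prefix sum turns the counts into start offsets.
    for v in range(num_verts):
        offsets[v + 1] += offsets[v]
    # Pass 2: fill the flat index array, one write cursor per vertex.
    cursor = offsets[:-1].copy()
    indices = [0] * offsets[-1]
    for f, face in enumerate(faces):
        for v in face:
            indices[cursor[v]] = f
            cursor[v] += 1
    return offsets, indices


class ReverseLookupCache:
    """Hypothetical cache: rebuild only when the caller reports a topology change."""

    def __init__(self) -> None:
        self._version: Optional[int] = None
        self._map: Optional[VertFaceMap] = None
        self.rebuilds = 0  # Counts how often the map was actually rebuilt.

    def get(self, num_verts: int, faces: Sequence[Sequence[int]], topology_version: int) -> VertFaceMap:
        if self._map is None or self._version != topology_version:
            self._map = build_vert_to_face_map(num_verts, faces)
            self._version = topology_version
            self.rebuilds += 1
        return self._map
```

With this layout, moving vertices without changing connectivity would reuse the cached map, and other consumers (such as an adjacency index buffer) could read the same arrays sequentially.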
Added subscribers: @Jeroen-Bakker, @ideasman42
Added subscriber: @JacobMerrill-1
Added subscriber: @TheRedWaxPolice
Added subscriber: @elias.andersson92
Added subscriber: @JorgeBernalMartinez
Added subscriber: @kioku
Added subscriber: @GeorgiaPacific
Added subscriber: @fclem
From my own tests, uploading data to the GPU is currently the main bottleneck when transforming geometry, see: #88021.
This seems worth doing early on, it should make code more maintainable.
Partial updates might be worth exploring:
The data layout could be optimized. I recall @fclem mentioning that we could avoid uploading vertex coordinates multiple times, for example.
Added subscriber: @warcanin
Added subscriber: @machieb
Hello, I don't know if this is the right place for my statement, but Blender is massively slow when trying to select an object in the viewport.
The more objects there are in the scene and the more polygons they have, the slower it gets. Our typical scenes have >10000 objects and >20 million polygons. That is not extreme.
We use a very old 3D program that has not been updated since 2012 to select and shade all the objects in our scenes, because it is not possible in Blender. In Blender you wait 5-10 seconds after clicking on an object in the viewport until it gets selected.
In the other package, selection is instant. That program can handle more than 100 million polygons with instant selection with ease.
I hope this behavior can be solved now that you are trying to improve the mesh drawing performance!
Thanks
This comment was removed by @Jeroen-Bakker
Transforming verts.
When going to edit mode and transforming a single vert, the following batches are recalculated:
ibo.tris: sorts the triangles by material by looping twice, once to count and a second time to assign. Hidden faces are not added. The implementation is single threaded.
ibo.points: a single loop that does not add hidden verts. The implementation is single threaded.
vbo.edit_data: updates the flags (vert, edge, crease and weight).
Unchanged buffers are:
Assuming a high poly count,
num_tris * 3 * sizeof(int)
+ num_vert * sizeof(int)
+ num_vert * sizeof(int)
bytes are recalculated and resent.
vbo.edit_data uses threading.
Callgraph when run single threaded
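As an illustration of the two-pass material sort described for ibo.tris and of the size estimate above, here is a small Python sketch. It is not the draw module's code: the function names are made up for this example, and `SIZEOF_INT = 4` assumes 32-bit indices.

```python
from typing import List, Sequence, Tuple

SIZEOF_INT = 4  # Assumption: 32-bit integers, as in the sizeof(int) estimate.


def sort_tris_by_material(
    tri_materials: Sequence[int], tri_hidden: Sequence[bool], num_materials: int
) -> Tuple[List[int], List[int]]:
    """Group visible triangle indices by material; return (order, starts)."""
    # Pass 1: count the visible triangles of each material.
    counts = [0] * num_materials
    for mat, hidden in zip(tri_materials, tri_hidden):
        if not hidden:
            counts[mat] += 1
    # Each material gets a contiguous range starting at starts[mat].
    starts = [0] * num_materials
    for mat in range(1, num_materials):
        starts[mat] = starts[mat - 1] + counts[mat - 1]
    # Pass 2: write every visible triangle into the next free slot of its range.
    cursor = list(starts)
    order = [0] * sum(counts)
    for tri, (mat, hidden) in enumerate(zip(tri_materials, tri_hidden)):
        if not hidden:
            order[cursor[mat]] = tri
            cursor[mat] += 1
    return order, starts


def reupload_bytes(num_tris: int, num_verts: int) -> int:
    """Bytes recalculated and resent per the estimate in the comment."""
    return num_tris * 3 * SIZEOF_INT + num_verts * SIZEOF_INT + num_verts * SIZEOF_INT
```

Under these assumptions, a mesh with 2 million triangles and 1 million vertices resends 32,000,000 bytes (about 32 MB) every time a single vertex moves.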

Added subscriber: @easythrees
Added subscriber: @ArtemBataev
Hi there, random question. You mention in the description that "To calculate the display normals a reverse lookup structure is build. This structure isn't kept around. Performance could be improved when geometry doesn't change between recalc", where is this exactly?
Added subscriber: @rjg
Added subscriber: @JosephEagar
For selection buffers, we in the sculpt module are planning to make bmesh PBVH available as a general-purpose API (separate from sculpt mode and DynTopo). It could potentially replace gpu selection picking entirely while providing a pretty significant boost to drawing performance (PBVH stores drawing buffers in the leaves which can be selectively updated, so you're not sending an entire mesh to the GPU just because you moved a single vertex).
Btw if you don't want to use PBVH for drawing, pretty much any mesh segmentation method will work. You can assign triangles to drawing buffers randomly or even in indexed order and it'll still be a lot faster. The idea is to associate mesh elements with segments so you only update the segments that have changed.
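A rough sketch of the segmentation idea, assuming the simplest assignment mentioned above (triangles grouped into fixed-size segments in index order). `SegmentedMesh` and its methods are hypothetical names, not an existing Blender or PBVH API.

```python
from typing import Dict, List, Sequence, Set, Tuple


class SegmentedMesh:
    """Tracks which fixed-size triangle segments must be re-uploaded."""

    def __init__(self, tris: Sequence[Tuple[int, int, int]], segment_size: int) -> None:
        self.segment_size = segment_size
        self.num_segments = (len(tris) + segment_size - 1) // segment_size
        # Map each vertex to the segments whose triangles use it.
        self.vert_segments: Dict[int, Set[int]] = {}
        for t, tri in enumerate(tris):
            segment = t // segment_size  # Index-order assignment.
            for v in tri:
                self.vert_segments.setdefault(v, set()).add(segment)
        self.dirty: Set[int] = set()

    def move_vertex(self, v: int) -> None:
        """Mark every segment touching vertex v as needing an update."""
        self.dirty |= self.vert_segments.get(v, set())

    def flush(self) -> List[int]:
        """Return the segments to re-upload and reset the dirty set."""
        segments = sorted(self.dirty)
        self.dirty.clear()
        return segments
```

Moving one vertex then touches only the few segments that reference it, so the amount of data sent to the GPU scales with the segment size rather than with the whole mesh.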
Added subscriber: @RedMser
Added subscriber: @PetterLundh
Added subscriber: @ckohl_art
Has any thought been given to occlusion culling or LOD/adaptive tessellation?
Uploading geometry to the GPU is expensive, but so is drawing all of it even when it's not contributing much, if anything.
Another idea:
render the screen smaller and use AMD FidelityFX Super Resolution (FSR), now that it's open source?
LODs need to be preprocessed and would not work with GPU selection.
Even if it were supported by OpenGL 3.0, how would adaptive tessellation work with complex objects like the ones we support in Blender? Preprocessing this would be a CPU job and would slow things down in the areas where the user is working.
As you already say, uploading geometry is the issue, not rendering the uploaded geometry. FSR would only help with the latter.
I have an idea to experiment with FSR, but that is to increase the detail in icon/button preview rendering.
Added subscriber: @Grady
What about distance and occlusion culling for the viewport?
Sometimes very dense mesh objects are drawn when they don't need to be. Such as a 100k polygon diamond ring, sitting on the finger of a human character, roughly 1km away from the camera.
Or a finely detailed piano sitting in the next room over behind a wall (as part of a sweeping camera move where it will eventually come into view but hasn't yet).
Distance Culling
A viewport optimisation setting available to users to cull any object if its distance and bounding sphere diameter would result in the object covering no more than Y pixels. A default of 2, for example, could be enough to cull anything that would be virtually invisible anyway. Users with more aggressive culling needs could bump that number as high as they need.
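The test proposed here can be sketched as follows, assuming a perspective camera with a vertical field of view and the usual small-angle approximation for the projected size of a bounding sphere. The function and parameter names are illustrative only.

```python
import math


def projected_diameter_px(
    diameter: float, distance: float, fov_y_rad: float, viewport_height_px: int
) -> float:
    """Approximate on-screen diameter in pixels of a bounding sphere."""
    # Pixels covered by one world unit at distance 1 from the camera.
    focal_px = viewport_height_px / (2.0 * math.tan(fov_y_rad / 2.0))
    return diameter / distance * focal_px


def should_cull(
    diameter: float,
    distance: float,
    fov_y_rad: float,
    viewport_height_px: int,
    min_pixels: float = 2.0,
) -> bool:
    """Cull when the object would cover no more than min_pixels."""
    if distance <= 0.0:
        return False  # Camera is at or inside the sphere center: always draw.
    size = projected_diameter_px(diameter, distance, fov_y_rad, viewport_height_px)
    return size <= min_pixels
```

For the ring example, a 2 cm object 1 km away in a 1080 pixel high viewport with a 50 degree field of view covers far less than one pixel, so it would be culled with the default of 2.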
Occlusion Culling
Occlusion culling could be implemented with the technique "Hierarchical-Z map based occlusion culling".
For reference, description of the approach here: https://rastergrid.com/blog/2010/10/hierarchical-z-map-based-occlusion-culling/
It has the benefit of requiring very little preprocessing or additional work, it's GPU based, it can in fact be almost automatic with the right approach, or manually tweaked for even better performance.
In short, just render a depth only, low resolution pass of only the 'large' objects in the scene, use that to form a tile map of the 'minimum Z value' and use that to determine occlusion of objects before rendering them.
Or alternatively, allow users to manually specify which objects are occluders, so users can create low poly shapes to represent the silhouette of some highly detailed objects, or a low poly shape of for example, the wall structure of a large archviz scene, culling out drawing most of the objects in the scene that aren't directly visible.
This could also be used to cull things other than objects, such as lights in Eevee.
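A CPU-side sketch of the tile test, only to show the logic (a real implementation would run on the GPU with a depth mip chain). Assumptions: depth values grow with distance, `float("inf")` marks pixels with no occluder, and the object's screen rectangle lies inside the depth buffer. With that depth convention the conservative per-tile value is the farthest occluder depth. With a flipped Z convention it would be a minimum instead.

```python
from typing import List, Sequence, Tuple


def build_tile_depth(depth: Sequence[Sequence[float]], tile: int) -> List[List[float]]:
    """Reduce a depth buffer to one value per tile: the farthest occluder depth."""
    height, width = len(depth), len(depth[0])
    tiles = []
    for ty in range(0, height, tile):
        row = []
        for tx in range(0, width, tile):
            row.append(
                max(
                    depth[y][x]
                    for y in range(ty, min(ty + tile, height))
                    for x in range(tx, min(tx + tile, width))
                )
            )
        tiles.append(row)
    return tiles


def is_occluded(
    tiles: Sequence[Sequence[float]],
    tile: int,
    rect: Tuple[int, int, int, int],
    nearest_depth: float,
) -> bool:
    """rect = (x0, y0, x1, y1), inclusive pixel bounds of the object's screen box."""
    x0, y0, x1, y1 = rect
    for ty in range(y0 // tile, y1 // tile + 1):
        for tx in range(x0 // tile, x1 // tile + 1):
            # Visible if the object can be nearer than the farthest occluder here.
            if nearest_depth <= tiles[ty][tx]:
                return False
    return True
```

The test is conservative: an object is skipped only when every tile it touches is fully covered by occluders that are closer than the object's nearest point.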
+1 for occlusion culling
about the LOD -
'weld vertex' + round vertex to grid, but the grid size changes based on distance from camera - seems to be a really fast LOD even on the CPU
this solution mostly applies to things like terrain
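One way to read the "weld + round to grid" idea is sketched below. It is an interpretation, not the contents of the attached file: the grid cell size doubles per distance band so that vertices in the same band share one grid, vertices landing in the same cell are welded, and triangles that collapse are dropped. `lod_distance`, `base_cell` and the function names are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def lod_level(dist: float, lod_distance: float) -> int:
    """Level 0 below lod_distance, then one more level per doubling of distance."""
    if dist < lod_distance:
        return 0
    return int(math.log2(dist / lod_distance)) + 1


def grid_lod(
    verts: Sequence[Vec3],
    tris: Sequence[Tuple[int, int, int]],
    camera: Vec3,
    base_cell: float,
    lod_distance: float,
) -> Tuple[List[Vec3], List[Tuple[int, int, int]]]:
    """Snap vertices to a distance-dependent grid, weld them, drop collapsed tris."""
    cell_to_new: Dict[Tuple[int, int, int, int], int] = {}
    new_verts: List[Vec3] = []
    remap: List[int] = []
    for v in verts:
        level = lod_level(math.dist(v, camera), lod_distance)
        cell = base_cell * (2 ** level)  # Coarser grid farther from the camera.
        ix, iy, iz = (round(c / cell) for c in v)
        key = (level, ix, iy, iz)
        if key not in cell_to_new:
            cell_to_new[key] = len(new_verts)
            new_verts.append((ix * cell, iy * cell, iz * cell))
        remap.append(cell_to_new[key])
    new_tris = []
    for a, b, c in tris:
        a, b, c = remap[a], remap[b], remap[c]
        if a != b and b != c and a != c:  # Skip triangles that collapsed.
            new_tris.append((a, b, c))
    return new_verts, new_tris
```

Because the result depends on the camera position, it would have to be recomputed as the view moves, which is the CPU cost raised earlier in the thread.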
LOD_Test.blend
Added subscriber: @EAW
Added subscriber: @SpencerMagnusson
Added subscriber: @Yuro