Refactoring: Geometry Nodes: Rewrite Ico Sphere mesh primitive #116729

Open
Iliya Katushenock wants to merge 82 commits from mod_moder/blender:ico_sphere into main

When changing the target branch, be careful to rebase the branch in your fork to match. See documentation.

Abstract

This refactoring is part of implicit project of removing BMesh from Geometry Nodes.
Other such commits: e83f46ea76, b44406f963, #112264.
Main goal is to avoid overhead for all listed case:

  1. Mesh <-> BMesh conversion.
  2. Unnecessary BMesh topology mapping.
  3. A lot of small memory allocation as part of BMesh implementation.
  4. Subdivision operations as part of BMesh design.
  5. Single thread BMesh operations.

This Pull Request change Ico Sphere mesh primitive node to make result mesh from scratch in simple way.
This refactoring totaly delete old BMesh version from this file. New version is achieved impressive speed and
memory usage impact by generation mesh as single-time allocation final buffers of topology data and applying index math.
Everything works with linear complexity which is depends on amount of final mesh components (vertices, edges, faces, corners).
Now generation of Ico Sphere can be done much faster for final user in largers resolution, 100x~.

Benchmark

Bellow listed timings to compare old and new implementations (ms):

Kind \ Resolution 1 2 3 4 5 6 7 8 9 10 11
BMesh 0.18 0.29 0.68 1.78 5.61 20.89 87.94 384.46 2'227.26 14'300 -
Bmesh + UV 0.18 0.30 0.75 1.80 5.73 20.93 87.93 386.86 2'222.72 14'300 -
Mesh (single thread) 0.0348 0.0236 0.0280 0.15 0.30 0.76 2.08 6.01 21.87 91.53 465.59
Mesh + UV (single thread) 0.0384 0.0275 0.0372 0.19 0.38 0.95 2.77 9.18 41.71 188.31 766.56
Mesh (mutithreading) 0.0343 0.0249 0.339 0.34 0.23 0.84 2.05 4.45 14.80 65.35 301.89
Mesh + UV (mutithreading) 0.0398 0.0315 0.0350 0.19 0.36 0.82 2.41 5.89 27.86 123.57 503.94
Kind \ Resolution 1 2 3 4 5 6 7 8 9 10
BMesh / Mesh (single thread) 5.172x 12.28x 24.28x 11.86x 18.7x 27.48x 42.27x 63.97x 101.84x 156.23x
Bmesh + UV / Mesh + UV (single thread) 4.68x 10.90x 20.16x 9.47x 15.07x 22.03x 31.74x 42.14x 53.28x 75.93x
BMesh / Mesh (mutithreading) 5.24x 11.64x 2.00x 5.23x 24.39x 24.86x 42.89x 86.39x 150.49x 218.82x
Bmesh + UV / Mesh + UV (mutithreading) 4.52x 9.52x 21.42x 9.47x 15.91x 25.52x 36.48x 65.68x 79.78x 115.7x

UV is optional function and its impact is measured separately.
Previously Ico Sphere resolution was limited by 10 internally. New mesh version is limited by 12 (due to not enough memory to hold result). BMesh results for 11 resolution is not checked due to not enough memory.
Not so many threads is used there, main limitation there is memory bandwidth.

New algorithm produce result Mesh in single pass, and resolution is not limited by power of 2. Right now resolution is still
treated as power of 2, but now this not real limitation and this can be changed in future.

New algorithm produce mesh looks the same as older. The same positions of vertices, topology and uv map. Internal indices
is different though, so this might cause changed result for node tree setups which is depend on indices of mesh primitive.

Writing of all data is sequential (not randomly distributed segments).
Multithreading might introduce some randomization, but that is not matter on such level.

Possible improvements for future:

  1. Reimplement interpolation functions to be more template, to avoid using of mix2 and improve CPU cach (here is one func just to interpolate 6 floats in one array by chunks...).
  2. Explore ability to use float/double iterators instead of interpolation in context of this node and maybe in others.
  3. Try to reuse / share topology on some small primitives.
    3.1. Implicit sharing for all data of result ico sphere with one weak user?
  4. Code might looks like just pretty simple subdivision of triangulated mesh. Can this part be extracted as separate node?
  5. A lot of IndexRange's is used there as way to hide math operations over offsets and protect everything by assertions for start and size. But this also introduce a lot of int64_t math in geometry code which is deal with int's usually...
  6. Check memory_bandwidth_bound_task for each face.
## Abstract This refactoring is part of implicit project of removing BMesh from Geometry Nodes. Other such commits: e83f46ea7630163ae836f04cff867eee99032efa, b44406f9634a35ebe0b4fd334715d63e75af08a0, #112264. Main goal is to avoid overhead for all listed case: 1. Mesh <-> BMesh conversion. 2. Unnecessary BMesh topology mapping. 3. A lot of small memory allocation as part of BMesh implementation. 4. Subdivision operations as part of BMesh design. 5. Single thread BMesh operations. This Pull Request change `Ico Sphere` mesh primitive node to make result mesh from scratch in simple way. This refactoring totaly delete old BMesh version from this file. New version is achieved impressive speed and memory usage impact by generation mesh as single-time allocation final buffers of topology data and applying index math. Everything works with linear complexity which is depends on amount of final mesh components (vertices, edges, faces, corners). Now generation of Ico Sphere can be done much faster for final user in largers resolution, 100x~. ## Benchmark Bellow listed timings to compare old and new implementations (ms): | Kind \ Resolution | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | | BMesh | 0.18 | 0.29 | 0.68 | 1.78 | 5.61 | 20.89 | 87.94 | 384.46 | 2'227.26 | 14'300 | - | | Bmesh + UV | 0.18 | 0.30 | 0.75 | 1.80 | 5.73 | 20.93 | 87.93 | 386.86 | 2'222.72 | 14'300 | - | | Mesh (single thread) | 0.0348 | 0.0236 | 0.0280 | 0.15 | 0.30 | 0.76 | 2.08 | 6.01 | 21.87 | 91.53 | 465.59 | | Mesh + UV (single thread) | 0.0384 | 0.0275 | 0.0372 | 0.19 | 0.38 | 0.95 | 2.77 | 9.18 | 41.71 | 188.31 | 766.56 | | Mesh (mutithreading) | 0.0343 | 0.0249 | 0.339 | 0.34 | 0.23 | 0.84 | 2.05 | 4.45 | 14.80 | 65.35 | 301.89 | | Mesh + UV (mutithreading) | 0.0398 | 0.0315 | 0.0350 | 0.19 | 0.36 | 0.82 | 2.41 | 5.89 | 27.86 | 123.57 | 503.94 | | Kind \ Resolution | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | | BMesh / Mesh (single thread) | 5.172x | 12.28x | 24.28x | 11.86x | 18.7x | 27.48x | 42.27x | 63.97x | 101.84x | 156.23x | | Bmesh + UV / Mesh + UV (single thread) | 4.68x | 10.90x | 20.16x | 9.47x | 15.07x | 22.03x | 31.74x | 42.14x | 53.28x | 75.93x | | BMesh / Mesh (mutithreading) | 5.24x | 11.64x | 2.00x | 5.23x | 24.39x | 24.86x | 42.89x | 86.39x | 150.49x | 218.82x | | Bmesh + UV / Mesh + UV (mutithreading) | 4.52x | 9.52x | 21.42x | 9.47x | 15.91x | 25.52x | 36.48x | 65.68x | 79.78x | 115.7x | _UV is optional function and its impact is measured separately._ Previously Ico Sphere resolution was limited by 10 internally. New mesh version is limited by 12 (due to not enough memory to hold result). BMesh results for 11 resolution is not checked due to not enough memory. Not so many threads is used there, main limitation there is memory bandwidth. New algorithm produce result Mesh in single pass, and resolution is not limited by power of 2. Right now resolution is still treated as power of 2, but now this not real limitation and this can be changed in future. New algorithm produce mesh looks the same as older. The same positions of vertices, topology and uv map. Internal indices is different though, so this might cause changed result for node tree setups which is depend on indices of mesh primitive. Writing of all data is sequential (not randomly distributed segments). Multithreading might introduce some randomization, but that is not matter on such level. ## Possible improvements for future: 1. Reimplement interpolation functions to be more template, to avoid using of `mix2` and improve CPU cach (here is one func just to interpolate 6 floats in one array by chunks...). 2. Explore ability to use float/double iterators instead of interpolation in context of this node and maybe in others. 3. Try to reuse / share topology on some small primitives. 3.1. Implicit sharing for all data of result ico sphere with one weak user? 4. Code might looks like just pretty simple subdivision of triangulated mesh. Can this part be extracted as separate node? 5. A lot of IndexRange's is used there as way to hide math operations over offsets and protect everything by assertions for start and size. But this also introduce a lot of int64_t math in geometry code which is deal with int's usually... 6. Check `memory_bandwidth_bound_task` for each face.
Iliya Katushenock added the
Interest
Geometry Nodes
label 2024-01-03 00:43:38 +01:00
Iliya Katushenock added 9 commits 2024-01-03 00:43:46 +01:00
First-time contributor

Hello if I may, you use #include while the new c++ standard uses import std;
So I think you should use import std;

Hello if I may, you use #include <iostream> while the new c++ standard uses import std; So I think you should use import std;
Author
Member

Hello if I may, you use #include while the new c++ standard uses import std;
So I think you should use import std;

Next few years blender going to keep on Cpp17 standart (due to vfx platform).

> Hello if I may, you use #include while the new c++ standard uses import std; So I think you should use import std; Next few years blender going to keep on Cpp17 standart (due to vfx platform).
First-time contributor

ok sorry I thought it was interesting to stick to the latest standard it's for this.

ok sorry I thought it was interesting to stick to the latest standard it's for this.
Iliya Katushenock added 7 commits 2024-01-03 15:54:28 +01:00
91d818ce47 progress
delete polar_lerp, optimize face vert interpolation, add TriangleRange view
ab8a39c647 cleanup
comments
816add5a70 progress
cleanup/optimizations of face corner creation
Wannes Malfait reviewed 2024-01-03 16:34:25 +01:00
Wannes Malfait left a comment
Member

I know this is just WIP, but I was interested to see how you constructed the icosphere.

I think the comments shouldn't be too difficult to implement

I know this is just WIP, but I was interested to see how you constructed the icosphere. I think the comments shouldn't be too difficult to implement
@ -118,0 +256,4 @@
positions.last() = float3(0.0f, 0.0f, -1.0f);
for (float3 &position : positions) {
position *= radius;
Member

I would suggest only scaling by the radius after all the points (also the subdivided) have been calculated. This way you avoid having to do expensive normalisations later.

You could then also do it in a parallel_for loop.

I would suggest only scaling by the radius after all the points (also the subdivided) have been calculated. This way you avoid having to do expensive normalisations later. You could then also do it in a parallel_for loop.
mod_moder marked this conversation as resolved
@ -118,0 +369,4 @@
const float3 normalized_b = math::normalize(point_b);
const float3 normal = math::normalize(
math::cross_tri(float3(0.0f), normalized_a, normalized_b));
Member

If a and b are normalized, then the cross product is automatically normalized.

As I mentioned in the other comment, we don't need even need to normalize a and b if we don't scale by the radius at the start!

If `a` and `b` are normalized, then the cross product is automatically normalized. As I mentioned in the other comment, we don't need even need to normalize `a` and `b` if we don't scale by the radius at the start!
mod_moder marked this conversation as resolved
@ -118,0 +376,4 @@
for (float3 &vert : verts) {
steps += rotation * lerp_factor;
const math::AxisAngle axis(normal, steps);
vert = math::transform_point(math::to_quaternion(axis), point_a);
Member

I think it is possible to speed this up a little bit:

  • We want to spherically interpolate edge_verts_num points between a and b. (let's call them $v_1, v_2, \dots$)
  • If we know v_k, we can calculate v_{k+1} using a reflection: v_{k+1} = \text{Refl}_{v_k}(v_{k-1}) = 2 \;\text{dot}(v_k, v_{k-1}) v_{k} - v_{k+1}. This is faster than calculating a rotation. Also, since the points are evenly spaced, the dot product is always the same (so you only need to calculate it once).
  • To reflect, we still need to find the value of v_1 (since $v_0 = a$). This can be done as you did, or using the slerp formula: v_1 = a \frac{\sin((1- t) \alpha)}{\sin (\alpha )} + b \frac{\sin(t \alpha)}{\sin(\alpha)} where \alpha is the angle between a and b. I think this formula might be faster than calculating the normal and using the axis angle rotations.

See: https://en.wikipedia.org/wiki/Slerp for more information.

I think it is possible to speed this up a little bit: - We want to spherically interpolate `edge_verts_num` points between `a` and `b`. (let's call them $v_1, v_2, \dots$) - If we know $v_k$, we can calculate $v_{k+1}$ using a reflection: $v_{k+1} = \text{Refl}_{v_k}(v_{k-1}) = 2 \;\text{dot}(v_k, v_{k-1}) v_{k} - v_{k+1}$. This is faster than calculating a rotation. Also, since the points are evenly spaced, the dot product is always the same (so you only need to calculate it once). - To reflect, we still need to find the value of $v_1$ (since $v_0 = a$). This can be done as you did, or using the `slerp` formula: $v_1 = a \frac{\sin((1- t) \alpha)}{\sin (\alpha )} + b \frac{\sin(t \alpha)}{\sin(\alpha)}$ where $\alpha$ is the angle between $a$ and $b$. I think this formula might be faster than calculating the normal and using the axis angle rotations. See: https://en.wikipedia.org/wiki/Slerp for more information.
Author
Member

Didn't think about reflection as way to rotate points! Right now this function take half of time of whole node (except normals calculation). So i am very intereasting to check this approath. Initially i thought that scaling might be much more expensive (and i was believe there is way to faster rotate along normal), but i can see that this would be much better to add separate transformation step later.
Thanks for check this!

Didn't think about reflection as way to rotate points! Right now this function take half of time of whole node (except normals calculation). So i am very intereasting to check this approath. Initially i thought that scaling might be much more expensive (and i was believe there is way to faster rotate along normal), but i can see that this would be much better to add separate transformation step later. Thanks for check this!
Member

Here is my idea for calculating normals:

  • For the vertex normals: these are just the same as the position (if you have not yet scaled by the radius). (The normal of a point on the unit sphere is just the point itself).
  • For the face normals you can average the positions of the 3 vertices and then normalize.
Here is my idea for calculating normals: - For the vertex normals: these are just the same as the position (if you have not yet scaled by the radius). (The normal of a point on the unit sphere is just the point itself). - For the face normals you can average the positions of the 3 vertices and then normalize.
Author
Member

I mean, order of points/edges in face corner. Right now this order is random, and i have to calculate real normal (based on this order) for each face, and flip this face in case normal does not directed into required side. Probably possible to change algorithms instead (to keep correct corner order for each faces), but this is not important right now. So i just treet timings of normal fixing as separate part of node measurement.

I mean, order of points/edges in face corner. Right now this order is random, and i have to calculate real normal (based on this order) for each face, and flip this face in case normal does not directed into required side. Probably possible to change algorithms instead (to keep correct corner order for each faces), but this is not important right now. So i just treet timings of normal fixing as separate part of node measurement.
Author
Member

Originl: Timer 'interpolate_face_points': (Average: 57.03 ms, Min: 55.49 ms, Last: 55.49 ms)
SphericalIterator only for face points: Timer 'interpolate_face_points': (Average: 18.46 ms, Min: 16.43 ms, Last: 17.15 ms)
SphericalIterator for virtual points on edges, and actually face points: Timer 'interpolate_face_points': (Average: 17.44 ms, Min: 15.55 ms, Last: 17.53 ms)

Timer 'Scaling': (Average: 5.04 ms, Min: 4.72 ms, Last: 5.03 ms)

Edit: for 10 level of subdivision.

Originl: Timer 'interpolate_face_points': (Average: 57.03 ms, Min: 55.49 ms, Last: 55.49 ms) `SphericalIterator` only for face points: Timer 'interpolate_face_points': (Average: 18.46 ms, Min: 16.43 ms, Last: 17.15 ms) `SphericalIterator` for virtual points on edges, and actually face points: Timer 'interpolate_face_points': (Average: 17.44 ms, Min: 15.55 ms, Last: 17.53 ms) Timer 'Scaling': (Average: 5.04 ms, Min: 4.72 ms, Last: 5.03 ms) Edit: for 10 level of subdivision.
Author
Member

In the end i found the fact that spherical interpolation is was not the way how icosphere was made. For now i use just normalize(lerp()) to keep old result. I'll also have to check if this will be fast enough in context of low precision of such method. So some modifications also required. Thanks again for this investigation, i'll back to this later.

In the end i found the fact that spherical interpolation is was not the way how icosphere was made. For now i use just `normalize(lerp())` to keep old result. I'll also have to check if this will be fast enough in context of low precision of such method. So some modifications also required. Thanks again for this investigation, i'll back to this later.
mod_moder marked this conversation as resolved
Iliya Katushenock added 2 commits 2024-01-03 19:28:50 +01:00
Iliya Katushenock added 1 commit 2024-01-03 20:42:16 +01:00
c5b9de4e28 progress
change SphericalIterator to dooble types, make edge_line_points more sequential
Iliya Katushenock added 2 commits 2024-01-03 23:47:48 +01:00
Iliya Katushenock added 3 commits 2024-01-04 16:24:33 +01:00
Iliya Katushenock added 8 commits 2024-01-04 21:37:41 +01:00
Iliya Katushenock added 4 commits 2024-01-05 13:45:48 +01:00
Iliya Katushenock added 1 commit 2024-01-05 13:53:50 +01:00
Iliya Katushenock added 1 commit 2024-01-05 22:54:12 +01:00
Iliya Katushenock added 2 commits 2024-01-06 11:13:30 +01:00
Iliya Katushenock added 2 commits 2024-01-10 12:32:06 +01:00
Iliya Katushenock added a new dependency 2024-01-10 13:30:41 +01:00
Iliya Katushenock added 10 commits 2024-02-07 22:36:36 +01:00
12fb555ad2 progress
simplify positions, use regular interpolation
141ab22ece progress
refactor uv to use mix2
970a18a0b8 progress
more use of trinalte-ranges
806c8b207b progress
more reuse of data
Iliya Katushenock added 4 commits 2024-02-08 09:46:01 +01:00
Iliya Katushenock added 5 commits 2024-02-08 12:29:51 +01:00
Iliya Katushenock added 2 commits 2024-02-08 13:22:21 +01:00
Iliya Katushenock added 3 commits 2024-02-08 14:07:02 +01:00
Iliya Katushenock reviewed 2024-02-08 14:34:24 +01:00
@ -11,0 +25,4 @@
#include "BLI_ordered_edge.hh"
#include "BLI_span.hh"
#include "BLI_timeit.hh"
Author
Member

This, some commented SCOPED_TIMER_AVERAGED, some other commented things and also BMesh code will be keep here while viewe to simplify checking. But can be deleted on ping.

This, some commented `SCOPED_TIMER_AVERAGED`, some other commented things and also BMesh code will be keep here while viewe to simplify checking. But can be deleted on ping.
Iliya Katushenock removed a dependency 2024-02-08 14:38:35 +01:00
Iliya Katushenock changed title from WIP: Refactoring: Geometry Nodes: Rewrite Ico Sphere mesh primitive to Refactoring: Geometry Nodes: Rewrite Ico Sphere mesh primitive 2024-02-08 14:38:41 +01:00
Iliya Katushenock added this to the Nodes & Physics project 2024-02-08 14:38:48 +01:00
First-time contributor

Is this 100% compatible with the previous implementation or e.g. index ordering changed, UV positioning changed?

Is this 100% compatible with the previous implementation or e.g. index ordering changed, UV positioning changed?
Author
Member

Only indices is changed.

Only indices is changed.
Iliya Katushenock added 4 commits 2024-02-10 14:16:24 +01:00
Iliya Katushenock added 5 commits 2024-02-11 23:38:35 +01:00
Iliya Katushenock added 1 commit 2024-02-11 23:50:55 +01:00
4730b57668 cleanup
other points->verts
Iliya Katushenock added 6 commits 2024-04-03 20:20:54 +02:00
fd8ce921ec cleanup
license
6d598733d6 cleanup
use math::numbers
f3cb0cf08f cleanup
Triangular number link
3fdf8b5753 cleanup
redundant debug fill
7e2b545f9b cleanup
use math::numbers
Merge conflict checking is in progress. Try again in few moments.

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u ico_sphere:mod_moder-ico_sphere
git checkout mod_moder-ico_sphere
Sign in to join this conversation.
No reviewers
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No Assignees
5 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#116729
No description provided.