Refactoring: Geometry Nodes: Rewrite Ico Sphere mesh primitive #116729

Iliya Katushenock · 2024-01-03T00:43:38+01:00

Abstract

This refactoring is part of an ongoing, implicit effort to remove BMesh from Geometry Nodes.
Other such commits: e83f46ea76, b44406f963, #112264.
The main goal is to avoid the following sources of overhead:

1. Mesh <-> BMesh conversion.
2. Unnecessary BMesh topology mapping.
3. Many small memory allocations inside the BMesh implementation.
4. Subdivision operations inherent to the BMesh design.
5. Single-threaded BMesh operations.

This pull request changes the Ico Sphere mesh primitive node to build the result mesh from scratch in a simple way.
The refactoring completely removes the old BMesh version from this file. The new version gains a large improvement in speed and
memory usage by allocating the final topology buffers once and filling them using index math.
Everything runs in linear time with respect to the number of final mesh components (vertices, edges, faces, corners).
Ico Sphere generation is now much faster for the end user at higher resolutions, on the order of 100x.
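
To illustrate why a single allocation is possible: the sizes of all final buffers follow directly from the number of segments each icosahedron edge is split into. The sketch below is not code from this patch; the names `IcoSphereCounts`, `segments_from_resolution` and `ico_sphere_counts` are hypothetical, and it assumes that resolution 1 is the base icosahedron and that each further level doubles the segment count.

```cpp
#include <cassert>
#include <cstdint>

/* Element counts of a triangulated icosahedron whose edges are each
 * split into `segments` parts (hypothetical helper, not from the patch). */
struct IcoSphereCounts {
  int64_t verts;
  int64_t edges;
  int64_t faces;
  int64_t corners;
};

/* Assumption: resolution 1 is the base icosahedron and every further
 * resolution level doubles the number of segments per edge. */
inline int64_t segments_from_resolution(const int resolution)
{
  assert(resolution >= 1 && resolution < 32);
  return int64_t(1) << (resolution - 1);
}

/* Closed-form counts, so every buffer can be sized before any data is written. */
inline IcoSphereCounts ico_sphere_counts(const int64_t segments)
{
  assert(segments >= 1);
  const int64_t n2 = segments * segments;
  return {10 * n2 + 2, 30 * n2, 20 * n2, 60 * n2};
}
```

For example, `ico_sphere_counts(segments_from_resolution(2))` gives 42 vertices, 120 edges, 80 faces and 240 corners. The function also accepts segment counts that are not powers of 2.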

Benchmark

The timings below compare the old and new implementations (ms):

Kind \ Resolution	1	2	3	4	5	6	7	8	9	10	11
BMesh	0.18	0.29	0.68	1.78	5.61	20.89	87.94	384.46	2'227.26	14'300	-
BMesh + UV	0.18	0.30	0.75	1.80	5.73	20.93	87.93	386.86	2'222.72	14'300	-
Mesh (single thread)	0.0348	0.0236	0.0280	0.15	0.30	0.76	2.08	6.01	21.87	91.53	465.59
Mesh + UV (single thread)	0.0384	0.0275	0.0372	0.19	0.38	0.95	2.77	9.18	41.71	188.31	766.56
Mesh (multithreading)	0.0343	0.0249	0.339	0.34	0.23	0.84	2.05	4.45	14.80	65.35	301.89
Mesh + UV (multithreading)	0.0398	0.0315	0.0350	0.19	0.36	0.82	2.41	5.89	27.86	123.57	503.94

Kind \ Resolution	1	2	3	4	5	6	7	8	9	10
BMesh / Mesh (single thread)	5.172x	12.28x	24.28x	11.86x	18.7x	27.48x	42.27x	63.97x	101.84x	156.23x
BMesh + UV / Mesh + UV (single thread)	4.68x	10.90x	20.16x	9.47x	15.07x	22.03x	31.74x	42.14x	53.28x	75.93x
BMesh / Mesh (multithreading)	5.24x	11.64x	2.00x	5.23x	24.39x	24.86x	42.89x	86.39x	150.49x	218.82x
BMesh + UV / Mesh + UV (multithreading)	4.52x	9.52x	21.42x	9.47x	15.91x	25.52x	36.48x	65.68x	79.78x	115.7x

UV generation is optional, so its impact is measured separately.
Previously, Ico Sphere resolution was limited to 10 internally. The new mesh version is limited to 12 (beyond that there is not enough memory to hold the result). BMesh results for resolution 11 were not measured because of insufficient memory.
Only a few threads are used here; the main limitation is memory bandwidth.
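
To give a feeling for the memory limit, here is a rough estimate of the size of the final buffers. This is only a sketch under stated assumptions: `estimate_ico_sphere_bytes` is a hypothetical name, resolution 1 is taken to be the base icosahedron, and the layout is assumed to be 12 bytes per vertex position, two `int` vertex indices per edge, one `int` vertex index and one `int` edge index per corner, and one `int` offset per face plus one. The UV map and any temporary data are not counted.

```cpp
#include <cassert>
#include <cstdint>

/* Rough size in bytes of the final mesh buffers for a given resolution.
 * The per-element sizes are assumptions about the layout, see the text above. */
inline int64_t estimate_ico_sphere_bytes(const int resolution)
{
  assert(resolution >= 1 && resolution <= 16);
  const int64_t segments = int64_t(1) << (resolution - 1);
  const int64_t n2 = segments * segments;
  const int64_t verts = 10 * n2 + 2;
  const int64_t edges = 30 * n2;
  const int64_t faces = 20 * n2;
  const int64_t corners = 60 * n2;
  const int64_t position_bytes = verts * 12;     /* Three floats per vertex. */
  const int64_t edge_bytes = edges * 8;          /* Two ints per edge. */
  const int64_t corner_bytes = corners * 8;      /* Corner vertex + corner edge. */
  const int64_t offset_bytes = (faces + 1) * 4;  /* Face offsets. */
  return position_bytes + edge_bytes + corner_bytes + offset_bytes;
}
```

Under these assumptions, resolution 12 needs about 3.9 GB for the final buffers alone, and each extra level roughly quadruples that.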

The new algorithm produces the result mesh in a single pass, and the resolution is no longer restricted to powers of 2. For now the resolution is still
treated as a power of 2, but this is no longer a real limitation and can be changed in the future.

The mesh produced by the new algorithm looks the same as before: the same vertex positions, topology, and UV map. Internal indices
are different, though, so results may change for node tree setups that depend on the indices of the mesh primitive.

All data is written sequentially (not in randomly distributed segments).
Multithreading might introduce some randomization, but that does not matter at this level.

Possible improvements for future:

1. Reimplement the interpolation functions as more generic templates, to avoid using mix2 and to improve CPU cache usage (currently there is one function just to interpolate 6 floats in one array in chunks...).
2. Explore using float/double iterators instead of interpolation in this node, and maybe in others.
3. Try to reuse / share topology for some small primitives.
3.1. Implicit sharing for all data of the resulting ico sphere with one weak user?
4. The code may look like just a fairly simple subdivision of a triangulated mesh. Can this part be extracted as a separate node?
5. Many IndexRanges are used as a way to hide the math over offsets and to guard everything with assertions on start and size. But this also introduces a lot of int64_t math into geometry code that usually deals with ints...
6. Check memory_bandwidth_bound_task for each face.
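
As an illustration of point 5, the sketch below shows how ranges can hide the offset math. It is not the code from this patch: `Range` is a minimal stand-in for an index range type, and the vertex layout (base vertices first, then the vertices inside the 30 edges, then the vertices inside the 20 faces) is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>

/* Minimal stand-in for an index range: a start offset and a size. */
struct Range {
  int64_t start;
  int64_t size;

  int64_t one_after_last() const
  {
    return start + size;
  }

  /* Sub-range number `index` when this range is cut into chunks of `chunk_size`.
   * The assertions guard the offset math against leaving the range. */
  Range chunk(const int64_t index, const int64_t chunk_size) const
  {
    assert(index >= 0 && chunk_size >= 0);
    assert((index + 1) * chunk_size <= size);
    return {start + index * chunk_size, chunk_size};
  }
};

/* Hypothetical layout of the vertex buffer. */
struct VertLayout {
  Range base_verts;
  Range edge_verts;
  Range face_verts;
};

inline VertLayout vert_layout(const int64_t segments)
{
  assert(segments >= 1);
  /* New vertices strictly inside one base edge and one base face. */
  const int64_t per_edge = segments - 1;
  const int64_t per_face = (segments - 1) * (segments - 2) / 2;
  const Range base{0, 12};
  const Range edges{base.one_after_last(), 30 * per_edge};
  const Range faces{edges.one_after_last(), 20 * per_face};
  return {base, edges, faces};
}
```

With 4 segments per edge, `vert_layout(4).edge_verts.chunk(2, 3)` is the range of the three vertices inside the third base edge, and the last range ends at 162, which matches 10 * 4 * 4 + 2 vertices in total. All of this math is done in `int64_t`, which is the trade-off mentioned in the point above.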



 @ -118,0 +256,4 @@
   positions.last() = float3(0.0f, 0.0f, -1.0f);
   for (float3 &position : positions) {
     position *= radius;

 @ -118,0 +369,4 @@
     const float3 normalized_b = math::normalize(point_b);
     const float3 normal = math::normalize(
         math::cross_tri(float3(0.0f), normalized_a, normalized_b));

 @ -118,0 +376,4 @@
     for (float3 &vert : verts) {
       steps += rotation * lerp_factor;
       const math::AxisAngle axis(normal, steps);
       vert = math::transform_point(math::to_quaternion(axis), point_a);
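
The hunk above places vertices along a subdivided edge by rotating `point_a` around the normal of the plane through the origin and both end points. A standalone sketch of the same idea, without Blender's math library, might look like the code below. `Vec3` and `arc_points` are hypothetical names; the inputs are assumed to be unit vectors that are neither equal nor opposite.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 {
  float x, y, z;
};

inline Vec3 cross(const Vec3 &a, const Vec3 &b)
{
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

inline float dot(const Vec3 &a, const Vec3 &b)
{
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline Vec3 normalized(const Vec3 &v)
{
  const float length = std::sqrt(dot(v, v));
  assert(length > 0.0f);
  return {v.x / length, v.y / length, v.z / length};
}

/* Return `count` points evenly spaced on the great arc strictly between the
 * unit vectors `a` and `b`, by rotating `a` around the normal of their plane. */
inline std::vector<Vec3> arc_points(const Vec3 &a, const Vec3 &b, const int count)
{
  assert(count >= 0);
  const Vec3 axis = normalized(cross(a, b));
  const float cos_total = std::fmin(1.0f, std::fmax(-1.0f, dot(a, b)));
  const float total_angle = std::acos(cos_total);
  /* `a` is perpendicular to the axis, so the rotation reduces to
   * a * cos(angle) + (axis x a) * sin(angle). */
  const Vec3 side = cross(axis, a);
  std::vector<Vec3> points;
  points.reserve(count);
  for (int i = 1; i <= count; i++) {
    const float angle = total_angle * float(i) / float(count + 1);
    const float c = std::cos(angle);
    const float s = std::sin(angle);
    points.push_back(Vec3{a.x * c + side.x * s, a.y * c + side.y * s, a.z * c + side.z * s});
  }
  return points;
}
```

Every returned point stays on the unit sphere, so the result only needs to be scaled by the radius afterwards.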

 @ -11,0 +25,4 @@
 #include "BLI_ordered_edge.hh"
 #include "BLI_span.hh"
 #include "BLI_timeit.hh"
