Geometry Nodes: Rewrite Scale Elements node #115142

Merged
Hans Goudey merged 36 commits from mod_moder/blender:tmp_speedup_scale_elements into main 2024-04-15 21:55:50 +02:00

Rewrite of Scale Elements. Main changes related with removing unnecessary
abstractions (like structures of fields). Next, by using grouping approach,
all data is represented as spans. This provide ability to unify code for
different domains. Using of general utils like IndexMask, Group processing
and array utils provides much more parallelism and better memory usage.
In result, this refactoring result in 4-10 average speed improvement in
attached benchmark file with different probability and scale of elements.

Measurements of speed improvement for Face domain (ms):
Before:

Probability \ Scale 0.1 1 2 3 4
0.1 870.63 831.13 826.69 833.84 838.3
0.25 832.87 845.32 861.84 834.01 841.95
0.5 451.98 423.2 268.47 453.06 379.79
0.75 20.92 19.46 20.62 20.76 21.11
0.9 21.21 19.8 20.07 21.82 20.85

After:

Probability \ Scale 0.1 1 2 3 4
0.1 214.63 176.11 176.54 176.95 183.03
0.25 186.04 170.19 187.91 182.91 182.32
0.5 126.39 119.48 108.46 114.83 122.53
0.75 6.19 7.69 6.02 5.91 17.55
0.9 5.46 5.77 5.62 6.02 6.1

Average speed up:

Probability \ Scale 0.1 1 2 3 4
0.1 4.0 4.7 4.6 4.7 4.5
0.25 4.4 4.9 4.5 4.5 4.6
0.5 3.5 3.5 2.4 3.9 3.0
0.75 3.3 2.5 3.4 3.5 1.2
0.9 3.8 3.4 3.5 3.6 3.4

Measurements of speed improvement for Edge domain (ms):
Before:

Probability \ Scale 0.1 1 2 3 4
0.1 4'294.62 5'100 5'200 5'200 5'300
0.25 5'100 5'100 5'200 5'100 4'702.83
0.5 2'426.06 2'242.14 1'533.39 2'247.04 1'687.36
0.75 28.24 30.58 27.49 31.08 28.96
0.9 27.79 24.91 27.31 29.35 26.47

After:

Probability \ Scale 0.1 1 2 3 4
0.1 507.54 491.09 493.88 503.8 507.35
0.25 489.82 488.39 518.28 729.58 569.13
0.5 294.78 271.01 305.45 292.44 298.79
0.75 11.35 14.91 12.23 12.52 16.55
0.9 12.36 12.57 12.04 11.58 11.45

Average speed up:

Probability \ Scale 0.1 1 2 3 4
0.1 8.4 10.3 10.5 10.3 10.4
0.25 10.4 10.4 10.0 6.9 8.2
0.5 8.2 8.2 5.0 7.6 5.6
0.75 2.4 2.0 2.2 2.4 1.7
0.9 2.2 1.9 2.2 2.5 2.3
Rewrite of Scale Elements. Main changes related with removing unnecessary abstractions (like structures of fields). Next, by using grouping approach, all data is represented as spans. This provide ability to unify code for different domains. Using of general utils like IndexMask, Group processing and array utils provides much more parallelism and better memory usage. In result, this refactoring result in 4-10 average speed improvement in attached benchmark file with different probability and scale of elements. Measurements of speed improvement for Face domain (ms): Before: | Probability \ Scale | 0.1 | 1 | 2 | 3 | 4 | | -- | -- | -- | -- | -- | -- | | 0.1 | 870.63 | 831.13 | 826.69 | 833.84 | 838.3 | | 0.25 | 832.87 | 845.32 | 861.84 | 834.01 | 841.95 | | 0.5 | 451.98 | 423.2 | 268.47 | 453.06 | 379.79 | | 0.75 | 20.92 | 19.46 | 20.62 | 20.76 | 21.11 | | 0.9 | 21.21 | 19.8 | 20.07 | 21.82 | 20.85 | After: | Probability \ Scale | 0.1 | 1 | 2 | 3 | 4 | | -- | -- | -- | -- | -- | -- | | 0.1 | 214.63 | 176.11 | 176.54 | 176.95 | 183.03 | | 0.25 | 186.04 | 170.19 | 187.91 | 182.91 | 182.32 | | 0.5 | 126.39 | 119.48 | 108.46 | 114.83 | 122.53 | | 0.75 | 6.19 | 7.69 | 6.02 | 5.91 | 17.55 | | 0.9 | 5.46 | 5.77 | 5.62 | 6.02 | 6.1 | Average speed up: | Probability \ Scale | 0.1 | 1 | 2 | 3 | 4 | | -- | -- | -- | -- | -- | -- | | 0.1 | 4.0 | 4.7 | 4.6 | 4.7 | 4.5 | | 0.25 | 4.4 | 4.9 | 4.5 | 4.5 | 4.6 | | 0.5 | 3.5 | 3.5 | 2.4 | 3.9 | 3.0 | | 0.75 | 3.3 | 2.5 | 3.4 | 3.5 | 1.2 | | 0.9 | 3.8 | 3.4 | 3.5 | 3.6 | 3.4 | Measurements of speed improvement for Edge domain (ms): Before: | Probability \ Scale | 0.1 | 1 | 2 | 3 | 4 | | -- | -- | -- | -- | -- | -- | | 0.1 | 4'294.62 | 5'100 | 5'200 | 5'200 | 5'300 | | 0.25 | 5'100 | 5'100 | 5'200 | 5'100 | 4'702.83 | | 0.5 | 2'426.06 | 2'242.14 | 1'533.39 | 2'247.04 | 1'687.36 | | 0.75 | 28.24 | 30.58 | 27.49 | 31.08 | 28.96 | | 0.9 | 27.79 | 24.91 | 27.31 | 29.35 | 26.47 | After: | Probability \ Scale | 0.1 | 1 | 2 | 3 | 4 | | -- | -- | -- | -- | -- | -- | | 0.1 | 507.54 | 491.09 | 493.88 | 503.8 | 507.35 | | 0.25 | 489.82 | 488.39 | 518.28 | 729.58 | 569.13 | | 0.5 | 294.78 | 271.01 | 305.45 | 292.44 | 298.79 | | 0.75 | 11.35 | 14.91 | 12.23 | 12.52 | 16.55 | | 0.9 | 12.36 | 12.57 | 12.04 | 11.58 | 11.45 | Average speed up: | Probability \ Scale | 0.1 | 1 | 2 | 3 | 4 | | -- | -- | -- | -- | -- | -- | | 0.1 | 8.4 | 10.3 | 10.5 | 10.3 | 10.4 | | 0.25 | 10.4 | 10.4 | 10.0 | 6.9 | 8.2 | | 0.5 | 8.2 | 8.2 | 5.0 | 7.6 | 5.6 | | 0.75 | 2.4 | 2.0 | 2.2 | 2.4 | 1.7 | | 0.9 | 2.2 | 1.9 | 2.2 | 2.5 | 2.3 |
Iliya Katushenock added 1 commit 2023-11-19 16:23:33 +01:00
Iliya Katushenock added 1 commit 2023-11-19 20:41:14 +01:00
Iliya Katushenock added 3 commits 2023-11-22 21:16:12 +01:00
Iliya Katushenock changed title from WIP: Geometry Nodes: Scale Elements node speedup to Geometry Nodes: Scale Elements node speedup 2023-11-24 13:33:56 +01:00
Iliya Katushenock added this to the Nodes & Physics project 2023-11-24 13:34:32 +01:00
Iliya Katushenock added the
Interest
Geometry Nodes
label 2023-11-24 13:34:37 +01:00
Iliya Katushenock added 4 commits 2023-11-24 13:34:50 +01:00
Iliya Katushenock added 1 commit 2023-11-24 13:36:07 +01:00
Iliya Katushenock added 1 commit 2023-11-24 13:36:42 +01:00
buildbot/vexp-code-patch-coordinator Build done. Details
6173513f36
cleanup
Iliya Katushenock changed title from Geometry Nodes: Scale Elements node speedup to Geometry Nodes: Rewrite Scale Elements node 2023-11-24 13:38:45 +01:00
Member

@blender-bot build

@blender-bot build
Iliya Katushenock added 2 commits 2023-11-24 18:48:15 +01:00
Iliya Katushenock added 1 commit 2023-11-24 20:00:57 +01:00
Iliya Katushenock added 3 commits 2023-11-25 14:02:31 +01:00
Iliya Katushenock requested review from Hans Goudey 2023-11-25 14:04:20 +01:00
Iliya Katushenock added 2 commits 2023-12-01 10:28:18 +01:00
Hans Goudey requested changes 2023-12-03 17:07:06 +01:00
Hans Goudey left a comment
Member

Looks good! More straightforward than I expected actually.

Looks good! More straightforward than I expected actually.
@ -104,0 +132,4 @@
});
}
static Array<int> reverse_indices_in_groups(const Span<int> group_indices,
Member

I think it would make sense to move these to a separate implementation and header file as part of this commit, de-duplicating the implementation in mesh_mapping.cc at the same time. That would make me feel more confident about the final state of the code anyway.

I think it would make sense to move these to a separate implementation and header file as part of this commit, de-duplicating the implementation in `mesh_mapping.cc` at the same time. That would make me feel more confident about the final state of the code anyway.
Author
Member

Such separate file is implemented in #111081, for now, i just copy-paste part of code to just avoid locking by general bli solution.

Such separate file is implemented in https://projects.blender.org/blender/blender/pulls/111081, for now, i just copy-paste part of code to just avoid locking by general bli solution.
mod_moder marked this conversation as resolved
@ -177,0 +302,4 @@
const float total = elem_island.size();
BLI_assert(total > 0.0f);
const float scale = accumulate<float>(scale_varray, elem_island) / total;
const float3 center = accumulate<float3>(center_varray, elem_island) / total;
Member

This method of finding the mean is prone to overflow. Maybe use a mean function that adds each element multiplied by the 1 / size factor.

This method of finding the mean is prone to overflow. Maybe use a `mean` function that adds each element multiplied by the `1 / size` factor.
Author
Member

Right now i just reimplement original algorithm, but this function can be changed to be more safe, and be moved to something like array_utils / attribute_math. and also to be reused in attribute statistic node too.

Right now i just reimplement original algorithm, but this function can be changed to be more safe, and be moved to something like `array_utils` / `attribute_math`. and also to be reused in attribute statistic node too.
mod_moder marked this conversation as resolved
@ -290,0 +371,4 @@
/* If result indices is for gathered array, map than back into global indices. */
if (face_mask.size() != mesh.faces_num) {
parallel_transform<int>(r_item_indices, 4098, [&](const int pos) { return face_mask[pos]; });
Member

[] on IndexMask here seems slow possibly? I'd think there would be another way to structure this

`[]` on `IndexMask` here seems slow possibly? I'd think there would be another way to structure this
Author
Member

It's good to use O(1) way if that is possible. But in such case, there is just 2 way: binary search ( O(N * log2(size / ~16000)) ) and array of indices ( O(N + size) ). Second one is prefer, but that is not memory free...
Generally, index mas segmentation should be as few as possible (all random indices should be grouped in large segments), so real complexity only depend on selection~. I think it is okay to keep such small complexity if that is necessery in such place and thinking about something like VArray::FromIndexMask and VArray::FromIndexMaskRevers to support corner cases (all indices is one span, all indices is just one range, ...) as just kind of devertualize closures.

It's good to use O(1) way if that is possible. But in such case, there is just 2 way: binary search ( O(N * log2(size / ~16000)) ) and array of indices ( O(N + size) ). Second one is prefer, but that is not memory free... Generally, index mas segmentation should be as few as possible (all random indices should be grouped in large segments), so real complexity only depend on selection~. I think it is okay to keep such _small_ complexity if that is necessery in such place and thinking about something like `VArray::FromIndexMask` and `VArray::FromIndexMaskRevers` to support corner cases (all indices is one span, all indices is just one range, ...) as just kind of `devertualize` closures.
mod_moder marked this conversation as resolved
@ -344,3 +393,1 @@
edge_selection.foreach_index([&](const int edge_index) {
const int2 &edge = edges[edge_index];
disjoint_set.join(edge[0], edge[1]);
edge_mask.foreach_index_optimized<int>(GrainSize(4098), [&](const int edge_i) {
Member

4096 is the power of two you're looking for :D

`4096` is the power of two you're looking for :D
Author
Member

So many such typos....

So many such typos....
mod_moder marked this conversation as resolved
@ -449,0 +487,4 @@
const GroupedSpan<int> item_islands(item_offsets.as_span(), item_indices);
const GroupedSpan<int> vert_islands(vert_offsets.as_span(), vert_indices);
const VArray<float> &scale_varray = evaluator.get_evaluated<float>(0);
Member

Remove &, it's fine but a bit misleading, since this is binding the reference to a temporary

Remove `&`, it's fine but a bit misleading, since this is binding the reference to a temporary
mod_moder marked this conversation as resolved
Iliya Katushenock added 2 commits 2023-12-04 17:17:58 +01:00
Iliya Katushenock added 2 commits 2023-12-07 19:31:27 +01:00
Iliya Katushenock requested review from Hans Goudey 2023-12-16 23:15:02 +01:00
Iliya Katushenock added 1 commit 2023-12-16 23:15:11 +01:00
Iliya Katushenock added 1 commit 2023-12-29 11:22:12 +01:00
Hans Goudey requested changes 2023-12-29 17:42:53 +01:00
Hans Goudey left a comment
Member

Overall structure makes sense! Just a few smaller comments

Overall structure makes sense! Just a few smaller comments
@ -101,0 +211,4 @@
for (const int i : indices.slice(range)) {
value += values[i];
}
return join_accumulators({value / float(indices.size()), 1}, other);
Member

Shouldn't this be range.size()?

Shouldn't this be `range.size()`?
mod_moder marked this conversation as resolved
@ -207,2 +333,2 @@
for (const int island_index : range) {
const ElementIsland &island = islands[island_index];
AtomicDisjointSet disjoint_set(vert_mask.size());
const GroupedSpan<int> face_verts(mesh.face_offsets(), mesh.corner_verts());
Member

mesh.face_offsets() -> mesh.faces()

`mesh.face_offsets()` -> `mesh.faces()`
mod_moder marked this conversation as resolved
@ -394,1 +438,3 @@
scale_vertex_islands_on_axis(mesh, island, params, get_edge_verts);
/* If result indices is for gathered array, map than back into global indices. */
if (edge_mask.size() != mesh.edges_num) {
parallel_transform<int>(r_item_indices, 4096, [&](const int pos) { return edge_mask[pos]; });
Member

Like I mentioned on the other PR, I'm skeptical of this random access to IndexMask. Feels like structuring the code not to require that would be better

Like I mentioned on the other PR, I'm skeptical of this random access to `IndexMask`. Feels like structuring the code not to require that would be better
Author
Member

Temporal array just to fill-and-read once will change ~5% time of this function, but require allocation of array with size of some domains.
Benchmark was in: #114194 (comment)

Temporal array just to fill-and-read once will change ~5% time of this function, but require allocation of array with size of some domains. Benchmark was in: https://projects.blender.org/blender/blender/pulls/114194#issuecomment-1092985
Member

Yeah, I still think it's safer to choose the option with the better computational complexity. It's more likely that its performance won't change for the worse in the future.

Yeah, I still think it's safer to choose the option with the better computational complexity. It's more likely that its performance won't change for the worse in the future.
mod_moder marked this conversation as resolved
Iliya Katushenock added 3 commits 2023-12-31 13:04:37 +01:00
Iliya Katushenock requested review from Hans Goudey 2024-01-10 20:09:40 +01:00
Member

Guess the remaining thing to do is switch to using to_indices for random index mask access?

Guess the remaining thing to do is switch to using `to_indices` for random index mask access?
Iliya Katushenock added 2 commits 2024-01-19 20:34:05 +01:00
Iliya Katushenock added 2 commits 2024-02-05 19:55:54 +01:00
Iliya Katushenock added 1 commit 2024-02-27 21:50:35 +01:00
Hans Goudey reviewed 2024-02-27 22:40:13 +01:00
@ -112,0 +236,4 @@
Mesh &mesh)
{
MutableSpan<float3> positions = mesh.vert_positions_for_write();
threading::parallel_for(
Member

Is this supposed to be parallel_for_weighted?

Is this supposed to be `parallel_for_weighted`?
Author
Member

Oh...

Oh...
mod_moder marked this conversation as resolved
Iliya Katushenock added 2 commits 2024-02-28 12:10:11 +01:00
Iliya Katushenock added 1 commit 2024-04-07 21:43:34 +02:00
Jacques Lucke reviewed 2024-04-08 13:35:58 +02:00
Jacques Lucke left a comment
Member

I mainly how much of this code is more general purpose. We have many places where we split indices into many groups. This patch seems to implement a new approach which is fine but maybe this approach should also be used elsewhere? Have you identified trade-offs between the different approaches we use already?

I mainly how much of this code is more general purpose. We have many places where we split indices into many groups. This patch seems to implement a new approach which is fine but maybe this approach should also be used elsewhere? Have you identified trade-offs between the different approaches we use already?
Author
Member

I mainly how much of this code is more general purpose.

I have general solution for this, see #111081. I just copy-paste part of code to do not be locked by more general util.

This patch seems to implement a new approach which is fine but maybe this approach should also be used elsewhere?

Part of #111081 is to use this in each place where this can be used.

Have you identified trade-offs between the different approaches we use already?

Well, PR will be such slow as current main if i'll use code from 782d096905/source/blender/blenkernel/intern/mesh_mapping.cc (L366) (this is the reason why i copy more general implementation which is support groups with different sizes).

> I mainly how much of this code is more general purpose. I have general solution for this, see #111081. I just copy-paste part of code to do not be locked by more general util. > This patch seems to implement a new approach which is fine but maybe this approach should also be used elsewhere? Part of #111081 is to use this in each place where this can be used. > Have you identified trade-offs between the different approaches we use already? Well, PR will be such slow as current main if i'll use code from https://projects.blender.org/blender/blender/src/commit/782d096905141ede7c0e405839cff4ead2fd1394/source/blender/blenkernel/intern/mesh_mapping.cc#L366 (this is the reason why i copy more general implementation which is support groups with different sizes).
Member

I can reproduce the speedup. Thanks for providing the benchmark file:

Before: 86 seconds
After: 15 seconds

I'm going to commit this. Next I think it would make sense to move all the index grouping utilities to a new BLI file. How about BLI_index_grouping.hh? We can talk about that more in chat too.

I can reproduce the speedup. Thanks for providing the benchmark file: ``` Before: 86 seconds After: 15 seconds ``` I'm going to commit this. Next I think it would make sense to move all the index grouping utilities to a new `BLI` file. How about `BLI_index_grouping.hh`? We can talk about that more in chat too.
Hans Goudey merged commit 2cb3677557 into main 2024-04-15 21:55:50 +02:00
Iliya Katushenock deleted branch tmp_speedup_scale_elements 2024-04-15 21:56:33 +02:00
Sign in to join this conversation.
No reviewers
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#115142
No description provided.