Geometry Nodes: Face Group Boundaries node speedup #115138

Merged
Hans Goudey merged 12 commits from mod_moder/blender:speedup_face_group_boundaries into main 2023-12-05 14:45:15 +01:00

Parallel version of function to compute boundary edges.
New multithreaded algorithm is used concept from #112415, without creating jagged array of face indices though.
Array of indices is store last face for each edge. In case new face have different Group ID, face index
will be come to invalid state and will be bypassed in all next checks.

Speed improvement for attached example file is listed in table:

Cube resolution Main PR
20x20x20 71920 ns 97400 ns
100x100x100 1.27 ms 1.17 ms
500x500x500 79.37 ms 23.16 ms
1000x1000x1000 520.31 ms 142.21 ms
Parallel version of function to compute boundary edges. New multithreaded algorithm is used concept from #112415, without creating jagged array of face indices though. Array of indices is store last face for each edge. In case new face have different Group ID, face index will be come to invalid state and will be bypassed in all next checks. Speed improvement for attached example file is listed in table: | Cube resolution | Main | PR | | -- | -- | -- | | 20x20x20 | `71920 ns` | `97400 ns` | | 100x100x100 | `1.27 ms` | `1.17 ms` | | 500x500x500 | `79.37 ms` | `23.16 ms` | | 1000x1000x1000 | `520.31 ms` | `142.21 ms` |
Iliya Katushenock added 1 commit 2023-11-19 14:13:25 +01:00
Member

Nice numbers. It also might be worth considering using the edge to corner topology map?

Nice numbers. It also might be worth considering using the edge to corner topology map?
Author
Member

I also thinking about that. I haven't test this. But this have some similar ideas: Here is used the same atomic list of last indices as in #112415. But here coping of indices in to segmented span is not necessary due to we can check is all elements have the same group id during list building.

I also thinking about that. I haven't test this. But this have some similar ideas: Here is used the same atomic list of last indices as in https://projects.blender.org/blender/blender/pulls/112415. But here coping of indices in to segmented span is not necessary due to we can check is all elements have the same group id during list building.
Iliya Katushenock added 1 commit 2023-11-19 18:54:44 +01:00
Iliya Katushenock added this to the Nodes & Physics project 2023-11-19 18:58:50 +01:00
Iliya Katushenock added the
Interest
Geometry Nodes
label 2023-11-19 18:58:55 +01:00
Iliya Katushenock requested review from Hans Goudey 2023-11-19 18:59:02 +01:00
Iliya Katushenock added 2 commits 2023-11-20 13:48:26 +01:00
Iliya Katushenock added 1 commit 2023-11-20 16:01:18 +01:00
Hans Goudey reviewed 2023-11-21 13:58:14 +01:00
Hans Goudey left a comment
Member

I'll still probably test a version using the topology map, but this is looking reasonable now.

I'll still probably test a version using the topology map, but this is looking reasonable now.
@ -57,3 +58,1 @@
if (edge_face_set[edge] != face_set[i]) {
/* This edge is connected to two faces on different face sets. */
boundary[edge] = true;
Array<int> previos_face(mesh.totedge, no_face_yet);
Member

Using Array<std::atomic<int>> probably makes this code a bit nicer, worth looking into?

Using `Array<std::atomic<int>>` probably makes this code a bit nicer, worth looking into?
Author
Member

Will check this now. Mainly i try to avoid std::atomic<T> due to that is not real T instance, so any T -> atomic<T> conversions have to be done as copy... but in in pr case, not sure if there is any problem. I found that size of std::atomic<T> can be the same as T, but that is not guaranteed, but this is not really important for that case too i hope....

Will check this now. Mainly i try to avoid `std::atomic<T>` due to that is not real `T` instance, so any `T` -> `atomic<T>` conversions have to be done as copy... but in in pr case, not sure if there is any problem. I found that size of `std::atomic<T>` can be the same as `T`, but that is not guaranteed, but this is not really important for that case too i hope....
mod_moder marked this conversation as resolved
Iliya Katushenock added 2 commits 2023-11-21 16:11:52 +01:00
Hans Goudey reviewed 2023-11-21 17:50:27 +01:00
@ -60,0 +62,4 @@
const static constexpr int no_face_yet = -1;
const static constexpr int no_more_face = -2;
threading::parallel_for(edge_states.index_range(), 4096, [&](const IndexRange range) {
Member

Does this work?

Array<std::atomic_int> edge_states(mesh.totedge, no_face_yet);

I guess maybe not. Ideally this would compile down to a memset. If it doesn't, maybe worth using the old version. A bit tricky...

Does this work? ``` Array<std::atomic_int> edge_states(mesh.totedge, no_face_yet); ``` I guess maybe not. Ideally this would compile down to a `memset`. If it doesn't, maybe worth using the old version. A bit tricky...
Author
Member

There is no copy-constructor...
Also about memset (just found) https://devblogs.microsoft.com/oldnewthing/20180328-00/?p=98365
My mesure show the same speed, so not sure if this have any other problems unlike this manual initialization...

There is no copy-constructor... Also about `memset` (just found) https://devblogs.microsoft.com/oldnewthing/20180328-00/?p=98365 My mesure show the same speed, so not sure if this have any other problems unlike this manual initialization...
Hans Goudey requested review from Jacques Lucke 2023-11-21 18:07:40 +01:00
Iliya Katushenock added 2 commits 2023-11-21 18:39:30 +01:00
Hans Goudey requested changes 2023-11-27 04:34:59 +01:00
Hans Goudey left a comment
Member

Tested this with topology maps like I mentioned:

    Array<int> edge_to_face_offsets;
    Array<int> edge_to_face_indices;
    const GroupedSpan<int> edge_to_face = bke::mesh::build_edge_to_face_map(mesh.faces(),
                                                                            mesh.corner_edges(),
                                                                            mesh.totedge,
                                                                            edge_to_face_offsets,
                                                                            edge_to_face_indices);
    Array<bool> boundary(mask.min_array_size());
    mask.foreach_index(GrainSize(1024), [&](const int edge) {
      const Span<int> faces = edge_to_face[edge];
      if (UNLIKELY(faces.is_empty())) {
        boundary[edge] = false;
        return;
      }

      const int first = face_set_span[faces.first()];
      boundary[edge] = std::any_of(faces.begin() + 1, faces.end(), [&](const int face) {
        return face_set_span[face] != first;
      });
    });

20x20x20

  • Before: Average: 30472 ns, Min: 20530 ns, Last: 28770 ns
  • Topology map: Average: 0.11 ms, Min: 71328 ns, Last: 73218 ns
  • Topology map + VArraySpan: Average: 0.10 ms, Min: 58359 ns, Last: 99817 ns
  • Iliya's code (this PR): Average: 44942 ns, Min: 17620 ns, Last: 53579 ns

100x100x100

  • Before: Average: 0.61 ms, Min: 0.57 ms, Last: 0.61 ms
  • Topology map: Average: 0.91 ms, Min: 0.84 ms, Last: 0.85 ms
  • Topology map + VArraySpan: Average: 0.90 ms, Min: 0.83 ms, Last: 0.91 ms
  • Iliya's code (this PR): Average: 0.30 ms, Min: 0.25 ms, Last: 0.27 ms

500x500x500

  • Before: Average: 19.66 ms, Min: 19.26 ms, Last: 19.36 ms
  • Topology map: Average: 24.24 ms, Min: 23.05 ms, Last: 23.93 ms
  • Topology map + VArraySpan: Average: 24.01 ms, Min: 23.18 ms, Last: 24.18 ms
  • Iliya's code (this PR): Average: 6.28 ms, Min: 5.99 ms, Last: 6.18 ms

1000x1000x1000

  • Before: Average: 196.50 ms, Min: 193.72 ms, Last: 195.56 ms
  • Topology map: Average: 116.46 ms, Min: 114.49 ms, Last: 115.51 ms
  • Topology map + VArraySpan: Average: 114.31 ms, Min: 112.78 ms, Last: 114.75 ms
  • Iliya's code (this PR): Average: 30.97 ms, Min: 29.48 ms, Last: 30.83 ms

Unless the topology maps are already cached (edge to face maps currently aren't), the approach in this PR is much faster. So I think it's best to go with this for now. One reason it's better is probably because we end up reading from the face group ID varray much less. Nice job :)

I left some style comments though.

Tested this with topology maps like I mentioned: ```cpp Array<int> edge_to_face_offsets; Array<int> edge_to_face_indices; const GroupedSpan<int> edge_to_face = bke::mesh::build_edge_to_face_map(mesh.faces(), mesh.corner_edges(), mesh.totedge, edge_to_face_offsets, edge_to_face_indices); Array<bool> boundary(mask.min_array_size()); mask.foreach_index(GrainSize(1024), [&](const int edge) { const Span<int> faces = edge_to_face[edge]; if (UNLIKELY(faces.is_empty())) { boundary[edge] = false; return; } const int first = face_set_span[faces.first()]; boundary[edge] = std::any_of(faces.begin() + 1, faces.end(), [&](const int face) { return face_set_span[face] != first; }); }); ``` **20x20x20** - Before: Average: 30472 ns, Min: 20530 ns, Last: 28770 ns - Topology map: Average: 0.11 ms, Min: 71328 ns, Last: 73218 ns - Topology map + VArraySpan: Average: 0.10 ms, Min: 58359 ns, Last: 99817 ns - Iliya's code (this PR): Average: 44942 ns, Min: 17620 ns, Last: 53579 ns **100x100x100** - Before: Average: 0.61 ms, Min: 0.57 ms, Last: 0.61 ms - Topology map: Average: 0.91 ms, Min: 0.84 ms, Last: 0.85 ms - Topology map + VArraySpan: Average: 0.90 ms, Min: 0.83 ms, Last: 0.91 ms - Iliya's code (this PR): Average: 0.30 ms, Min: 0.25 ms, Last: 0.27 ms **500x500x500** - Before: Average: 19.66 ms, Min: 19.26 ms, Last: 19.36 ms - Topology map: Average: 24.24 ms, Min: 23.05 ms, Last: 23.93 ms - Topology map + VArraySpan: Average: 24.01 ms, Min: 23.18 ms, Last: 24.18 ms - Iliya's code (this PR): Average: 6.28 ms, Min: 5.99 ms, Last: 6.18 ms **1000x1000x1000** - Before: Average: 196.50 ms, Min: 193.72 ms, Last: 195.56 ms - Topology map: Average: 116.46 ms, Min: 114.49 ms, Last: 115.51 ms - Topology map + VArraySpan: Average: 114.31 ms, Min: 112.78 ms, Last: 114.75 ms - Iliya's code (this PR): Average: 30.97 ms, Min: 29.48 ms, Last: 30.83 ms Unless the topology maps are already cached (edge to face maps currently aren't), the approach in this PR is much faster. So I think it's best to go with this for now. One reason it's better is probably because we end up reading from the face group ID varray much less. Nice job :) I left some style comments though.
@ -58,2 +55,2 @@
/* This edge is connected to two faces on different face sets. */
boundary[edge] = true;
Array<std::atomic_int> edge_states(mesh.totedge);
Member

Array<std::atomic<int>> unless you had a reason to do it this way.

`Array<std::atomic<int>>` unless you had a reason to do it this way.
mod_moder marked this conversation as resolved
@ -60,0 +55,4 @@
Array<std::atomic_int> edge_states(mesh.totedge);
/* State is index of face or one of invalid values: */
const static constexpr int no_face_yet = -1;
Member

A simple constexpr int seems fine to me

A simple `constexpr int` seems fine to me
Author
Member

This requered to use this as case in switch statment.

This requered to use this as case in `switch` statment.
Member

Ah, both const and static? I thought const was redundant with constexpr anyway

Ah, both `const` and `static`? I thought `const` was redundant with `constexpr` anyway
Author
Member

Well, static at least, maybe const can be omited..

Well, `static` at least, maybe `const` can be omited..
mod_moder marked this conversation as resolved
@ -60,0 +56,4 @@
Array<std::atomic_int> edge_states(mesh.totedge);
/* State is index of face or one of invalid values: */
const static constexpr int no_face_yet = -1;
const static constexpr int no_more_face = -2;
Member

Maybe no_more_face -> is_boundary

Maybe `no_more_face` -> `is_boundary`
mod_moder marked this conversation as resolved
@ -60,0 +64,4 @@
}
});
const GroupedSpan<int> face_edges(mesh.face_offsets(), mesh.corner_edges());
Member

I usually reserve face_edges for the variable of the edge indices of a specific face. Keeping corner_edges.slice(faces[face_i]) inline for consistency seems better to me. Though I don't have a strong opinion.

I usually reserve `face_edges` for the variable of the edge indices of a specific face. Keeping `corner_edges.slice(faces[face_i])` inline for consistency seems better to me. Though I don't have a strong opinion.
mod_moder marked this conversation as resolved
@ -60,0 +71,4 @@
for (const int edge_i : face_edges[face_i]) {
std::atomic_int &edge_state = edge_states[edge_i];
while (true) {
int edge_state_v = edge_state.load(std::memory_order_relaxed);
Member

What does _v mean?

What does `_v` mean?
Author
Member

edge_state_value

`edge_state_value`
mod_moder marked this conversation as resolved
Iliya Katushenock requested review from Hans Goudey 2023-11-27 13:44:05 +01:00
Iliya Katushenock added 2 commits 2023-11-27 13:44:21 +01:00
Hans Goudey approved these changes 2023-11-28 16:43:28 +01:00
Member

@blender-bot build

@blender-bot build
Hans Goudey added 1 commit 2023-12-04 23:20:54 +01:00
buildbot/vexp-code-patch-coordinator Build done. Details
fedda5bdeb
Merge branch 'main' into speedup_face_group_boundaries
Member

@blender-bot build

@blender-bot build
Hans Goudey merged commit 5429e006c8 into main 2023-12-05 14:45:15 +01:00
Iliya Katushenock deleted branch speedup_face_group_boundaries 2023-12-05 14:48:51 +01:00
Sign in to join this conversation.
No reviewers
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#115138
No description provided.