Mesh: Rewrite split edges algorithm #110661

Merged
Hans Goudey merged 78 commits from HooglyBoogly/blender:split-edges-rewrite-2 into main 2023-08-30 14:23:49 +02:00
Member

The new split edges code from e83f46ea76 switched from BMesh to Mesh as a
performance improvement. However, the code contained a complex loop for
de-duplicating edges that gave invalid indices in some circumstances. That
was fixed by b226c115e2, but that commit in turn caused loose edges to be
skipped. That was fixed by 111e378366, which then caused another crash.

Long story short, the code is quite complex and the existing algorithm,
while an improvement on the original, is fiddly. For example, instead of
working with just the input and output states, it adjusted topology data
for an intermediate state halfway through the change.

Fixes #109236

**Goals**

  • Only use topology maps from the input mesh. This should work better
    with future topology caching improvements (See !107816).
  • Run de-duplication in a per-edge context to allow parallelization.
  • Optimize for real world use cases when there is merging of new edges.
  • Improve readability by making each step's inputs clear, improving naming,
    and separating functions.
  • Improve handling of loose edges.
  • Reuse existing edges for the first new split edges, simplifying
    edge attribute propagation.

**Timing**

I tested performance with a Ryzen 7950X and a grid with 4 million faces
with different random portions of the mesh selected.

| Selection | Before (ms) | After (ms) | Change |
| --------- | ----------- | ---------- | ------ |
| 100% | 1869.4 | 351.8 | 5.3x |
| 50% | 1636.0 | 356.0 | 4.6x |
| 10% | 1148.8 | 238.2 | 4.8x |

| 4 Million Face Grid | 10% | 50% | 100% |
| ------------------------------- | ---- | ---- | ---- |
| `vert_selection_from_edge` | 2.7 | 4.1 | 4.3 |
| `selection_to_bit_vector` | 0.7 | 3.9 | 0.2 |
| `vert_to_corner_map` | 22.3 | 22.5 | 22.4 |
| `edge_to_corner_map` | 47.7 | 47.6 | 47.6 |
| `corner_to_face_map` | 3.3 | 3.3 | 3.3 |
| `calc_all_corner_groups` | 21.0 | 44.1 | 38.3 |
| `calc_vert_ranges_per_old_vert` | 3.0 | 7.4 | 7.9 |
| `update_corner_verts` | 2.8 | 12.7 | 15.5 |
| `calc_new_edges` | 28.8 | 44.1 | 35.5 |
| `update_unselected_edges` | 9.8 | 9.3 | 0.0 |
| `propagate_edge_attributes` | 35.7 | 51.3 | 61.7 |
| `propagate_vert_attributes` | 24.0 | 27.0 | 31.7 |
| `propagate_vert_attributes` | 24.2 | 29.7 | 36.9 |

**Tests**

New regression tests will be committed separately. Existing tests
must be updated since the new code gives different indices.

Hans Goudey added 36 commits 2023-07-31 22:48:31 +02:00
Hans Goudey added this to the Nodes & Physics project 2023-07-31 23:05:40 +02:00
Hans Goudey added the Module/Nodes & Physics and Interest/Modeling labels 2023-07-31 23:05:54 +02:00
Hans Goudey added 2 commits 2023-08-01 21:39:43 +02:00
Hans Goudey requested review from Jacques Lucke 2023-08-01 21:43:37 +02:00
Hans Goudey requested review from Wannes Malfait 2023-08-01 21:43:37 +02:00
Hans Goudey added 1 commit 2023-08-01 22:06:35 +02:00
buildbot/vexp-code-patch-coordinator Build done. Details
e1537744cd
Use noinitialization when constructing corner fans
Saves about 100ms out of 550
Author
Member

@blender-bot build
Wannes Malfait reviewed 2023-08-02 14:33:38 +02:00
Wannes Malfait left a comment
Member

It passed all of the tests I could think of to throw at it, so I'm fairly confident the code is correct. The code was very readable; I just have a few comments here and there.
@ -74,3 +69,3 @@
continue;
}
if (id.is_anonymous() && !propagation_info.propagate(id.anonymous_id())) {
if (id.name() == ".edge_verts") {
Member

I assume this skips updating the vertex index information for the edges, since this is handled separately? Maybe worth mentioning this in a comment.
HooglyBoogly marked this conversation as resolved
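To make the assumption above concrete, a minimal hypothetical sketch of how such a skip typically sits in an attribute propagation loop (illustrative only, not the patch's actual code; `src_attributes` is a stand-in name, and `.edge_verts` is the builtin attribute storing each edge's two vertex indices):

```cpp
/* Hypothetical sketch, not the committed code. */
src_attributes.for_all([&](const AttributeIDRef &id, const AttributeMetaData & /*meta_data*/) {
  if (id.is_anonymous() && !propagation_info.propagate(id.anonymous_id())) {
    return true; /* Skip anonymous attributes that should not be propagated. */
  }
  if (id.name() == ".edge_verts") {
    /* The edge vertex indices are filled in directly while building the new edges,
     * so they are excluded from generic attribute propagation here. */
    return true;
  }
  /* ... copy/gather the attribute values to the result mesh ... */
  return true;
});
```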
@ -159,3 +107,1 @@
const Span<Vector<int>> edge_to_loop_map,
MutableSpan<int> corner_verts,
MutableSpan<int> new_to_old_verts_map)
static BitVector<> selection_to_bit_vector(const IndexMask &selection, const int universe_size)
Member

Is `universe_size` standard terminology for bit vectors? Isn't this always going to be `selection.size()`?
HooglyBoogly marked this conversation as resolved
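A plausible reading (not confirmed in the thread) is that `universe_size` is the size of the full index domain the mask selects from, in this case the mesh's edge count, which is why it is generally not the same as `selection.size()`. A minimal sketch under that assumption:

```cpp
/* Sketch under the assumption that `universe_size` is the size of the whole index
 * domain (e.g. the total number of edges in the mesh), not the selection size. */
static BitVector<> selection_to_bit_vector(const IndexMask &selection, const int universe_size)
{
  BitVector<> bits(universe_size, false);
  selection.foreach_index([&](const int i) { bits[i].set(); });
  return bits;
}
```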
@ -294,3 +195,1 @@
void split_edges(Mesh &mesh,
const IndexMask &mask,
const bke::AnonymousAttributePropagationInfo &propagation_info)
/* Calculate groups of edges that are contiguously connected to each input vertex. */
Member

In the comment above `calc_corner_fans_for_vertex` a fan is referred to as a group of corners. Here it is referred to as a group of edges. This should probably say "Calculate groups of **corners** that are..."
HooglyBoogly marked this conversation as resolved
@ -311,1 +220,3 @@
vert_to_edge_map[edges[i][1]].append(i);
/** Selected and unselected loose edges attached to a vertex. */
struct VertLooseEdges {
Vector<int> split;
Member

Took me a while to realize that `split` is the same as selected. Maybe just change this to `selected` to be more consistent with the comment above.
HooglyBoogly marked this conversation as resolved
@ -343,1 +246,3 @@
threading::parallel_for(edges.index_range(), 512, [&](const IndexRange range) {
/**
* Every affected vertex maps to potentially multiple output vertices. Create a mapping from
* affected vertex index to the of output vertex (indices are to those sets, not indices in arrays
Member

"affected vertex index to the of output vertex" -> "affected vertex index to the index of the output vertex"

"affected vertex index to the of output vertex" -> "affected vertex index to the index of the output vertex"
HooglyBoogly marked this conversation as resolved
@ -344,0 +262,4 @@
{
offset_data.reinitialize(affected_verts.size() + 1);
threading::parallel_for(affected_verts.index_range(), 2048, [&](const IndexRange range) {
offset_data.as_mutable_span().slice(range).fill(-1);
Member

Is it better to do this here rather than before the loop? Also, why does it need to be filled with `-1`? The comment above the function mentions nothing about it possibly containing invalid indices. Looking at the code, this also seems to be impossible. Maybe this was from a previous version of the code?
Author
Member

I did the initialization in the loop to use the cache better, as opposed to filling the entire array on one thread first. Not sure how much of a difference that makes, but it shouldn't hurt.

The -1 is to account for the reused vert; I added a comment.
Member

Ok, I see now. That makes a lot of sense.
wannes.malfait marked this conversation as resolved
@ -360,2 +284,2 @@
vert_to_edge_map[edge[0]].append(duplicate_i);
vert_to_edge_map[edge[1]].append(duplicate_i);
/**
* Update corner verts so that each fan of edges gets its own vertex. For the last "new vertex" we
Member

"fan of edges" -> "fan of corners"

"fan of edges" -> "fan of corners"
HooglyBoogly marked this conversation as resolved
@ -427,0 +457,4 @@
Array<int> vert_to_corner_offsets;
Array<int> vert_to_corner_indices;
const GroupedSpan<int> vert_to_corner_map = bke::mesh::build_vert_to_loop_map(
Member

These functions take up a significant part of the total runtime of the node. Have you tried looking into invoking them in parallel, or are all the threads already saturated by the functions individually? Of course, these might just become look-ups if a topology cache is available.
Author
Member

Yes, I hope this can be improved in the future. Currently their work is _mostly_ single threaded, but I'd like to change that (and Iliya looked into that here: !110707). If they are multithreaded, running them in parallel won't be as helpful. But definitely worth looking into this sort of thing!
Member

Probably best to keep that for a separate patch indeed, since this is really supposed to be a bugfix.
wannes.malfait marked this conversation as resolved
@ -444,2 +533,4 @@
BKE_mesh_tag_edges_split(&mesh);
BLI_assert(BKE_mesh_is_valid(&mesh));
Member

Is this just here for testing purposes? I don't know how expensive that function is, but it should probably be removed if it is not a very trivial check.
HooglyBoogly marked this conversation as resolved
Hans Goudey added 8 commits 2023-08-02 15:46:52 +02:00
Jacques Lucke requested changes 2023-08-02 16:17:22 +02:00
Jacques Lucke left a comment
Member

Would be good to test the code coverage of all the new regression tests.
@ -197,2 +132,4 @@
return other;
}
using CornerFan = Vector<int>;
Member

Not super important right now, because fans are usually not huge (would be nice to have a comment here for what size is typical). It seems like it might not be too hard to have a more efficient flat data structure for all corner fans.

```
struct CornerFans {
  Span<int> corners;
  OffsetIndices<int> corners_by_fan;
  OffsetIndices<int> fans_by_vertex;
};
```

The most tricky thing here is probably that the number of fans per vertex is not known in advance. Just a thought. Alternatively, maybe a thread-local `LinearAllocator` could help, but I don't know how much overhead you get from using vectors here anyway. Can be looked into later.
Author
Member

Yeah, I was thinking about that too. There are definitely easier possible performance improvements than this though-- for now I think the more basic storage types help keep the code simple. `NoInitialization()` was pretty essential though.
JacquesLucke marked this conversation as resolved
@ -198,2 +134,4 @@
using CornerFan = Vector<int>;
/**
Member

Is the set of corner fans deterministic even if the `connected_corners` input is shuffled? Might be ok if it's not, just wondering.
Author
Member

The sets will contain the same corners, but the order of fans and corners in each fan may be different. Luckily the order doesn't matter for this algorithm. It would be nice to have a version of this function that used winding order though-- that could be used in corner normal calculation maybe.
@ -344,0 +262,4 @@
const BitSpan split_edges,
Array<int> &offset_data)
{
offset_data.reinitialize(affected_verts.size() + 1);
Member

Maybe locally add `MutableSpan<int> new_verts_nums = offset_data;`.
HooglyBoogly marked this conversation as resolved
@ -344,0 +264,4 @@
{
offset_data.reinitialize(affected_verts.size() + 1);
threading::parallel_for(affected_verts.index_range(), 2048, [&](const IndexRange range) {
/* Start with -1 for the reused vertex. None of the final sizes should be negative. */
Member

Add assert for the second sentence.
Author
Member

`accumulate_counts_to_offsets` already has an assert for that.
JacquesLucke marked this conversation as resolved
@ -344,0 +265,4 @@
offset_data.reinitialize(affected_verts.size() + 1);
threading::parallel_for(affected_verts.index_range(), 2048, [&](const IndexRange range) {
/* Start with -1 for the reused vertex. None of the final sizes should be negative. */
offset_data.as_mutable_span().slice(range).fill(-1);
Member

Seems like it could be easier and faster to just subtract 1 in the loop below and change `+=` to `=`?
Author
Member

Maybe, but then the loose edges and corner fan cases become dependent on each other, since they can only subtract 1 if the other isn't empty. So this is a bit simpler.
Member

Hm, I don't get why first setting everything to `-1` and then adding something to every element afterwards is any better than just doing this:

```
for (const int i : range) {
  /* -1 because the first vertex is reused. */
  new_verts_nums[i] = corner_fans[i].size() - 1;
}
```
Author
Member

Because we also need to reuse vertices for loose edges:

```
new_verts_nums[mask] += info.selected.size();
new_verts_nums[mask] += info.unselected.size() > 0;
```

Both of those can't use `- 1` if any of the previous cases did. Rather than keeping track of whether we've already reused the vertex across the types, I thought it was simpler to just always add and start from -1.
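To illustrate the counting pattern being discussed, a rough hypothetical sketch (not the patch's exact code; `corner_fans` and `vert_loose_edges` stand in for the real per-vertex data) of counts that start at -1 for the reused vertex and are then accumulated into offsets:

```cpp
/* Sketch only: count how many *new* vertices each affected vertex needs. Every
 * affected vertex starts at -1 because one of its outputs reuses the original vertex. */
MutableSpan<int> new_verts_nums = offset_data.as_mutable_span().drop_back(1);
threading::parallel_for(affected_verts.index_range(), 2048, [&](const IndexRange range) {
  new_verts_nums.slice(range).fill(-1);
  for (const int i : range) {
    new_verts_nums[i] += corner_fans[i].size();
    new_verts_nums[i] += vert_loose_edges[i].selected.size();
    new_verts_nums[i] += vert_loose_edges[i].unselected.size() > 0;
  }
});
/* Accumulate the counts into offsets; this asserts that no count ended up negative. */
const OffsetIndices<int> new_verts_by_vert = offset_indices::accumulate_counts_to_offsets(
    offset_data);
```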
@ -366,0 +313,4 @@
* indices. Using this fact we can avoid the need to use a more expensive data structure to
* deduplicate the result edges.
*/
static BitVector<> calc_merged_new_edges(const int orig_verts_num,
Member

I find the term `merged` confusing. Are we actually merging something or are we just not splitting these edges?
HooglyBoogly marked this conversation as resolved
@ -373,0 +348,4 @@
const BitSpan output_single_edge,
MutableSpan<int2> edges,
MutableSpan<int> corner_edges,
Vector<int2> &new_edges,
Member

Should use the `r_` prefix.
HooglyBoogly marked this conversation as resolved
@ -383,2 +376,3 @@
}
});
}
return new_edges;
Member

This might result in an unnecessary copy currently.
Author
Member

Oops, thanks!
HooglyBoogly marked this conversation as resolved
Hans Goudey added 2 commits 2023-08-02 16:57:39 +02:00
Hans Goudey added 2 commits 2023-08-02 16:59:11 +02:00
Hans Goudey requested review from Jacques Lucke 2023-08-02 16:59:32 +02:00
Jacques Lucke approved these changes 2023-08-02 18:11:25 +02:00
Jacques Lucke left a comment
Member

Would still like to see the test coverage, but looks good to me now.
Wannes Malfait approved these changes 2023-08-02 18:15:28 +02:00
Wannes Malfait left a comment
Member

Forgot to press this button 😅
Author
Member

Thank you both for the review! Before committing, I'll rename "fan" to "group" and mention that they're just "Groups of corners connected by edges" in the comments.
Hans Goudey added 2 commits 2023-08-02 19:57:19 +02:00
Author
Member

![image](/attachments/40e4e213-87c1-420b-b6b2-ca5efbd9c1eb)

I found a type of edge that broke my assumption about when duplicates were created. This shows that for some edges, some new faces will get duplicate edges and others won't. I still think a heuristic for this must exist. Maybe it's whether the corner vertices of each connected face are unchanged or not. I'll check on that tomorrow.
Hans Goudey added 9 commits 2023-08-05 18:07:39 +02:00
Hans Goudey added 1 commit 2023-08-05 18:07:43 +02:00
Hans Goudey added 2 commits 2023-08-29 01:49:25 +02:00
Member

Do you want to get another review here after your latest changes?
Author
Member

Thanks, not at the moment, I still have one more thing to finish.
Hans Goudey added 6 commits 2023-08-29 21:19:08 +02:00
Author
Member

Okay, it's working in my tests now. @wannes.malfait or @JacquesLucke, if you're interested in looking at my most recent changes, the new stuff is in `calc_new_edges` and `update_unselected_edges`. The basic change is now it deduplicates the new edges grouped by their original edge, so all that work can happen in parallel. There's some complexity from optimizing for the case where no deduplication happens though, and I realized that unselected edges can change too, so there's a simple function that does that.
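For readers following along, a rough sketch of the per-edge grouping idea described above. This is an illustration rather than the code in `calc_new_edges`; `selected_edges`, `corner_edge_verts`, and `new_edge_counts` are hypothetical stand-ins:

```cpp
/* Sketch only: de-duplicate new edges grouped by their original edge, so every
 * selected edge can be processed independently and the loop parallelizes cleanly. */
selected_edges.foreach_index(GrainSize(512), [&](const int edge_i) {
  VectorSet<OrderedEdge> new_edges_of_edge;
  for (const int corner : edge_to_corner_map[edge_i]) {
    /* The (possibly already updated) pair of vertices this corner's copy of the edge uses. */
    const int2 verts = corner_edge_verts(corner); /* Hypothetical helper. */
    /* Corners that produce the same vertex pair share a single new edge. */
    new_edges_of_edge.add(OrderedEdge(verts[0], verts[1]));
  }
  /* Per-edge results are written to their final positions later using offsets,
   * so no synchronization between edges is needed. */
  new_edge_counts[edge_i] = int(new_edges_of_edge.size());
});
```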
Wannes Malfait reviewed 2023-08-29 22:45:03 +02:00
Wannes Malfait left a comment
Member

Took me a while to understand everything, but I think I got the big picture now.

Some small comments, as well as curiosities.

BTW, did you update the patch description/timings yet? If so, how do they compare to the previous version of this patch? Curious to know how this approach compares to the other.
@ -332,0 +340,4 @@
const IndexMask &selected_edges,
MutableSpan<int2> edges,
MutableSpan<int> corner_edges,
MutableSpan<int> r_new_edge_offsets)
Member

Took me a while to realize what `new_edge_offsets` was storing. Something like `num_new_edges_per_old_edge` would be clearer to me (but also too long probably). I'm fine if you leave it as is, since the current name does make sense as well in hindsight. What confused me was that "offset" is also used in other arrays like `offset_no_merge` or `merged_offsets` in a different way (cumulative offsets).
HooglyBoogly marked this conversation as resolved
@ -343,3 +352,1 @@
threading::parallel_for(edges.index_range(), 512, [&](const IndexRange range) {
for (const int i : range) {
edge_to_loop_map[i].extend(orig_edge_to_loop_map[i]);
offsets_no_merge.last() = no_merge_count;
Member

I was not able to figure out why this line is there. I see that this is different from what is set in the loop, but it seems wrong to me to have this here:

  • `last()` is not necessarily a selected edge
  • I thought the offsets should point to the index the new edge would be located at. Now the last one is pointing out of bounds?

Probably there's something I'm not getting.
Author
Member

Ah, sorry, the extra element is left over from when I was using `OffsetIndices` with this array (which requires the extra).
HooglyBoogly marked this conversation as resolved
@ -346,0 +353,4 @@
Array<int2> no_merge_new_edges(no_merge_count);
/* The first new edge for each selected edge is reused, instead we modify the existing edge in
Member

Something seems wrong with this comment, the "instead" doesn't make sense to me.
Author
Member

Right, good point, "instead" is weird.
HooglyBoogly marked this conversation as resolved
@ -359,3 +403,1 @@
for (const int duplicate_i : IndexRange(edge_offsets[edge_i], num_edge_duplicates[edge_i])) {
vert_to_edge_map[edge[0]].append(duplicate_i);
vert_to_edge_map[edge[1]].append(duplicate_i);
if (merged_offsets.total_size() == no_merge_new_edges.size()) {
Member

Is this really the earliest we can conclude this? I think it should be possible to do something like this:

  • Add `bool deduplicated = false;` before the deduplication loop
  • The moment `add_edge_or_find_index` detects a duplicate (I think that is what it does), set `deduplicated = true`. I think this should be fine in terms of thread safety, since it doesn't matter if multiple threads want to set it to `true`.
  • At the end of that loop we simply check for `deduplicated == false` to know if we have an early exit. If so, we skip the accumulation and loop just above this line.

What do you think?
Author
Member

Hmm, I'm not sure that lets us skip any meaningful work, since when there are duplicates, we still need both the un-merged offsets and the merged offsets in order to remove the empty slots in the array that are unused because of the deduplicate. We could redo some of the work in the first loop to find the second offsets array, but I'm not sure that's worth it.

Maybe I'm missing something from your idea though.
Member

I was thinking about optimizing the "no duplicates" case a bit more. If this is the case, we can just return the `no_merge_new_edges` array. Currently, this is detected by checking if `merged_offsets.total_size() == no_merge_new_edges.size()`. My idea is that we could detect this, before calculating `merged_offsets`, using the method in my comment. That way, when there are no duplicates, we skip this calculation of `merged_offsets`. If there are duplicates, then we do still need to calculate it of course. Does that make sense?
Author
Member

Thanks, I tried this and it is indeed faster when there are no merges (only in the 100% test case, but probably most real world cases too). In that case it changed the timing from 56 to 35 ms.
HooglyBoogly marked this conversation as resolved
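To spell out the flag idea from this thread, a small hypothetical sketch (not the committed code; `deduplicate_edge` and `counts_without_merging` are stand-ins). Using `std::atomic` with relaxed ordering is one way to keep the concurrent writes well-defined, matching the observation above that racing writes of `true` are harmless in practice:

```cpp
#include <atomic>

/* Sketch only: detect during the parallel per-edge loop whether any de-duplication
 * actually happened, so the merged offsets are only computed when duplicates exist. */
std::atomic<bool> any_duplicates{false};
selected_edges.foreach_index(GrainSize(512), [&](const int edge_i) {
  const int merged_num = deduplicate_edge(edge_i); /* Hypothetical per-edge merging. */
  if (merged_num < counts_without_merging[edge_i]) {
    /* Relaxed ordering is enough: the flag is only read after the loop finishes. */
    any_duplicates.store(true, std::memory_order_relaxed);
  }
});
if (!any_duplicates.load(std::memory_order_relaxed)) {
  /* Nothing was merged anywhere, so the "no merge" edges and offsets can be used
   * directly and computing `merged_offsets` can be skipped. */
}
```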
@ -425,2 +491,2 @@
corner_edges[corner] = new_edges.index_of_or_add_as(OrderedEdge(vert_1, vert_2));
}
/**
* Transform the #OffsetIndices storage of new vertices per source vertex into a more
Member

This is also used for edges, so using `vert` in the comment and variable name is maybe not the best idea. Maybe a generic `element` would be good enough?
HooglyBoogly marked this conversation as resolved
Hans Goudey added 5 commits 2023-08-29 23:15:56 +02:00
Author
Member

Thanks for the quick feedback! I updated both the old and new timings in the comparison. I found the old algorithm to be a bit slower than last time (it was consistent, but I'm not quite sure why), and the new one to be faster than last time. I'd attribute that mostly to multithreading in topology map creation that landed yesterday.

I'll run the updated per-function timings now.
Member

LGTM, and works well in my tests.
Wannes Malfait approved these changes 2023-08-30 14:04:10 +02:00
Wannes Malfait left a comment
Member

> Avoid costly edge deduplication by using available information to avoid creating duplicate edges.

This is outdated now, so don't forget to change it before committing 🙂.
Hans Goudey added 2 commits 2023-08-30 14:06:16 +02:00
Hans Goudey merged commit f41dc90925 into main 2023-08-30 14:23:49 +02:00
Hans Goudey deleted branch split-edges-rewrite-2 2023-08-30 14:23:51 +02:00