Mesh: Rewrite split edges algorithm #110661
Reference: blender/blender#110661
The new split edges code from e83f46ea76 started using `Mesh` rather than `BMesh` as a performance improvement. However, the code contained a complex loop used for de-duplicating edges that gave invalid indices in some circumstances. That was fixed by b226c115e2, but in turn that commit caused loose edges to be skipped. That was fixed by 111e378366, which in turn caused another crash.

Long story short, the code is quite complex and the existing algorithm, while an improvement on the original, is fiddly. For example, instead of working with just the input and output states, it adjusted topology data for an intermediate state halfway through the change.

Fixes #109236
Goals
- with future topology caching improvements (see !107816)
- and separating functions
- edge attribute propagation
Timing
I tested performance with a Ryzen 7950X and a grid with 4 million faces, with different random portions of the mesh selected.
- vert_selection_from_edge
- selection_to_bit_vector
- vert_to_corner_map
- edge_to_corner_map
- corner_to_face_map
- calc_all_corner_groups
- calc_vert_ranges_per_old_vert
- update_corner_verts
- calc_new_edges
- update_unselected_edges
- propagate_edge_attributes
- propagate_vert_attributes
- propagate_vert_attributes
Tests
New regression tests will be committed separately. Existing tests
must be updated since the new code gives different indices.
@blender-bot build
It passed all of the tests I could think of to throw at it, so I'm fairly confident the code is correct. Code was very readable, just have a few comments here and there.
@ -74,3 +69,3 @@
continue;
}
if (id.is_anonymous() && !propagation_info.propagate(id.anonymous_id())) {
if (id.name() == ".edge_verts") {
I assume this skips updating the vertex index information for the edges, since this is handled separately? Maybe worth mentioning this in a comment.
@ -159,3 +107,1 @@
const Span<Vector<int>> edge_to_loop_map,
MutableSpan<int> corner_verts,
MutableSpan<int> new_to_old_verts_map)
static BitVector<> selection_to_bit_vector(const IndexMask &selection, const int universe_size)
Is `universe_size` standard terminology for bit vectors? Isn't this always going to be `selection.size()`?

@ -294,3 +195,1 @@
void split_edges(Mesh &mesh,
const IndexMask &mask,
const bke::AnonymousAttributePropagationInfo &propagation_info)
/* Calculate groups of edges that are contiguously connected to each input vertex. */
In the comment above `calc_corner_fans_for_vertex`, a fan is referred to as a group of corners. Here it is referred to as a group of edges. This should probably say "Calculate groups of corners that are...".

@ -311,1 +220,3 @@
vert_to_edge_map[edges[i][1]].append(i);
/** Selected and unselected loose edges attached to a vertex. */
struct VertLooseEdges {
Vector<int> split;
Took me a while to realize that `split` is the same as `selected`. Maybe just change this to `selected` to be more consistent with the comment above.

@ -343,1 +246,3 @@
threading::parallel_for(edges.index_range(), 512, [&](const IndexRange range) {
/**
* Every affected vertex maps to potentially multiple output vertices. Create a mapping from
* affected vertex index to the of output vertex (indices are to those sets, not indices in arrays
"affected vertex index to the of output vertex" -> "affected vertex index to the index of the output vertex"
@ -344,0 +262,4 @@
{
offset_data.reinitialize(affected_verts.size() + 1);
threading::parallel_for(affected_verts.index_range(), 2048, [&](const IndexRange range) {
offset_data.as_mutable_span().slice(range).fill(-1);
Is it better to do this here than before the loop? Also, why does it need to be filled with `-1`? The comment above the function mentions nothing about it possibly containing invalid indices. Looking at the code, this also seems to be impossible. Maybe this was left over from a previous version of the code?

I did the initialization in the loop to use the cache better, as opposed to filling the entire array on one thread first. Not sure how much of a difference that makes, but it shouldn't hurt. The -1 is to account for the reused vert; I added a comment.

Ok, I see now. That makes a lot of sense.
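The "-1 for the reused vertex" trick discussed here can be sketched outside Blender as a small standalone example (the names `fans_per_vert` and `loose_splits_per_vert` are hypothetical, and `std::vector` stands in for Blender's `Array`): each affected vertex gets one output vertex per corner fan plus one per split loose edge, but one of those outputs reuses the original vertex, so starting the count at -1 lets every case just add its contribution without tracking which one does the reusing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standalone sketch (hypothetical names, not Blender's implementation).
// Each affected vertex produces one output vertex per corner fan plus one
// per split loose edge; one of those reuses the original vertex, so the
// number of *new* vertices is the total minus one. Starting at -1 lets each
// case simply add its count. The counts are then accumulated into offsets,
// in the style of accumulate_counts_to_offsets, with the non-negativity
// assert mentioned in the review.
static int counts_to_offsets(std::vector<int> &counts,
                             const std::vector<int> &fans_per_vert,
                             const std::vector<int> &loose_splits_per_vert)
{
  const size_t verts_num = fans_per_vert.size();
  counts.assign(verts_num + 1, -1);
  for (size_t i = 0; i < verts_num; i++) {
    counts[i] += fans_per_vert[i];
    counts[i] += loose_splits_per_vert[i];
  }
  int offset = 0;
  for (size_t i = 0; i < verts_num; i++) {
    const int count = counts[i];
    assert(count >= 0); /* None of the final sizes should be negative. */
    counts[i] = offset;
    offset += count;
  }
  counts[verts_num] = offset;
  return offset; /* Total number of new vertices. */
}
```

For example, with `fans_per_vert = {2, 1}` and `loose_splits_per_vert = {1, 0}`, the first vertex contributes 3 - 1 = 2 new vertices and the second contributes 1 - 1 = 0.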
@ -360,2 +284,2 @@
vert_to_edge_map[edge[0]].append(duplicate_i);
vert_to_edge_map[edge[1]].append(duplicate_i);
/**
* Update corner verts so that each fan of edges gets its own vertex. For the last "new vertex" we
"fan of edges" -> "fan of corners"
@ -427,0 +457,4 @@
Array<int> vert_to_corner_offsets;
Array<int> vert_to_corner_indices;
const GroupedSpan<int> vert_to_corner_map = bke::mesh::build_vert_to_loop_map(
These functions take up a significant part of the total runtime of the node. Have you tried looking into invoking them in parallel, or are all the threads already saturated by the functions individually? Of course, these might just become look-ups if a topology cache is available
Yes, I hope this can be improved in the future. Currently their work is mostly single threaded, but I'd like to change that (and Iliya looked into that here: !110707). If they are multithreaded, running them in parallel won't be as helpful. But definitely worth looking into this sort of thing!
Probably best to keep that for a separate patch indeed, since this is really supposed to be a bugfix
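For context, the two topology maps are independent of each other (each only reads shared mesh arrays), which is why invoking them in parallel is plausible at all. A rough standalone sketch of that idea, using `std::async` as a stand-in for something like `threading::parallel_invoke` and simplified map types (this is an illustration with hypothetical names, not Blender's implementation):

```cpp
#include <future>
#include <utility>
#include <vector>

// Simplified stand-in for Blender's grouped vert-to-corner/vert-to-edge maps.
using GroupedMap = std::vector<std::vector<int>>;

static GroupedMap build_vert_to_corner_map(const std::vector<int> &corner_verts,
                                           const int verts_num)
{
  GroupedMap map(verts_num);
  for (int corner = 0; corner < int(corner_verts.size()); corner++) {
    map[corner_verts[corner]].push_back(corner);
  }
  return map;
}

static GroupedMap build_vert_to_edge_map(const std::vector<std::pair<int, int>> &edges,
                                         const int verts_num)
{
  GroupedMap map(verts_num);
  for (int i = 0; i < int(edges.size()); i++) {
    map[edges[i].first].push_back(i);
    map[edges[i].second].push_back(i);
  }
  return map;
}

// Both builds only read the shared topology arrays, so they can run at the
// same time: one on a worker thread, one on the calling thread.
static std::pair<GroupedMap, GroupedMap> build_maps_in_parallel(
    const std::vector<int> &corner_verts,
    const std::vector<std::pair<int, int>> &edges,
    const int verts_num)
{
  std::future<GroupedMap> corners = std::async(
      std::launch::async, [&]() { return build_vert_to_corner_map(corner_verts, verts_num); });
  GroupedMap edge_map = build_vert_to_edge_map(edges, verts_num);
  return {corners.get(), std::move(edge_map)};
}
```

If the individual builds become internally multithreaded (as in !110707), running them concurrently like this helps less, which matches the point made above.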
@ -444,2 +533,4 @@
BKE_mesh_tag_edges_split(&mesh);
BLI_assert(BKE_mesh_is_valid(&mesh));
Is this just here for testing purposes? I don't know how expensive that function is, but it should probably be removed if it is not a very trivial check
Would be good to test the code coverage of all the new regression tests.
@ -197,2 +132,4 @@
return other;
}
using CornerFan = Vector<int>;
Not super important right now, because fans are usually not huge (would be nice to have a comment here for what size is typical). It seems like it might not be too hard to have a more efficient flat data structure for all corner fans. The most tricky thing here is probably that the number of fans per vertex is not known in advance. Just a thought. Alternatively, maybe a thread-local `LinearAllocator` could help, but I don't know how much overhead you get from using vectors here anyway. Can be looked into later.

Yeah, I was thinking about that too. There are definitely easier possible performance improvements than this though-- for now I think the more basic storage types help keep the code simple. `NoInitialization()` was pretty essential though.

@ -198,2 +134,4 @@
using CornerFan = Vector<int>;
/**
Is the set of corner fans deterministic even if the `connected_corners` input is shuffled? Might be ok if it's not, just wondering.

The sets will contain the same corners, but the order of fans and corners in each fan may be different. Luckily the order doesn't matter for this algorithm. It would be nice to have a version of this function that used winding order though-- that could be used in corner normal calculation maybe.
@ -344,0 +262,4 @@
const BitSpan split_edges,
Array<int> &offset_data)
{
offset_data.reinitialize(affected_verts.size() + 1);
Maybe locally add `MutableSpan<int> new_verts_nums = offset_data;`.

@ -344,0 +264,4 @@
{
offset_data.reinitialize(affected_verts.size() + 1);
threading::parallel_for(affected_verts.index_range(), 2048, [&](const IndexRange range) {
/* Start with -1 for the reused vertex. None of the final sizes should be negative. */
Add assert for the second sentence.

`accumulate_counts_to_offsets` already has an assert for that.

@ -344,0 +265,4 @@
offset_data.reinitialize(affected_verts.size() + 1);
threading::parallel_for(affected_verts.index_range(), 2048, [&](const IndexRange range) {
/* Start with -1 for the reused vertex. None of the final sizes should be negative. */
offset_data.as_mutable_span().slice(range).fill(-1);
Seems like it could be easier and faster to just subtract 1 in the loop below and change `+=` to `=`?

Maybe, but then the loose edges and corner fan cases become dependent on each other, since they can only subtract 1 if the other isn't empty. So this is a bit simpler.

Hm, I don't get why first setting everything to `-1` and then adding something to every element afterwards is any better than just doing this:

Because we also need to reuse vertices for loose edges. Both of those can't use `- 1` if any of the previous cases did. Rather than keeping track of whether we've already reused the vertex across the types, I thought it was simpler to just always add and start from -1.

@ -366,0 +313,4 @@
* indices. Using this fact we can avoid the need to use a more expensive data structure to
* deduplicate the result edges.
*/
static BitVector<> calc_merged_new_edges(const int orig_verts_num,
I find the term `merged` confusing. Are we actually merging something, or are we just not splitting these edges?

@ -373,0 +348,4 @@
const BitSpan output_single_edge,
MutableSpan<int2> edges,
MutableSpan<int> corner_edges,
Vector<int2> &new_edges,
Should use the `r_` prefix.

@ -383,2 +376,3 @@
}
});
}
return new_edges;
This might result in an unnecessary copy currently.
Oops, thanks!
Would still like to see the test coverage, but looks good to me now.
Forgot to press this button 😅
Thank you both for the review! Before committing, I'll rename "fan" to "group" and mention that they're just "Groups of corners connected by edges" in the comments.
I found a type of edge that broke my assumption about when duplicates were created. This shows that for some edges, some new faces will get duplicate edges and others won't. I still think a heuristic for this must exist. Maybe it's whether the corner vertices of each connected face are unchanged or not. I'll check on that tomorrow.
Do you want to get another review here after your latest changes?
Thanks, not at the moment, I still have one more thing to finish.
Okay, it's working in my tests now. @wannes.malfait or @JacquesLucke, if you're interested in looking at my most recent changes, the new stuff is in `calc_new_edges` and `update_unselected_edges`. The basic change is that it now deduplicates the new edges grouped by their original edge, so all that work can happen in parallel. There's some complexity from optimizing for the case where no deduplication happens though, and I realized that unselected edges can change too, so there's a simple function that does that.

Took me a while to understand everything, but I think I got the big picture now.
Some small comments, as well as curiosities.
BTW, did you update the patch description/timings yet? If so, how do they compare to the previous version of this patch? Curious to know how this approach compares to the other.
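The grouping idea described above can be sketched as a minimal standalone example (a sorted `std::pair` stands in for Blender's `OrderedEdge`, a `std::set` for a `VectorSet`, and the names are hypothetical): since deduplication only ever happens among the new edges that share an original edge, each group can be processed independently, which is what makes a parallel loop over the groups safe.

```cpp
#include <algorithm>
#include <set>
#include <utility>
#include <vector>

// Hypothetical sketch: candidate new edges are deduplicated only within the
// group belonging to their original edge, so every group is independent and
// a parallel_for over the groups would be safe. A sorted vertex pair stands
// in for Blender's OrderedEdge, so (a, b) and (b, a) are the same edge.
using Edge = std::pair<int, int>;

static Edge ordered(const int v1, const int v2)
{
  return {std::min(v1, v2), std::max(v1, v2)};
}

static std::vector<std::vector<Edge>> dedup_per_orig_edge(
    const std::vector<std::vector<Edge>> &candidates_per_orig_edge)
{
  std::vector<std::vector<Edge>> result(candidates_per_orig_edge.size());
  for (size_t i = 0; i < candidates_per_orig_edge.size(); i++) {
    std::set<Edge> seen; /* Stand-in for a VectorSet keyed by OrderedEdge. */
    for (const Edge &candidate : candidates_per_orig_edge[i]) {
      const Edge key = ordered(candidate.first, candidate.second);
      if (seen.insert(key).second) {
        result[i].push_back(key);
      }
    }
  }
  return result;
}
```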
@ -332,0 +340,4 @@
const IndexMask &selected_edges,
MutableSpan<int2> edges,
MutableSpan<int> corner_edges,
MutableSpan<int> r_new_edge_offsets)
Took me a while to realize what `new_edge_offsets` was storing. Something like `num_new_edges_per_old_edge` would be clearer to me (but also too long probably). I'm fine if you leave it as is, since the current name does make sense as well in hindsight. What confused me was that "offset" is also used in other arrays like `offset_no_merge` or `merged_offsets` in a different way (cumulative offsets).

@ -343,3 +352,1 @@
threading::parallel_for(edges.index_range(), 512, [&](const IndexRange range) {
for (const int i : range) {
edge_to_loop_map[i].extend(orig_edge_to_loop_map[i]);
offsets_no_merge.last() = no_merge_count;
I was not able to figure out why this line is there. I see that this is different from what is set in the loop, but it seems wrong to me to have this here: `last()` is not necessarily a selected edge. Probably there's something I'm not getting.

Ah, sorry, the extra element is left over from when I was using `OffsetIndices` with this array (which requires the extra).
with this array (which requires the extra).@ -346,0 +353,4 @@
Array<int2> no_merge_new_edges(no_merge_count);
/* The first new edge for each selected edge is reused, instead we modify the existing edge in
Something seems wrong with this comment, the "instead" doesn't make sense to me.
Right, good point, "instead" is weird.
@ -359,3 +403,1 @@
for (const int duplicate_i : IndexRange(edge_offsets[edge_i], num_edge_duplicates[edge_i])) {
vert_to_edge_map[edge[0]].append(duplicate_i);
vert_to_edge_map[edge[1]].append(duplicate_i);
if (merged_offsets.total_size() == no_merge_new_edges.size()) {
Is this really the earliest we can conclude this? I think it should be possible to do something like this:

- Add `bool deduplicated = false;` before the deduplication loop.
- When `add_edge_or_find_index` detects a duplicate (I think that is what it does), set `deduplicated = true`. I think this should be fine in terms of thread safety, since it doesn't matter if multiple threads want to set it to `true`.
- Check `deduplicated == false` to know if we have an early exit. If so, we skip the accumulation and loop just above this line.

What do you think?

Hmm, I'm not sure that lets us skip any meaningful work, since when there are duplicates, we still need both the un-merged offsets and the merged offsets in order to remove the empty slots in the array that are unused because of the deduplication. We could redo some of the work in the first loop to find the second offsets array, but I'm not sure that's worth it. Maybe I'm missing something from your idea though.
I was thinking about optimizing the "no duplicates" case a bit more. If this is the case, we can just return the `no_merge_new_edges` array. Currently, this is detected by checking if `merged_offsets.total_size() == no_merge_new_edges.size()`. My idea is that we could detect this, before calculating `merged_offsets`, using the method in my comment. That way, when there are no duplicates, we skip the calculation of `merged_offsets`. If there are duplicates, then we do still need to calculate it, of course. Does that make sense?

Thanks, I tried this and it is indeed faster when there are no merges (only in the 100% test case, but probably most real world cases too). In that case it changed the timing from 56 to 35 ms.
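The flag idea from this thread could look roughly like the following standalone sketch (hypothetical names, with a `std::set` standing in for the duplicate detection in `add_edge_or_find_index`, and `std::atomic` because multiple threads may set the flag): if no group ever saw a duplicate, the caller can return the un-merged array directly and skip computing the merged offsets.

```cpp
#include <atomic>
#include <set>
#include <vector>

// Hypothetical sketch of the early-exit flag. A duplicate found in any group
// sets the shared flag; racing writes of `true` are harmless. If the flag is
// still false at the end, the caller knows there were no duplicates and can
// skip building the merged offsets entirely.
struct DedupResult {
  std::vector<int> edges; /* Placeholder for the deduplicated edge data. */
  bool any_duplicates = false;
};

static DedupResult dedup_with_flag(const std::vector<std::vector<int>> &groups)
{
  std::atomic<bool> found_duplicate{false};
  std::vector<int> all;
  /* Imagine this outer loop as a threading::parallel_for over the groups; in
   * a real parallel version each group would write into its own slice rather
   * than a shared vector. */
  for (const std::vector<int> &group : groups) {
    std::set<int> seen;
    for (const int edge : group) {
      if (seen.insert(edge).second) {
        all.push_back(edge);
      }
      else {
        found_duplicate.store(true, std::memory_order_relaxed);
      }
    }
  }
  return {all, found_duplicate.load()};
}
```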
@ -425,2 +491,2 @@
corner_edges[corner] = new_edges.index_of_or_add_as(OrderedEdge(vert_1, vert_2));
}
/**
* Transform the #OffsetIndices storage of new vertices per source vertex into a more
This is also used for edges, so using `vert` in the comment and variable name is maybe not the best idea. Maybe a generic `element` would be good enough?

Thanks for the quick feedback! I updated both the old and new timings in the comparison. I found the old algorithm to be a bit slower than last time (it was consistent, but I'm not quite sure why), and the new one to be faster than last time. I'd attribute that mostly to multithreading in topology map creation that landed yesterday.
I'll run the updated per-function timings now.
LGTM, and works well in my tests.
This is outdated now, so don't forget to change it before committing 🙂.