Sculpt: Improve Expand performance #120125
Improved performance of the geodesic expand operator initialization
Implemented a parallel Breadth-First Search to calculate surface
geodesics instead of a serial BFS. The performance is significantly
improved, and the difference becomes more evident the denser the mesh is.
Novel in this approach is a fast lock-free, wait-free parallel storage scheme using open addressing
with linear probing in a vector; it surpasses tbb::concurrent_vector's
performance, which matters since this storage is a critical component for performance.
Resolves #102586
For some numerical comparisons:
In monoHD.blend with 2016578 vertices on an MBA M1 with 8 GB RAM.
Initialization time:
- Before: 6.56252 seconds
- After: 2.63308 seconds
Performance now scales better with the number of cores and topological mesh density.
I will soon post comparison numbers from before and after the change.
Changed title from WIP: Fix ##102586 Sculpt Mode: Improve Expand performance to WIP: Fix #102586 Sculpt Mode: Improve Expand performance
Changed title from WIP: Fix #102586 Sculpt Mode: Improve Expand performance to Fix #102586 Sculpt Mode: Improve Expand performance
Disregard the many intermediate commits I've made; most are partial cloud backups and provisional approaches that will be replaced later, but I wanted to keep a history of them.
This will be the first of a series of partial PRs to improve the performance of the different algorithms in the expand operator, to keep each PR small and manageable.
The expand operator will also likely require a deeper refactor to be truly scalable with mesh size, with a more dynamic or lazy initialization that avoids calculating a field value for every vertex even when unused. Adding multithreading to the current implementation is a step in the right direction for squeezing out more performance now.
@ -1497,10 +1480,12 @@ static void sculpt_expand_update_for_vertex(bContext *C, Object *ob, const PBVHV
sculpt_expand_mask_update_task(ss, mask_write, expand_cache->nodes[i]);
}
});
SCULPT_flush_update_step(C, SCULPT_UPDATE_MASK);
Unrelated change
@ -2140,6 +2124,11 @@ static int sculpt_expand_invoke(bContext *C, wmOperator *op, const wmEvent *even
if (!BKE_base_is_visible(v3d, base)) {
return OPERATOR_CANCELLED;
}
/* Do nothing when the mesh has 0 vertices. */
Unrelated change?
@ -2222,7 +2204,6 @@ static int sculpt_expand_invoke(bContext *C, wmOperator *op, const wmEvent *even
ss->expand_cache, ob, ss->expand_cache->initial_active_vertex, falloff_type);
sculpt_expand_check_topology_islands(ob, falloff_type);
Unrelated change
@ -8,6 +8,7 @@
#include <cmath>
#include <cstdlib>
#include <thread>
Unnecessary include?
@ -77,6 +78,7 @@ static bool sculpt_geodesic_mesh_test_dist_add(Span<float3> vert_positions,
return false;
}
#define UNASSIGNED -1
Typically it's preferred to use const int declared inside the function or so.
Sorry, but now this line is a different issue; now this is UB.
@ -146,0 +137,4 @@
threading::parallel_for(IndexRange(0, totvert), 4096, [&](IndexRange range) {
for (const int i : range) {
if (len_squared_v3v3(v_co, vert_positions[i]) <= limit_radius_sq) {
affected_vert[i].set();
Manipulation of a bitmap is not threadsafe because it's a read-modify-write operation. It should be converted to a boolean array, or you can use threading::parallel_for_aligned and bits::BitsPerInt to align the task boundaries on integer boundaries.
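A minimal sketch of the aligned variant being suggested, assuming the threading::parallel_for_aligned overload from BLI_task.hh and the bits::BitsPerInt constant; this is illustrative, not the patch:

/* Sketch only: with task boundaries aligned to bits::BitsPerInt, no two tasks
 * write into the same underlying integer of the bit vector, so the
 * read-modify-write of individual bits stays confined to one thread per word. */
threading::parallel_for_aligned(
    IndexRange(totvert), 4096, bits::BitsPerInt, [&](const IndexRange range) {
      for (const int i : range) {
        if (len_squared_v3v3(v_co, vert_positions[i]) <= limit_radius_sq) {
          affected_vert[i].set();
        }
      }
    });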
@ -147,0 +144,4 @@
}
}
std::vector<int> queue, queue_next;
In Blender code, Vector is preferred over std::vector. That's documented more in the headers and here: https://developer.blender.org/docs/features/core/blenlib/containers/
@ -147,0 +148,4 @@
queue.reserve(totedge);
/* Add edges adjacent to an initial vertex to the queue.
* Since initial vertex are few only, iterating over its neighbour edges
typo/grammar
Suggest "Since there are typically few initial vertices"
@ -147,0 +149,4 @@
/* Add edges adjacent to an initial vertex to the queue.
* Since initial vertex are few only, iterating over its neighbour edges
* instead of over all edges scales better as mesh edge count increases */
Missing period at the end
@ -147,0 +153,4 @@
GSetIterator gs_iter;
GSET_ITER (gs_iter, initial_verts) {
const int seed_vert = POINTER_AS_INT(BLI_gsetIterator_getKey(&gs_iter));
for (const int e : ss->vert_to_edge_map[seed_vert]) {
Typically in mesh code I'd prefer to see longer variable names like edge or edge_index.
@ -180,3 +191,3 @@
}
for (const int v_other : corner_verts.slice(faces[face])) {
if (ELEM(v_other, v1, v2)) {
if (ELEM(v_other, v1, v2) || !sculpt_geodesic_mesh_test_dist_add(
Keep these as separate if statements; it's more readable that way. It's really much simpler to review if the changes are kept as small as necessary to achieve the speedup.
@ -202,2 +199,2 @@
BLI_LINKSTACK_PUSH(queue_next, POINTER_FROM_INT(e_other));
}
for (const int e_other : ss->vert_to_edge_map[v_other]) {
const int ev_other = (edges[e_other][0] == v_other) ? edges[e_other][1] :
Use bke::mesh::edge_other_vert.
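A sketch of that suggestion, assuming the bke::mesh::edge_other_vert helper from BKE_mesh.hh:

/* Returns the vertex of the edge that is not v_other. */
const int ev_other = bke::mesh::edge_other_vert(edges[e_other], v_other);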
@ -204,0 +213,4 @@
* collission in hash tables can also be used to avoid
* parallel data storage collisions */
size_t idx = e_other % new_size;
while (queue_next[idx] != UNASSIGNED) {
I'm skeptical that this will give deterministic behavior. That's typically required here, otherwise things get very hard to debug.
For the initial falloff field creation (one falloff value per vertex), the order in which the value is calculated for each vertex does not matter; at the end of the algorithm, every vertex will have its falloff calculated. Whether it goes vertex[0], vertex[1], ..., vertex[n] or vertex[2], vertex[0], ..., vertex[1], vertex[n] is irrelevant here. It is deterministic in the sense that the results are the same regardless of the order of calculation. It is well worth the speed gain for the user; this code will be executed far more often than it will be debugged.
If geometry nodes lets you view randomly selected vertices (by using the vertex index), shouldn't this cause vertex flickering?
Just going to mark this as "not resolved" since there was no change here. Better that the conversation is visible for someone reading the diff.
I don't think so, this is an initialization step that happens only when the falloff field is created and is used only in the expand operator.
@ -303,3 +321,3 @@
}
} // namespace blender::ed::sculpt_paint::geodesic
} /* namespace blender::ed::sculpt_paint::geodesic */
Unrelated change
@ -1 +1 @@
Subproject commit 612e310bf7ee505a79dad7ea925916ec9a812995
Subproject commit ce850ef056c57d0e4127eda48a514b2c4a651435
Unrelated change
I don't know how that file got changed, maybe a merge from main?
Either way, it shouldn't be part of this PR.
I invoke the Git Gurus to help me get rid of this XD
fixed
Changed title from Fix #102586 Sculpt Mode: Improve Expand performance to Sculpt: Improve Expand performance
@ -1 +1 @@
Subproject commit 612e310bf7ee505a79dad7ea925916ec9a812995
Subproject commit 42979e6ec67f657c7efce9cb672fe07b3f8fff9d
Applied git reset origin/main tests/data but it still shows a file changed.
@ -610,3 +610,2 @@
const float *co = SCULPT_vertex_co_get(ss, symm_vertex);
for (int i = 0; i < totvert; i++) {
PBVHVertRef vertex = BKE_pbvh_index_to_vertex(ss->pbvh, i);
threading::parallel_for(IndexRange(0, totvert), 4096, [&](IndexRange range) {
IndexRange(0, totvert) -> IndexRange(totvert)
I think it would be a good idea to split this off into a separate PR, as it will make it easier to present and review the separate modes of the Expand operator.
For example, if this code is split off into a separate PR it gives an opportunity:
@ -79,14 +79,13 @@ static bool sculpt_geodesic_mesh_test_dist_add(Span<float3> vert_positions,
static float *geodesic_mesh_create(Object *ob, GSet *initial_verts, const float limit_radius)
{
const int UNASSIGNED = -1;
const int UNASSIGNED = -1; -> constexpr const int non_checked = -1;
@ -120,0 +111,4 @@
dists[i] = 0.0f;
}
else {
dists[i] = FLT_MAX;
FLT_MAX -> std::numeric_limits<float>::max()
@ -147,0 +143,4 @@
}
}
Vector<int> queue, queue_next;
Vector<int> queue, queue_next; -> Vector<int> queue;
@ -161,1 +167,3 @@
}
/* queue and queue_next should not differ by a huge amount of size
* since they are an advancing front in the mesh hence no need
* to allocate much more memmory and keep the iteration range smaller */
Vector<int> queue_next;
@ -167,1 +173,3 @@
int v2 = edges[e][1];
BitVector<> edge_tag(totedge);
while (!queue.is_empty()) {
threading::parallel_for_each(IndexRange(0, queue.size()), [&](const int val) {
const int val -> const int index_of_... (val -> value?)
Are you sure this loop should be parallel_for_each, not parallel_for?
@ -168,0 +174,4 @@
while (!queue.is_empty()) {
threading::parallel_for_each(IndexRange(0, queue.size()), [&](const int val) {
const int edge_index = queue[val];
int v1 = edges[edge_index][0];
int v1 = edges[edge_index][0]; -> int edge_vert_a = edges[edge_index][0];
This will add unnecessary verbosity in a context where it is clear what v1 and v2 mean.
@ -204,0 +216,4 @@
/* Open addressing with linear probing that minimizes
* collission in hash tables can also be used to avoid
* parallel data storage collisions */
size_t idx = edge_other % new_size;
Please just use blender::Set instead of all this C-like (usually unreadable) math. Or, at least, move this into a separate function; it is hard to dig into the logic of a hash table while checking a geometric algorithm. Also, still: size_t -> const int64_t (or just int...).
You can check this as an example: #120224
This will negate the performance gain; Blender's Set is really slow, and this technique works really well. I've spent several days testing and refining it.
@JacquesLucke For this technique I use a dual advancing front, which a Set/hash table does not quite map onto. It is the core of this PR.
I haven't analysed this case in detail yet. Multi-threaded write access to a hash table like it's done here doesn't feel safe at first glance, but I will need to check that again.
I don't know where the speedup here comes from specifically. If it is the linear probing, then that can also be done with Set. The probing strategy is customizable as a template parameter; one can just pass in LinearProbingStrategy instead of PythonProbingStrategy. I also recommend reading the descriptions of the probing strategies.
There might be other reasons why a custom hash table is better here, but those need to be justified a bit more.
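For reference, a minimal sketch of passing the probing strategy; the template parameter order of Set (key type, inline buffer capacity, probing strategy) is assumed here, and this says nothing about thread safety:

/* Sketch only: request linear probing without a custom hash table. Set must
 * still not be written to from multiple threads concurrently. */
blender::Set<int, 4, blender::LinearProbingStrategy> queue_next_set;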
Honestly, I was just reading the patch description now. Will have a closer look at how your implementation works soon.
The main reason for using a preallocated vector/array is to allow independent writes to each bucket in a performant way. Does Blender's Set allow preallocation and independent writes?
I just tried using VectorSet and Set for this and it didn't work. It crashes Blender.
You cannot write to these data structures from multiple threads.
Hence they are not a suitable recommendation. I've refactored that part into a separate function, so everyone should be happy now ;)
This is still not the best if Blender has 2 separate hash tables though. Glad to know if this is okay.
If we had a performant lock-free and wait-free hash table in Blender, I could substitute it in.
I suggest reviewers actually compile and test the branch to clear any doubt about its correctness.
Then in another PR I could use it to implement a lock-free and wait-free Vector or hash table.
@ -80,0 +80,4 @@
/* Open addressing with linear probing that minimizes
* collission in hash tables can also be used to avoid
* parallel data storage collisions */
static void lock_free_emplace(const int edge_other, const int non_checked, Vector<int> &queue_next)
Thanks for separating this out into a small function.
If you don't want to push/drop elements: Vector<int> &queue_next -> MutableSpan<int> queue_next
Nice tip, thanks!
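A sketch of the suggested signature change, reconstructed from the snippets quoted in this review rather than the exact patch; the helper only writes into pre-sized storage, so MutableSpan communicates that no push or reallocation happens:

/* Open addressing with linear probing over pre-allocated slots. Note: as
 * discussed below, the unsynchronized check-then-write is still a data race
 * when called from multiple threads. */
static void lock_free_emplace(const int edge_other,
                              const int non_checked,
                              MutableSpan<int> queue_next)
{
  int64_t idx = edge_other % queue_next.size();
  while (queue_next[idx] != non_checked) {
    idx++;
    if (idx == queue_next.size()) {
      idx = 0;
    }
  }
  queue_next[idx] = edge_other;
}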
@ -147,0 +160,4 @@
}
Vector<int> queue;
Vector<int> queue_next;
If the next 10 lines of code are not related to queue_next, it's probably better to just move this further down.
@ -209,0 +245,4 @@
});
queue.clear();
for (int i = 0; i < new_size; ++i) {
-> const int edge_i : IndexRange(new_size)
Can you add a table of benchmarks for the different parts of the code to the PR description?
Put simply, I don't like the idea of using threads in a case where you can use a log(N) search; I'm talking about the N^2 nearest search.
Or at least, I'd like to see this as a separate change, with separate benchmarks for different sizes of data.
Next one: lock free. There is no lock free here. Currently this PR contains only UB and data races rather than lock-free code.
I see how to fix the hashmap case, but I am not sure about sculpt_geodesic_mesh_test_dist_add. Please check std::atomic<int> for the hashmap, and prove why sculpt_geodesic_mesh_test_dist_add is acceptable inside a parallel loop.
See as additional resources: https://stackoverflow.com/questions/39393850/is-incrementing-an-int-effectively-atomic-in-specific-cases/39396999#39396999
And: https://en.cppreference.com/w/cpp/atomic/atomic/compare_exchange
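As an illustration of the compare_exchange suggestion (the helper name is hypothetical, not from the patch), claiming a slot could look roughly like this:

#include <atomic>

/* Only the thread whose compare-exchange succeeds owns the slot; a false
 * return means another thread claimed it first and the caller should probe
 * the next index. */
static bool try_claim_slot(std::atomic<int> &slot, const int non_checked, const int edge)
{
  int expected = non_checked;
  return slot.compare_exchange_strong(expected, edge, std::memory_order_relaxed);
}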
@ -80,0 +92,4 @@
idx = 0;
}
}
queue_next[idx] = edge_other;
This is a data race. Any other thread can change queue_next[idx] after the queue_next[idx] != non_checked check.
That's an assumption that there's a data race; in the actual running code, even with heavy meshes, a data race would show up as unmasked vertices surrounded by masked vertices. I noticed that with previous algorithms, where I could reliably trigger a data race.
I couldn't find any error in the masking currently.
Again, I suggest reviewers apply this patch to Blender and test it.
If you add an assert(queue_next[idx] == non_checked) and run in debug mode you'll hit it. The reason it might not show up in the final mask is probably that vertices/edges are reachable from separate "paths", so the chance of "missing" a scheduled edge is lower because of that, but it is not zero.
@ -172,0 +204,4 @@
if (dists[edge_vert_a] > dists[edge_vert_b]) {
std::swap(edge_vert_a, edge_vert_b);
}
sculpt_geodesic_mesh_test_dist_add(vert_positions,
This is a data race if you call this function in a parallel loop: sculpt_geodesic_mesh_test_dist_add performs read/modify operations on dists in a random-access manner.
Ok, let's drop this PR and leave the old slow code then, since that is what people want.
I don't want to say that I don't want to improve the speed, but could you fix the issues I pointed out?
Thanks for the links! I did try atomics and a guarded threading approach (it's in my commit history), but the overhead makes it not worth the change; it eats the performance away to the point of being almost as slow as before.
This was intended to be a stop-gap change to gain some performance while a deeper refactor, one that eliminates the need to pre-calculate a falloff field for the whole mesh, lands. But now I'm even more concerned about whether such refactors are even possible anymore for normal contributors.
Thanks everyone for their reviews and opinions.
First of all: a data race is a bug that nobody would love to fix. And more importantly, how is such a data race ever going to be confirmed when it gets reported?
Do you plan to be the cause of some bug report that cannot be intentionally confirmed in most cases? Each month some new user will go into that report to leave their mark and say "blender is bugged".
Even if the error probability is low, the error can still happen. And if there is no rule that always defines the correct result, the error might occur.
I'm glad that you can't find the issue. It's a good way to force me to spend a day actually reproducing the bug (a data race! The last one I had was CoW disease, and I spent 4 hours just to simplify the bug file; not to trigger it, just to simplify it! Hmm... okay, actually the last one was in disjoint... it was still a very corner case).
But... I did that.
Here is the diff of the 2 calls of geodesic_mesh_create:
As we can already see, this is not a deterministic algorithm.
And the fact that I can move the mouse and the selection will be different actually points to there being a gradient in the difference.
And as you can see, the texture of the difference looks like an attractor; the more distant it is, the larger it gets, I think.
You can see the diff in the attached file sculpt_topo_diff_of_diff.txt.
And now... here is how it looks in the case I made:
File sculpt_plane_bug.blend
And how to reproduce:
The mesh is generated by the geometry nodes group.
And now: I cannot reproduce the issue in main, though.
I see 2 fair ways to do things:
But I don't see the point of changing the code to improve the speed while swapping the old bug for a new one.
Even more, I don't see how this can be accepted if you don't want to use real atomic types to protect at least the hash table and instead just choose a plain int. Yes, exactly: that is the cost of correctness.
And the last thing I noticed: you still have not fixed the BitVector data race that was pointed out by Hans. So there are 3 data races in just one PR.
In the last half year I have made several PRs related to the Shortest Edge Path node:
Honestly, I am not really deep into the logic of geodesic path computation, so I can't really describe things, but I can see some relation to A*.
I got to the point where memory bandwidth is the speed limitation; the next improvements can only come from multithreading or assembler. The second is not really realistic, so the first has the most potential.
But this is a wall for me: getting a multithreaded A* without a lot of memory usage, stable, and without overhead.
If the geodesic field can be solved the same way, this will probably be the fastest implementation.
For a start, you can try to reduce memory bandwidth in the hot loop and move thread-unrelated operations out of the loop (for example, prepare some data in a parallel loop before the sequential one).
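One hedged example of that last suggestion (names follow the surrounding diff but this is illustrative, not part of the patch): per-edge lengths are independent of each other, so they could be computed in a parallel loop up front and only read inside the serial hot loop.

/* Illustrative sketch: precompute squared edge lengths in parallel so the
 * sequential BFS only reads them. */
Array<float> edge_lengths_sq(edges.size());
threading::parallel_for(edges.index_range(), 4096, [&](const IndexRange range) {
  for (const int i : range) {
    edge_lengths_sq[i] = math::distance_squared(vert_positions[edges[i][0]],
                                                vert_positions[edges[i][1]]);
  }
});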
@mod_moder Let me start by thanking you. This is a really convincing explanation with facts I can use to improve on, and not the battle of egos that PR reviews tend to degenerate into.
I will reopen the PR and will address the open issues.
I was testing this change on a Mac Studio with M2 Ultra (24 cores). The time on monoHD.blend before this patch is 5.8 seconds to execute geodesic_mesh_create(). With ideal threading code we should be able to lower it down to 240 ms (5.8 / 24).
With the current state of the PR, the timing went down to approximately 800 ms. It is a big change, although it still feels like there is room for improvement.
For the approach itself, I do not think we should be allowing algorithms which rely on race conditions. Ideally, all algorithms we implement in Blender are deterministic, otherwise troubleshooting will become a pure nightmare. There are some exceptions, like GPU rendering, which is not strictly deterministic because samples could be accumulated in a different order, but that was a conscious decision.
I've tried a couple of different approaches here, with the general idea of letting threads operate purely on local data, and then somehow accumulating the results of their work. In this case the result of each thread was a Set of new edges it wants to append to the queue.
I've tried to use parallel_reduce, which combines Sets. I forget the exact speed, but it was way slower than the proposed PR, possibly because of a lot of allocations or other memory access of temporary Sets. Maybe there is a different way of getting it to work; I didn't have much time to try harder.
What I found works the best is making threads use a Set from TLS, and combining the TLS "manually" after the parallel loop. This seems to avoid most of the overhead, and it actually lowered the overall time of geodesic_mesh_create() to about 300 ms. This is way closer to the ideal scaling, but also a bit suspicious, so a second pair of eyes would be very much welcome.
I've attached the modified file. It is in a bit of a messy state with both the PR and my modifications side-by-side. It should be relatively easy to clean it up and verify correctness.
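A sketch of the TLS idea described above, assuming Blender's threading::EnumerableThreadSpecific; the advance_front name and the gather_next_edges callback are illustrative, not the attached patch:

#include "BLI_enumerable_thread_specific.hh"
#include "BLI_function_ref.hh"
#include "BLI_set.hh"
#include "BLI_span.hh"
#include "BLI_task.hh"
#include "BLI_vector.hh"

using namespace blender;

/* Each thread fills its own Set from thread-local storage; the per-thread sets
 * are merged serially afterwards, so no shared container is written concurrently. */
static Vector<int> advance_front(const Span<int> queue,
                                 const FunctionRef<void(int edge, Set<int> &r_next)> gather_next_edges)
{
  threading::EnumerableThreadSpecific<Set<int>> next_tls;
  threading::parallel_for(queue.index_range(), 1024, [&](const IndexRange range) {
    Set<int> &local_next = next_tls.local();
    for (const int i : range) {
      gather_next_edges(queue[i], local_next);
    }
  });
  /* Serial merge of the per-thread results into the next front. */
  Vector<int> queue_next;
  for (const Set<int> &local_next : next_tls) {
    for (const int edge : local_next) {
      queue_next.append(edge);
    }
  }
  return queue_next;
}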
@ -2141,6 +2143,11 @@ static int sculpt_expand_invoke(bContext *C, wmOperator *op, const wmEvent *even
return OPERATOR_CANCELLED;
}
const int totvert = SCULPT_vertex_count_get(ss);
I think this could also be split off into a separate PR, or maybe even landed directly, as it is a very minor and obvious cleanup.
@ -50,3 +50,3 @@
}
BLI_assert(dists[v1] != FLT_MAX);
BLI_assert(dists[v1] != std::numeric_limits<float>::max());
I am not sure how I feel about it. The code becomes much longer, and when we know the type at compile time I am not sure we are benefitting from numeric_limits. If the type was templated then sure, we'd have to use numeric_limits.
To me FLT_MAX reads much, much easier. But I am also curious what others think of it.
@ -77,16 +77,33 @@ static bool sculpt_geodesic_mesh_test_dist_add(Span<float3> vert_positions,
return false;
}
/* Open addressing with linear probing that minimizes
We do have a line limit of 99; there is no need to wrap lines so early. I think using all the available horizontal space makes the comment easier to follow.
collission -> collision
@ -161,1 +186,3 @@
}
/* queue and queue_next should not differ by a huge amount of size
* since they are an advancing front in the mesh hence no need
* to allocate much more memmory and keep the iteration range smaller */
memmory -> memory
Thanks @Sergey, I just pushed my changes before seeing your review and code. What I found in my thread-safe implementation is that it is actually slower than the single-threaded one :( Adding atomics for sculpt_geodesic_mesh_test_dist_add() and using tbb::concurrent_vector imply an overhead that erases the good speed the unsafe code had 😅.
Will check your implementation now.
For comparison on an M1 MBA
ST
Mono
ST Bottleneck: time: 0.225167 seconds
ST Bottleneck: time: 0.203988 seconds
MonoHD
ST Bottleneck: time: 6.95426 seconds <----
Plane
ST Bottleneck: time: 0.252023 seconds
SMT
Mono
SMT Bottleneck: time: 0.271475 seconds
SMT Bottleneck: time: 0.224629 seconds
MonoHD
SMT Bottleneck: time: 10.3282 seconds <------
Plane
SMT Bottleneck: time: 0.281746 seconds
@ -52,2 +65,2 @@
BLI_assert(dists[v1] != FLT_MAX);
if (dists[v0] <= dists[v1]) {
BLI_assert(dists[v1].load() != std::numeric_limits<float>::max());
if (dists[v0].load() <= dists[v1].load()) {
Before you start moving forward with this, I'd like to describe why this doesn't work:
std::atomic::load is not something magic. The only thing that is protected now is that the compiler and CPU will not reorder the loads of both floats (for example). But someone else can still write something into dists[v1] between the assert and the condition of the if statement. This is why I linked atomic/compare_exchange in my second review. So you have to check whether the value is what you expect and try to rewrite it; if the result is false, you have to try again.
And even this is not the full fix of the issue. Some other thread might find a local minimum and write it into dists[v0], and the current thread, having a smaller local minimum, should change that value. So now you have to rewrite them. This is the additional overhead I talked about in my last review.
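A sketch of the retry loop being described, assuming dists is actually declared as an array of std::atomic<float> rather than obtained through a cast; the helper name is illustrative:

#include <atomic>

/* Keep attempting the exchange until either the stored distance is already
 * smaller or our smaller value wins the race; a failed compare_exchange_weak
 * refreshes `current` with the latest stored value. */
static void atomic_min_dist(std::atomic<float> &dist, const float new_dist)
{
  float current = dist.load(std::memory_order_relaxed);
  while (new_dist < current &&
         !dist.compare_exchange_weak(current, new_dist, std::memory_order_relaxed)) {
    /* Retry with the refreshed `current`. */
  }
}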
@mod_moder You're right; if it's already getting slower than the single-threaded version, adding more thread safety will only increase the overhead. When I closed the PR on Saturday it was mainly because I realized the performance gain was at the expense of thread safety, and I had been lucky so far not to encounter a test that showed the error (even though it looked like a tantrum 😅).
I reopened it later because maybe it could be saved with other data containers Blender already has that I may not be aware of. But I'm getting more convinced that only an algorithmic refactor of the expand operator will bring a true and safe performance gain (and scalability), and that this route of just adding multithreading on top of the existing algorithm is a dead end.
The geodesic calculation is just too dependent on previous state and not heavy enough to offset the overhead from thread safety.
If all agree, I can close it as it is and attempt a better refactor, or try a few more rounds with @Sergey's patch.
@Sergey your code has a bug unfortunately: instead of is_edge_scheduled_map.clear() you need something like is_edge_scheduled_map.fill(false).
I did a little test myself: https://projects.blender.org/blender/blender/compare/main...JacquesLucke:geodesic-speedup-test
It speeds up geodesic_mesh_create from 10.3 ms to 1.5 ms for me (12 cores, 24 threads).
Yup, will give @Sergey's code a shot; if it is fully thread safe then it will be very much worth it to save this PR ;)
Hmm, true. Just noticed that too. I don't really have a good solution here. In my previous comment I was also only focused on speeding up the queue. sculpt_geodesic_mesh_test_dist_add seems to be a bigger issue. Would need to look into the entire algorithm more to better understand if and how this can be solved.
@JacquesLucke Ah, true, the clear() was doing something else. So on the not-so-much-algorithmic-changes side my change indeed "failed".
However, why would we want to schedule an edge again? Intuitively, it leads to extra calculations of things which are already calculated, with some possibility of early output due to things like dists[ev_other] != FLT_MAX. So if we do not clear the bitmap, we avoid those extra calculations, lowering the overall time?
And yeah, sculpt_geodesic_mesh_test_dist_add I did not notice before you mentioned it. Maybe some purely technical exercise of adding some atomics could work there, or maybe we'd need to find a different formulation of the entire algorithm. Not sure yet.
I don't know for sure. I assume that's part of the algorithm. The shortest path based on the topology may be longer than the shortest geodesic path. In this case it feels like the same edge may need to be processed more than once.
I think your previous solution actually implemented the behavior where the bits were not cleared. When I tried it, it worked well in some cases, but just didn't work on others.
Ok, based on @Sergey's modified code I've added a safeguard for updating the dists array, and it still comes out ahead in performance compared to the serial implementation.
@mod_moder please take a look, and if we all agree I can start pushing for final reviews.
@ -74,0 +80,4 @@
* the actual order of update is irrelevant here */
std::atomic<float> &atomic_dist_v0 = reinterpret_cast<std::atomic<float> &>(dists[v0]);
if (atomic_dist_v0.load() > dist0) {
atomic_dist_v0.store(dist0);
Note that there is still a race condition here. Another thread may change atomic_dist_v0 between the load and the store.
Also, there is undefined behavior in every access of the dists array (one thread may read a value from it that another thread is writing at the same time). You might be able to get away with std::memory_order_relaxed in a few places though, I haven't checked in detail yet.
As I said in #120125 (comment), this is UB simply because there is a cast of float to atomic float.
It is, but that's not really the core of the problem here currently imo.
Well, that's true. I'm starting to think that this is not the best direction to go. It might be better to start by researching a scientific paper on multithreaded geodesic distance fields, and stop trying to just put yet another synchronization primitive in a new place.
Closing this PR since these attempts are a dead end, as I previously mentioned. There is no point spending more time on research since the approach of calculating a field for the whole mesh does not scale well anyway, even with multithreading.
The expand operator needs a deeper refactor than that.
Thanks everyone for their feedback.
If needed, we can start from scratch in a fresh PR if worthwhile research is found.
Pull request closed