Geometry Nodes: use lazy evaluation in repeat zone #112421

Merged
Jacques Lucke merged 26 commits from JacquesLucke/blender:lazy-repeat-zone into main 2023-09-22 08:58:26 +02:00
Member

The goal is to make the evaluation of repeat zones more efficient by making use of the lazy-function evaluation system. Performance is improved in two ways:

  • Unnecessary nodes are not evaluated anymore. E.g. if a repeat zone outputs two geometries but only one of those is actually used, the other one will not be computed anymore.
  • Support evaluating different iteration indices at the same time on different threads.

It is possible that some uses of repeat zones become slower with this refactor, especially when each iteration does very little work and there are a lot of iterations. The old implementation was not optimized for this use case either but now there is a bit more constant overhead per iteration than before.

On the bright side, this change can result in some very significant speedups.

  • In this setup, an ico sphere is generated in every iteration and then all iterations are joined as instances (ico sphere is chosen, because it is currently single threaded; if it was multi-threaded, the speedup might have been less significant). Execution time improves from 800ms to 70ms because different ico-spheres are generated in parallel.
    image
  • In this setup, there are two output geometries, only one of which is used. With the old eager evaluation method, both outputs are computed which is expensive. Now only the output that is actually used is computed. This setup can be become arbitrarily faster.
    image
  • In this setup, the subdivided cube is only required when the number of iterations is exactly 10. With the old eager evaluation method, the subdivided cube was always computed. Now it is only computed when it is actually needed.
    image

There is one todo comment for adding back-links for socket usages. Properly linking those up can result in better (shorter) life-times for anonymous attributes. Even without that, performance is already better than before.

The image below shows how the generated graph looks like with three iterations. Note that there are three loop body nodes which are chained. In the final implementation, there will also be backlinks for the socket usages (which currently have a 1 input). The Logical Or on the right outputs true when any of the loop bodies uses a border link (a link that passes data directly through the zone border, instead of the repeat input node).
image

Building a lazy-function graph makes it possible to use the lazyness and multi-threading features of the lazy-function graph executor. This is much easier than reimplementing this behavior for repeat zones specifically (hence there was only single-threaded eager execution before).

The goal is to make the evaluation of repeat zones more efficient by making use of the lazy-function evaluation system. Performance is improved in two ways: * Unnecessary nodes are not evaluated anymore. E.g. if a repeat zone outputs two geometries but only one of those is actually used, the other one will not be computed anymore. * Support evaluating different iteration indices at the same time on different threads. It is possible that some uses of repeat zones become slower with this refactor, especially when each iteration does very little work and there are a lot of iterations. The old implementation was not optimized for this use case either but now there is a bit more constant overhead per iteration than before. On the bright side, this change can result in some very significant speedups. * In this setup, an ico sphere is generated in every iteration and then all iterations are joined as instances (ico sphere is chosen, because it is currently single threaded; if it was multi-threaded, the speedup might have been less significant). Execution time improves from `800ms` to `70ms` because different ico-spheres are generated in parallel. ![image](/attachments/09d8b290-fe05-4705-850e-3fcca59b0608) * In this setup, there are two output geometries, only one of which is used. With the old eager evaluation method, both outputs are computed which is expensive. Now only the output that is actually used is computed. This setup can be become arbitrarily faster. ![image](/attachments/53b0f08d-4f98-4e4d-9143-6e809cd0b307) * In this setup, the subdivided cube is only required when the number of iterations is exactly 10. With the old eager evaluation method, the subdivided cube was always computed. Now it is only computed when it is actually needed. ![image](/attachments/51b3e542-2bcf-411d-8817-a011b8cc2965) There is one todo comment for adding back-links for socket usages. Properly linking those up can result in better (shorter) life-times for anonymous attributes. Even without that, performance is already better than before. The image below shows how the generated graph looks like with three iterations. Note that there are three loop body nodes which are chained. In the final implementation, there will also be backlinks for the socket usages (which currently have a `1` input). The `Logical Or` on the right outputs true when any of the loop bodies uses a border link (a link that passes data directly through the zone border, instead of the repeat input node). ![image](/attachments/a12ece11-4f95-4ef5-ad74-c72570804657) Building a lazy-function graph makes it possible to use the lazyness and multi-threading features of the lazy-function graph executor. This is much easier than reimplementing this behavior for repeat zones specifically (hence there was only single-threaded eager execution before).
Jacques Lucke added 7 commits 2023-09-15 17:57:24 +02:00
Jacques Lucke added 3 commits 2023-09-15 18:36:39 +02:00
buildbot/vexp-code-patch-coordinator Build done. Details
db0ce7ecb2
cleanup
Author
Member

@blender-bot build

@blender-bot build
Jacques Lucke requested review from Lukas Tönne 2023-09-15 19:07:44 +02:00
Jacques Lucke requested review from Hans Goudey 2023-09-15 19:07:44 +02:00
Jacques Lucke added 2 commits 2023-09-16 13:13:24 +02:00
Iliya Katushenock reviewed 2023-09-16 15:34:41 +02:00
@ -1496,0 +1578,4 @@
void *graph_executor_storage = nullptr;
bool multi_threading_enabled = false;
Vector<int> input_index_map;
Vector<int> output_index_map;

Can it be IndexRange instead of Vector, filled by iota?

Can it be `IndexRange` instead of `Vector`, filled by `iota`?
Author
Member

Note that the output_index_map is not an IndexRange, the input map could be one in theory though.

Note that the `output_index_map` is not an `IndexRange`, the input map could be one in theory though.
Author
Member

Main reason I didn't do this before is that I was contemplating to make this index-mapped-params a more general thing, but that doesn't seem necessary currently.

Main reason I didn't do this before is that I was contemplating to make this index-mapped-params a more general thing, but that doesn't seem necessary currently.
mod_moder marked this conversation as resolved
@ -1566,0 +1753,4 @@
}
/**
* Generate a lazy-function graph that contains contains the loop body (`body_fn_`) as many times

Typo: contains contains

Typo: `contains contains`
JacquesLucke marked this conversation as resolved
@ -1671,0 +1791,4 @@
VectorSet<lf::FunctionNode *> &lf_body_nodes = eval_storage.lf_body_nodes;
for ([[maybe_unused]] const int i : IndexRange(iterations)) {
lf::FunctionNode &lf_node = lf_graph.add_function(body_fn_);
eval_storage.lf_body_nodes.add_new(&lf_node);

Shorter alias: lf_body_nodes.add_new(&lf_node);;

Shorter alias: `lf_body_nodes.add_new(&lf_node);`;
JacquesLucke marked this conversation as resolved
Jacques Lucke added 2 commits 2023-09-16 15:53:22 +02:00
Jacques Lucke added 1 commit 2023-09-16 18:44:16 +02:00
Jacques Lucke added 1 commit 2023-09-16 18:51:39 +02:00
Jacques Lucke added 1 commit 2023-09-16 18:59:01 +02:00
Jacques Lucke added 1 commit 2023-09-17 10:09:55 +02:00
Jacques Lucke added 2 commits 2023-09-17 14:06:20 +02:00
Author
Member

@blender-bot build

@blender-bot build
Jacques Lucke added 1 commit 2023-09-17 15:50:12 +02:00
Jacques Lucke added 1 commit 2023-09-17 19:12:05 +02:00
Hans Goudey reviewed 2023-09-18 18:15:59 +02:00
Hans Goudey left a comment
Member

I find the need for RepeatBodyNodeExecuteWrapper a bit confusing. I know I'm just missing things, but I imagined that wrapping inside things wouldn't be necessary because there is a graph generated with the right number of iterations. I guess it's just needed to support logging a specific iteration?

Other than that, this change seems straightforward actually.

I find the need for `RepeatBodyNodeExecuteWrapper` a bit confusing. I know I'm just missing things, but I imagined that wrapping inside things wouldn't be necessary because there is a graph generated with the right number of iterations. I guess it's just needed to support logging a specific iteration? Other than that, this change seems straightforward actually.
@ -131,2 +131,4 @@
struct GeoNodesSideEffectNodes {
MultiValueMap<ComputeContextHash, const lf::FunctionNode *> nodes_by_context;
/**
* The repeat zone is identified by the compute context and the identifier of the repeat output
Member

I guess it's the compute context of the parent of the repeat zone? Might be worth clarifying

I guess it's the compute context of the _parent_ of the repeat zone? Might be worth clarifying
JacquesLucke marked this conversation as resolved
@ -1643,0 +1815,4 @@
eval_storage.graph_executor.emplace(lf_graph,
lf_graph_inputs,
lf_graph_outputs,
Member

std::move?

`std::move`?
JacquesLucke marked this conversation as resolved
Jacques Lucke added 4 commits 2023-09-20 10:20:53 +02:00
Author
Member

I know I'm just missing things, but I imagined that wrapping inside things wouldn't be necessary because there is a graph generated with the right number of iterations. I guess it's just needed to support logging a specific iteration?

Updated the comment. I think the main thing you might be missing is that, while we create a graph with a node per iteration, all these nodes use the same LazyFunction internally.

> I know I'm just missing things, but I imagined that wrapping inside things wouldn't be necessary because there is a graph generated with the right number of iterations. I guess it's just needed to support logging a specific iteration? Updated the comment. I think the main thing you might be missing is that, while we create a graph with a node per iteration, all these nodes use the same `LazyFunction` internally.
Author
Member

@blender-bot build

@blender-bot build
Hans Goudey approved these changes 2023-09-21 22:16:12 +02:00
Hans Goudey left a comment
Member

but now there is a bit more overhead than before.

It might be nice to add that the overhead is proportional to the number of iterations, since I think that's how the algorithmic complexity is different than before.

Probably a stupid question, but just to be sure, the iterations in iterations_by_repeat_zone don't cause repeat zones that otherwise wouldn't be executed to be executed, right? Based on the code it looks fine.

>but now there is a bit more overhead than before. It might be nice to add that the overhead is proportional to the number of iterations, since I _think_ that's how the algorithmic complexity is different than before. Probably a stupid question, but just to be sure, the iterations in `iterations_by_repeat_zone` don't cause repeat zones that otherwise wouldn't be executed to be executed, right? Based on the code it looks fine.
Author
Member

Not a stupid question. The repeat zone will not execute more iterations than what is passed into the Iterations input. However, if the repeat zone contains a side effect node such as a viewer, some iterations might evaluated even if the final output of the repeat zone is not used.

Not a stupid question. The repeat zone will not execute more iterations than what is passed into the Iterations input. However, if the repeat zone contains a side effect node such as a viewer, some iterations might evaluated even if the final output of the repeat zone is not used.
Jacques Lucke merged commit dacb768cb6 into main 2023-09-22 08:58:26 +02:00
Jacques Lucke deleted branch lazy-repeat-zone 2023-09-22 08:58:27 +02:00
First-time contributor

Does this threading means that it is possible to have the kind of iteration performance like in blur attribute node? Or is entirely different thing?

Does this threading means that it is possible to have the kind of iteration performance like in blur attribute node? Or is entirely different thing?
Author
Member

It's kind of unrelated. Whether you can benefit from multi-threading across multiple iterations in the repeat zone depends on your node tree. If the next iteration depends on the previous iteration (as it often does), things have to be done one after the other. Within individual nodes there is still multi-threading of course.

It's kind of unrelated. Whether you can benefit from multi-threading across multiple iterations in the repeat zone depends on your node tree. If the next iteration depends on the previous iteration (as it often does), things have to be done one after the other. Within individual nodes there is still multi-threading of course.
First-time contributor

After stress testing repeat zone with zero operation inside it (the latest build after this pr merged and blender 4 alpha from back august) both suffer fps drop to 3.7 fps compared to using blur attribute node 70+ fps at 100000 iteration. (blur node get the same fps drop only after 4,000,000 iteration)
I believe this is worrisome if we are to build a performant simulation system
image

Edit:
It turns out that this is not specific to repeat zone but general serial operation in geometry nodes. I manually created 100000 connection with zero operation and this also do severe fps drop to about 5 fps
image

After stress testing repeat zone with zero operation inside it (the latest build after this pr merged and blender 4 alpha from back august) both suffer fps drop to 3.7 fps compared to using blur attribute node 70+ fps at 100000 iteration. (blur node get the same fps drop only after 4,000,000 iteration) I believe this is worrisome if we are to build a performant simulation system ![image](/attachments/381a4184-46aa-43a3-adf1-9dc53d91193e) Edit: It turns out that this is not specific to repeat zone but general serial operation in geometry nodes. I manually created 100000 connection with zero operation and this also do severe fps drop to about 5 fps ![image](/attachments/d661d147-4af2-4194-a06e-8a6792db4af1)
Sign in to join this conversation.
No reviewers
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No project
No Assignees
4 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#112421
No description provided.