Previously, this happened when the "node task" first runs, which might
not actually execute the node if there are missing inputs. Deferring the
allocation of storage and default inputs allows for better memory reuse
later (currently the memory is not reused).
This is part of D16858. Iterating over all field inputs allows us to extract
all anonymous attributes used by a field relatively easily which is necessary
for D16858.
This could potentially be used for better field tooltips for nested fields,
but that needs further investigation.
This was an oversight in rBdba2d828462ae22de5.
The evaluator uses multiple threads to initialize node states
but it is still in single threaded mode.
`get_main_or_local_allocator` did not return the right allocator
in this case.
The geometry nodes evaluator supports "lazy threading", i.e. it starts out
single-threaded. But when it determines that multi-threading can be
benefitial, it switches to multi-threaded mode.
Now it only creates an enumerable-thread-specific if it is actually using
multiple threads. This results in a 6% speedup in my test file with many
node groups and math nodes.
Lazy-function graphs are now evaluated properly even if they contain
cycles. Note that cycles are only ok if there is no data dependency cycle.
For example, a node might output something that is fed back into itself.
As long as the output can be computed without the input that it feeds into,
everything is ok.
The code that builds the graph is responsible for making sure that there
are no actual data dependencies.
Nodes that were not connected to any output could still impact performance.
While they were never executed, sometimes their inputs could keep references
to geometries that other nodes want to modify. That caused unnecessary geometry
copies, because a geometry can only be modified if it is not shared.
Now, inputs that will never be used are tagged accordingly and they will never
have references to geometries that others might want to modify.
* Support bidirectional type lookups. E.g. finding the base type of a
field was supported, but not the other way around. This also removes
the todo in `get_vector_type`. To achieve this, types have to be
registered up-front.
* Separate `CPPType` from other "type traits". For example, previously
`ValueOrFieldCPPType` adds additional behavior on top of `CPPType`.
Previously, it was a subclass, now it just contains a reference to the
`CPPType` it corresponds to. This follows the composition-over-inheritance
idea. This makes it easier to have self-contained "type traits" without
having to put everything into `CPPType`.
Differential Revision: https://developer.blender.org/D16479
This is the conventional way of dealing with unused arguments in C++,
since it works on all compilers.
Regex find and replace: `UNUSED\((\w+)\)` -> `/*$1*/`
In large node setup the threading overhead was sometimes very significant.
That's especially true when most nodes do very little work.
This commit improves the scheduling by not using multi-threading in many
cases unless it's likely that it will be worth it. For more details see the comments
in `BLI_lazy_threading.hh`.
Differential Revision: https://developer.blender.org/D15976
This refactors the geometry nodes evaluation system. No changes for the
user are expected. At a high level the goals are:
* Support using geometry nodes outside of the geometry nodes modifier.
* Support using the evaluator infrastructure for other purposes like field evaluation.
* Support more nodes, especially when many of them are disabled behind switch nodes.
* Support doing preprocessing on node groups.
For more details see T98492.
There are fairly detailed comments in the code, but here is a high level overview
for how it works now:
* There is a new "lazy-function" system. It is similar in spirit to the multi-function
system but with different goals. Instead of optimizing throughput for highly
parallelizable work, this system is designed to compute only the data that is actually
necessary. What data is necessary can be determined dynamically during evaluation.
Many lazy-functions can be composed in a graph to form a new lazy-function, which can
again be used in a graph etc.
* Each geometry node group is converted into a lazy-function graph prior to evaluation.
To evaluate geometry nodes, one then just has to evaluate that graph. Node groups are
no longer inlined into their parents.
Next steps for the evaluation system is to reduce the use of threads in some situations
to avoid overhead. Many small node groups don't benefit from multi-threading at all.
This is much easier to do now because not everything has to be inlined in one huge
node tree anymore.
Differential Revision: https://developer.blender.org/D15914
This potentially overallocates buffers so that they are usable
for more data types, which allows buffers to be reused more
easily. That leads to fewer separate allocations and improved
cache usage (in one of my test files the number of separate
allocations went down from 1826 to 1555).
This makes calculation of selected indices slightly faster when the
input is a virtual array (the direct output of various nodes like
Face Area, etc). The utility can be helpful for other areas that
need to find selected indices besides field evaluation.
With the face area node used as a selection with 4 million faces,
the speedup is 3.51 ms to 3.39 ms, just a slight speedup.
Differential Revision: https://developer.blender.org/D15127
This improves performance of the procedure executor on secondary metrics
(i.e. not for the main use case when many elements are processed together,
but for the use case when a single element is processed at a time).
In my benchmark I'm measuring a 50-60% improvement:
* Procedure with a single function (executed many times): `5.8s -> 2.7s`.
* Procedure with 1000 functions (executed many times): `2.4 -> 1.0s`.
The speedup is mainly achieved in multiple ways:
* Store an `Array` of variable states, instead of a map. The array is indexed
with indices stored in each variable. This also avoids separately allocating
variable states.
* Move less data around in the scheduler and use a `Stack` instead of `Map`.
`Map` was used before because it allows for some optimizations that might
be more important in the future, but they don't matter right now (e.g. joining
execution paths that diverged earlier).
* Avoid memory allocations by giving the `LinearAllocator` some memory
from the stack.
The separate geometry and delete geometry nodes often invert the
selection so that deleting elements from a geometry can be implemented
as copying the opposite selection of elements. This should make the two
nodes faster in some cases, since the generic versions of selection
creation functions (i.e. from d3a1e9cbb9) are used instead
of the single threaded code that was used for this node.
The change also makes the deletion/separation code easier to
understand because it doesn't have to pass around the inversion.
Goals:
* Better high level control over where devirtualization occurs. There is always
a trade-off between performance and compile-time/binary-size.
* Simplify using array devirtualization.
* Better performance for cases where devirtualization wasn't used before.
Many geometry nodes accept fields as inputs. Internally, that means that the
execution functions have to accept so called "virtual arrays" as inputs. Those
can be e.g. actual arrays, just single values, or lazily computed arrays.
Due to these different possible virtual arrays implementations, access to
individual elements is slower than it would be if everything was just a normal
array (access does through a virtual function call). For more complex execution
functions, this overhead does not matter, but for small functions (like a simple
addition) it very much does. The virtual function call also prevents the compiler
from doing some optimizations (e.g. loop unrolling and inserting simd instructions).
The solution is to "devirtualize" the virtual arrays for small functions where the
overhead is measurable. Essentially, the function is generated many times with
different array types as input. Then there is a run-time dispatch that calls the
best implementation. We have been doing devirtualization in e.g. math nodes
for a long time already. This patch just generalizes the concept and makes it
easier to control. It also makes it easier to investigate the different trade-offs
when it comes to devirtualization.
Nodes that we've optimized using devirtualization before didn't get a speedup.
However, a couple of nodes are using devirtualization now, that didn't before.
Those got a 2-4x speedup in common cases.
* Map Range
* Random Value
* Switch
* Combine XYZ
Differential Revision: https://developer.blender.org/D14628
Since {rBeae36be372a6b16ee3e76eff0485a47da4f3c230} the distinction
between float and byte colors is more explicit in the ui. So far, geometry
nodes couldn't really deal with byte colors in general. This patch fixes that.
There is still only one color socket, which contains float colors. Conversion
to and from byte colors is done when read from or writing to attributes.
* Support writing to byte color attributes in Store Named Attribute node.
* Support converting to/from byte color in attribute conversion operator.
* Support propagating byte color attributes.
* Add all the implicit conversions from byte colors to the other types.
* Display byte colors as integers in spreadsheet.
Differential Revision: https://developer.blender.org/D14705
This improves performance e.g. when creating an integer attribute
based on an index field. For 4 million vertices, I measured a speedup
from 3.5 ms to 1.2 ms.
When boolean fields are evaluated and used as selections, we create
a vector of indices. This process is currently single-threaded, but
226f0c4fef added a more optimized multi-threaded version
of this process. It's simple to use this in the field evaluator.
I tested this with the set position node and a random
value node set to boolean mode on a Ryzen 2700x:
| | Before | After | Improvement |
| 10% Selected | 40.5 ms | 29.0 ms | 1.4x |
| 90% Selected | 115 ms | 45.3 ms | 2.5x |
In the future there could be a specialized version for non-span
virtual array selections that uses `materialize` to lower virtual
call overhead.
Differential Revision: https://developer.blender.org/D14436
Use a shorter/simpler license convention, stops the header taking so
much space.
Follow the SPDX license specification: https://spdx.org/licenses
- C/C++/objc/objc++
- Python
- Shell Scripts
- CMake, GNUmakefile
While most of the source tree has been included
- `./extern/` was left out.
- `./intern/cycles` & `./intern/atomic` are also excluded because they
use different header conventions.
doc/license/SPDX-license-identifiers.txt has been added to list SPDX all
used identifiers.
See P2788 for the script that automated these edits.
Reviewed By: brecht, mont29, sergey
Ref D14069
This commit adds infrastructure for 8 bit signed integer attributes.
This can be useful given the discussion in T94193, where we want to
store spline type, Bezier handle type, and other small enums as
attributes.
This is only exposed in the interface in the attribute lists, so it
shouldn't be an option in geometry nodes, at least for now.
I expect that this type won't be used directly very often, it
should mostly be cast to an enum type. However, with support
for 8 bit integers, it also makes sense to add things like mixing
implementations for consistency.
Differential Revision: https://developer.blender.org/D13721
Every translation unit that included the modified headers generated
some extra code, even though it was not used. This adds unnecessary
compile time overhead and is annoying when investigating the
generated assembly.
This patch implements the vector types (i.e:`float2`) by making heavy
usage of templating. All vector functions are now outside of the vector
classes (inside the `blender::math` namespace) and are not vector size
dependent for the most part.
In the ongoing effort to make shaders less GL centric, we are aiming
to share more code between GLSL and C++ to avoid code duplication.
####Motivations:
- We are aiming to share UBO and SSBO structures between GLSL and C++.
This means we will use many of the existing vector types and others
we currently don't have (uintX, intX). All these variations were
asking for many more code duplication.
- Deduplicate existing code which is duplicated for each vector size.
- We also want to share small functions. Which means that vector
functions should be static and not in the class namespace.
- Reduce friction to use these types in new projects due to their
incompleteness.
- The current state of the `BLI_(float|double|mpq)(2|3|4).hh` is a
bit of a let down. Most clases are incomplete, out of sync with each
others with different codestyles, and some functions that should be
static are not (i.e: `float3::reflect()`).
####Upsides:
- Still support `.x, .y, .z, .w` for readability.
- Compact, readable and easilly extendable.
- All of the vector functions are available for all the vectors types
and can be restricted to certain types. Also template specialization
let us define exception for special class (like mpq).
- With optimization ON, the compiler unroll the loops and performance
is the same.
####Downsides:
- Might impact debugability. Though I would arge that the bugs are
rarelly caused by the vector class itself (since the operations are
quite trivial) but by the type conversions.
- Might impact compile time. I did not saw a significant impact since
the usage is not really widespread.
- Functions needs to be rewritten to support arbitrary vector length.
For instance, one can't call `len_squared_v3v3` in
`math::length_squared()` and call it a day.
- Type cast does not work with the template version of the `math::`
vector functions. Meaning you need to manually cast `float *` and
`(float *)[3]` to `float3` for the function calls.
i.e: `math::distance_squared(float3(nearest.co), positions[i]);`
- Some parts might loose in readability:
`float3::dot(v1.normalized(), v2.normalized())`
becoming
`math::dot(math::normalize(v1), math::normalize(v2))`
But I propose, when appropriate, to use
`using namespace blender::math;` on function local or file scope to
increase readability.
`dot(normalize(v1), normalize(v2))`
####Consideration:
- Include back `.length()` method. It is quite handy and is more C++
oriented.
- I considered the GLM library as a candidate for replacement. It felt
like too much for what we need and would be difficult to extend / modify
to our needs.
- I used Macros to reduce code in operators declaration and potential
copy paste bugs. This could reduce debugability and could be reverted.
- This touches `delaunay_2d.cc` and the intersection code. I would like
to know @howardt opinion on the matter.
- The `noexcept` on the copy constructor of `mpq(2|3)` is being removed.
But according to @JacquesLucke it is not a real problem for now.
I would like to give a huge thanks to @JacquesLucke who helped during this
and pushed me to reduce the duplication further.
Reviewed By: brecht, sergey, JacquesLucke
Differential Revision: https://developer.blender.org/D13791
This patch implements the vector types (i.e:`float2`) by making heavy
usage of templating. All vector functions are now outside of the vector
classes (inside the `blender::math` namespace) and are not vector size
dependent for the most part.
In the ongoing effort to make shaders less GL centric, we are aiming
to share more code between GLSL and C++ to avoid code duplication.
####Motivations:
- We are aiming to share UBO and SSBO structures between GLSL and C++.
This means we will use many of the existing vector types and others
we currently don't have (uintX, intX). All these variations were
asking for many more code duplication.
- Deduplicate existing code which is duplicated for each vector size.
- We also want to share small functions. Which means that vector
functions should be static and not in the class namespace.
- Reduce friction to use these types in new projects due to their
incompleteness.
- The current state of the `BLI_(float|double|mpq)(2|3|4).hh` is a
bit of a let down. Most clases are incomplete, out of sync with each
others with different codestyles, and some functions that should be
static are not (i.e: `float3::reflect()`).
####Upsides:
- Still support `.x, .y, .z, .w` for readability.
- Compact, readable and easilly extendable.
- All of the vector functions are available for all the vectors types
and can be restricted to certain types. Also template specialization
let us define exception for special class (like mpq).
- With optimization ON, the compiler unroll the loops and performance
is the same.
####Downsides:
- Might impact debugability. Though I would arge that the bugs are
rarelly caused by the vector class itself (since the operations are
quite trivial) but by the type conversions.
- Might impact compile time. I did not saw a significant impact since
the usage is not really widespread.
- Functions needs to be rewritten to support arbitrary vector length.
For instance, one can't call `len_squared_v3v3` in
`math::length_squared()` and call it a day.
- Type cast does not work with the template version of the `math::`
vector functions. Meaning you need to manually cast `float *` and
`(float *)[3]` to `float3` for the function calls.
i.e: `math::distance_squared(float3(nearest.co), positions[i]);`
- Some parts might loose in readability:
`float3::dot(v1.normalized(), v2.normalized())`
becoming
`math::dot(math::normalize(v1), math::normalize(v2))`
But I propose, when appropriate, to use
`using namespace blender::math;` on function local or file scope to
increase readability.
`dot(normalize(v1), normalize(v2))`
####Consideration:
- Include back `.length()` method. It is quite handy and is more C++
oriented.
- I considered the GLM library as a candidate for replacement. It felt
like too much for what we need and would be difficult to extend / modify
to our needs.
- I used Macros to reduce code in operators declaration and potential
copy paste bugs. This could reduce debugability and could be reverted.
- This touches `delaunay_2d.cc` and the intersection code. I would like
to know @howardt opinion on the matter.
- The `noexcept` on the copy constructor of `mpq(2|3)` is being removed.
But according to @JacquesLucke it is not a real problem for now.
I would like to give a huge thanks to @JacquesLucke who helped during this
and pushed me to reduce the duplication further.
Reviewed By: brecht, sergey, JacquesLucke
Differential Revision: https://developer.blender.org/D13791
This patch implements the vector types (i.e:float2) by making heavy
usage of templating. All vector functions are now outside of the vector
classes (inside the blender::math namespace) and are not vector size
dependent for the most part.
In the ongoing effort to make shaders less GL centric, we are aiming
to share more code between GLSL and C++ to avoid code duplication.
Motivations:
- We are aiming to share UBO and SSBO structures between GLSL and C++.
This means we will use many of the existing vector types and others we
currently don't have (uintX, intX). All these variations were asking
for many more code duplication.
- Deduplicate existing code which is duplicated for each vector size.
- We also want to share small functions. Which means that vector functions
should be static and not in the class namespace.
- Reduce friction to use these types in new projects due to their
incompleteness.
- The current state of the BLI_(float|double|mpq)(2|3|4).hh is a bit of a
let down. Most clases are incomplete, out of sync with each others with
different codestyles, and some functions that should be static are not
(i.e: float3::reflect()).
Upsides:
- Still support .x, .y, .z, .w for readability.
- Compact, readable and easilly extendable.
- All of the vector functions are available for all the vectors types and
can be restricted to certain types. Also template specialization let us
define exception for special class (like mpq).
- With optimization ON, the compiler unroll the loops and performance is
the same.
Downsides:
- Might impact debugability. Though I would arge that the bugs are rarelly
caused by the vector class itself (since the operations are quite trivial)
but by the type conversions.
- Might impact compile time. I did not saw a significant impact since the
usage is not really widespread.
- Functions needs to be rewritten to support arbitrary vector length. For
instance, one can't call len_squared_v3v3 in math::length_squared() and
call it a day.
- Type cast does not work with the template version of the math:: vector
functions. Meaning you need to manually cast float * and (float *)[3] to
float3 for the function calls.
i.e: math::distance_squared(float3(nearest.co), positions[i]);
- Some parts might loose in readability:
float3::dot(v1.normalized(), v2.normalized())
becoming
math::dot(math::normalize(v1), math::normalize(v2))
But I propose, when appropriate, to use
using namespace blender::math; on function local or file scope to
increase readability. dot(normalize(v1), normalize(v2))
Consideration:
- Include back .length() method. It is quite handy and is more C++
oriented.
- I considered the GLM library as a candidate for replacement.
It felt like too much for what we need and would be difficult to
extend / modify to our needs.
- I used Macros to reduce code in operators declaration and potential
copy paste bugs. This could reduce debugability and could be reverted.
- This touches delaunay_2d.cc and the intersection code. I would like to
know @Howard Trickey (howardt) opinion on the matter.
- The noexcept on the copy constructor of mpq(2|3) is being removed.
But according to @Jacques Lucke (JacquesLucke) it is not a real problem
for now.
I would like to give a huge thanks to @Jacques Lucke (JacquesLucke) who
helped during this and pushed me to reduce the duplication further.
Reviewed By: brecht, sergey, JacquesLucke
Differential Revision: http://developer.blender.org/D13791
It is common to have fields that contain a constant value. Before this
commit, such constants were represented by operation nodes which
don't have inputs. Having a special node type for constants makes
working with them a bit cheaper.
It also allows skipping some unnecessary processing when evaluating
fields, because constant fields can be detected more easily.
This commit also generalizes the concept of field node types a bit.
We often had to use two `FieldEvaluator` instances to first evaluate
the selection and then the remaining fields. Now both can be done
with a single `FieldEvaluator`. This results in less boilerplate code in
many cases.
Performance is not affected by this change. In a separate patch we
could improve performance by reusing evaluated sub-fields that are
used by the selection and the other fields.
Differential Revision: https://developer.blender.org/D13571
This implements an optimization pass for multi-function procedures.
It optimizes memory reuse by moving destruct instructions up.
For more details see the in-code comment.
In very large fields with many short lived intermediate values, this change
can improve performance 3-4x. Furthermore, in such cases, peak memory
consumption is reduced significantly (e.g. 100x lower peak memory usage).
Differential Revision: https://developer.blender.org/D13548