18 Commits

Author SHA1 Message Date
3f1886d0b7 Functions: align chunk sizes in multi-function evaluation
This can improve performance in some circumstances when there are
vectorized and/or unrolled loops. I especially noticed that this helps
a lot while working on D16970 (got a 10-20% speedup there by avoiding
running into the non-vectorized fallback loop too often).
2023-01-22 00:03:25 +01:00
85908e9edf Geometry Nodes: new Interpolate Curves node
This adds a new `Interpolate Curves` node. It allows generating new curves
between a set of existing guide curves. This is essential for procedural hair.

Usage:
- One has to provide a set of guide curves and a set of root positions for
  the generated curves. New curves are created starting from these root
  positions. The N closest guide curves are used for the interpolation.
- An additional up vector can be provided for every guide curve and
  root position. This is typically a surface normal or nothing. This allows
  generating child curves that are properly oriented based on the
  surface orientation.
- Sometimes a point should only be interpolated using a subset of the
  guides. This can be achieved using the `Guide Group ID` and
  `Point Group ID` inputs. The curve generated at a specific point will
  only take the guides with the same id into account. This allows e.g.
  for hair parting.
- The `Max Neighbors` input limits how many guide curves are taken
  into account for every interpolated curve.

Differential Revision: https://developer.blender.org/D16642
2023-01-20 12:09:38 +01:00
c29c61f840 Fix T102292: deadlock in geometry nodes evaluation with task isolation
As described in the comment on `BLI_task_isolate`, deadlocks can happen
when isolation is used with threading primitives that separate spawning tasks
from executing them. All threads are waiting the tasks to complete but no
thread is able to continue working due to task isolation.

The fix is to not pass lazy-threading hints through task isolations. This way
isolated regions can't create new tasks in a scheduler further up the call stack.
This may lead to minor slowdowns because less threading may be used.
It's generally possible to get rid of the slowdown again by sending the
lazy-threading hint before entering the isolated region.
2022-11-06 15:07:32 +01:00
5c81d3bd46 Geometry Nodes: improve evaluator with lazy threading
In large node setup the threading overhead was sometimes very significant.
That's especially true when most nodes do very little work.

This commit improves the scheduling by not using multi-threading in many
cases unless it's likely that it will be worth it. For more details see the comments
in `BLI_lazy_threading.hh`.

Differential Revision: https://developer.blender.org/D15976
2022-09-20 11:08:05 +02:00
Iliay Katueshenock
c94ca54cda BLI: add use_threading parameter to parallel_invoke
`parallel_invoke` allows executing functions on separate threads.
However, creating tasks in tbb has a measurable amount of overhead.
Therefore, it can be benefitial to disable parallelization when
the amount of work done per function is small.

See D15539 for some benchmark results.

Differential Revision: https://developer.blender.org/D15539
2022-07-26 11:10:16 +02:00
c2737913db BLI: Avoid invoking tbb for small parallel_reduce calls
Apply a change similar to e130903060 for
`parallel_reduce`, just like `parallel_for`. I measured a performance
improvement in viewport FPS of at least 10% with 1 million small
instances (one bottleneck was computing many small bounding boxes).
2022-05-09 18:21:50 +02:00
c434782e3a File headers: SPDX License migration
Use a shorter/simpler license convention, stops the header taking so
much space.

Follow the SPDX license specification: https://spdx.org/licenses

- C/C++/objc/objc++
- Python
- Shell Scripts
- CMake, GNUmakefile

While most of the source tree has been included

- `./extern/` was left out.
- `./intern/cycles` & `./intern/atomic` are also excluded because they
  use different header conventions.

doc/license/SPDX-license-identifiers.txt has been added to list SPDX all
used identifiers.

See P2788 for the script that automated these edits.

Reviewed By: brecht, mont29, sergey

Ref D14069
2022-02-11 09:14:36 +11:00
7c10e364b2 BLI: wrap parallel_invoke from tbb 2022-02-09 13:08:04 +01:00
e130903060 BLI: avoid invoking tbb for small workloads
We often call `parallel_for` in places with very variable
sized workloads. When many elements are processed,
using multi-threading is great, but when processing
few elements (possibly many times) using `parallel_for`
can result in significant overhead.

I measured that this improves performance by >20% in
the refactored realize instances code I'm working on
separately. The change might also help with debugging
sometimes, because the stack trace is smaller and contains
fewer irrevelant symbols.
2021-12-02 12:56:47 +01:00
3c3669894f Cleanup: use system includes 2021-10-04 13:14:58 +11:00
ceff86aafe Various Exact Boolean parallelizations and optimizations.
From patch D11780 from Erik Abrahamsson.
It parallelizes making the vertices, destruction of map entries,
finding if the result is PWN, finding triangle adjacencies,
and finding the ambient cell.
The latter needs a parallel_reduce from tbb, so added one into
BLI_task.hh so that if WITH_TBB is false, the code will still work.

On Erik's 6-core machine, the elapsed time went from 17.5s to 11.8s
(33% faster) on an intersection of two spheres with 3.1M faces.
On Howard's 24-core machine, the elapsed time went from 18.7s to 10.8s
for the same test.
2021-07-05 18:09:36 -04:00
b37093de7b BLI: add C++ wrapper for task isolation
This makes it easier to use task isolation in c++ code.
Previously, one either had to check `WITH_TBB` (possibly indirectly
through `WITH_OPENVDB`) or one had to use the C function which
is less convenient.
2021-06-16 16:29:21 +02:00
45d59e0df5 BLI: add threading namespace
This namespace groups threading related functions/classes. This avoids
adding more threading related stuff to the blender namespace. Also it
makes naming a bit easier, e.g. the c++ version of BLI_task_isolate could
become blender::threading::isolate_task or something similar.

Differential Revision: https://developer.blender.org/D11624
2021-06-16 16:14:02 +02:00
Brecht Van Lommel
677e63d518 TBB: fix deprecation warnings with newer TBB versions
* USD and OpenVDB headers use deprecated TBB headers, suppress all deprecation
  warnings there since we have no control over them.
* For our own TBB includes, use the individual headers rather than the tbb.h that
  includes everything to avoid warnings, rather than suppressing all.

This is in anticipation of the TBB 2020 upgrade in D10359. Ref D10361.
2021-02-10 19:32:24 +01:00
626a79204e MSVC: Fix build warning
If a define of NOMINMAX was made before BLI_task.hh was included,
the compiler would emit a

warning C4005: 'NOMINMAX': macro redefinition

warning, to work around this only define it if it is not already
defined, and only undefine it if we were the ones that made the
define earlier.
2020-11-10 08:48:18 -07:00
fa0ceb4959 Cleanup: spelling 2020-10-16 11:46:48 +11:00
00f7b572d9 Windows: Fix build issue on windows
TBB includes Windows.h which defines a min/max macro
leading to issues when you want to use std::min and
std::max.

This change prevents Windows.h from defining them
sidestepping the issue.
2020-10-15 17:14:57 -06:00
309c919ee9 BKE: parallelize BKE_mesh_calc_edges
`BKE_mesh_calc_edges` was the main performance bottleneck in D9141.
While openvdb only needed ~115ms, calculating the edges afterwards
took ~960ms. Now with some parallelization this is reduced to ~210ms.

Parallelizing `BKE_mesh_calc_edges` is not entirely trivial, because it
has to perform deduplication and some other things that have to happen
in a certain order. Even though the multithreading improves performance
with more threads, there are diminishing returns when too many threads
are used in this function.

The speedup is mainly achieved by having multiple hash tables that are
filled in parallel. The distribution of the edges to hash tables is based on
a hash (that is different from the hash used in the actual hash tables).

I moved the function to C++, because that made it easier for me to
optimize it. Furthermore, I added `BLI_task.hh` which contains some
light tbb wrappers for parallelization.

Reviewers: campbellbarton

Differential Revision: https://developer.blender.org/D9151
2020-10-09 11:56:12 +02:00