When `BLI_task_parallel_mempool` does not use threading, the
`userdata_chunk` is allocated locally simulating a TLS.
However `func_reduce` is not called so the original chunk is ignored.
`task_parallel_iterator_no_threads` is another function that doesn't call
`func_reduce`. It also ignores `userdata_chunk_local` in the main iterator.
The solution in these cases is not to create a `userdata_chunk_local`.
This fixes T90131
Differential Revision: https://developer.blender.org/D12067
After looking into task isolation issues with Sergey, we couldn't find the
reason behind the deadlocks that we are getting in T87938 and a Sprite Fright
file involving motion blur renders.
There is no apparent place where we adding or waiting on tasks in a task group
from different isolation regions, which is what is known to cause problems. Yet
it still hangs. Either we do not understand some limitation of TBB isolation,
or there is a bug in TBB, but we could not figure it out.
Instead the idea is to use isolation only where we know we need it: when
holding a mutex lock and then doing some multithreaded operation within that
locked region. Three places where we do this now:
* Generated images
* Cached BVH tree building
* OpenVDB lazy grid loading
Compared to the more automatic approach previously used, there is the downside
that it is easy to miss places where we need isolation. Yet doing it more
automatically is also causing unexpected issue and bugs that we found no
solution for, so this seems better.
Patch implemented by Sergey and me.
Differential Revision: https://developer.blender.org/D11603
Splitting out thread safe iteration logic means regular iteration
isn't checking for the thread-safe pointer each step.
This gives a small but measurable overall performance gain of 2-3%
when redrawing a high-poly mesh.
Ref D11564
Reviewed By: mont29
Support thread local storage for BLI_task_parallel_mempool,
as well as support for the reduce and free callbacks.
mempool_iter_threadsafe_* functions have been moved into a private
header thats only shared between task_iterator.c and BLI_mempool.c
so the TLS can be made part of the iterator array without having to
rely on passing in struct offsets.
Add test task.MempoolIterTLS that ensures reduce and free
are working as expected.
Reviewed By: mont29
Ref D11548
A single threaded task with thread data over 8192 bytes would leak.
While this didn't happen in practice, it could cause issues in the
future.
The free call for `task_parallel_iterator_do` wasn't running
if callbacks weren't set, also not an issue in practice but avoids
potential problems in the future too.
Under some circumstances using task isolation can cause deadlocks.
Previously, our task pool implementation would run all tasks in an
isolated region. Now using task isolation is optional and can be
turned on/off for individual task pools.
Task pools that spawn new tasks recursively should never enable
task isolation. There is a new check that finds these cases at runtime.
Right now this check is disabled, so that this commit is a pure refactor.
It will be enabled in an upcoming commit.
This fixes T88598.
Differential Revision: https://developer.blender.org/D11415
This patch enables TBB as the default task scheduler. TBB stands for Threading Building Blocks and is developed by Intel. The library contains several threading patters. This patch maps blenders BLI_task_* function to their counterpart. After this patch we can add more patterns. A promising one is TBB:graph that can be used for depsgraph, draw manager and compositor.
Performance changes depends on the actual hardware. It was tested on different hardwares from laptops to workstations and we didn't detected any downgrade of the performance.
* Linux Xeon E5-2699 v4 got FPS boost from 12 to 17 using Spring's 04_010_A.anim.blend.
* AMD Ryzen Threadripper 2990WX 32-Core Animation playback goes from 9.5-10.5 FPS to 13.0-14.0 FPS on Agent 327 , 10_03_B.anim.blend.
Reviewed By: brecht, sergey
Differential Revision: https://developer.blender.org/D7475
This is not currently used and will take some work to support with TBB, so
remove it until we have a new implementation based on TBB.
Fixes T76005, parallel range pool tests failing.
Ref D7475
In preparation of TBB we need to split the finalize function into reduce
and free. Reduce is used to combine results and free for freeing any
allocated memory.
The reduce function is called to join user data chunk into another, to reduce the
result to the original userdata_chunk memory. These functions should have no side
effects so that they can be run on any thread.
The free functions should free data created during execution (TaskParallelRangeFunc).
Original patch by Brecht van Lommel
{rB61f49db843cf5095203112226ae386f301be1e1a}.
Reviewed By: Brecht van Lommel, Bastien Montagne
Differential Revision: https://developer.blender.org/D7394
Tasks: move priority from task to task pool {rBf7c18df4f599fe39ffc914e645e504fcdbee8636}
Tasks: split task.c into task_pool.cc and task_iterator.c {rB4ada1d267749931ca934a74b14a82479bcaa92e0}
Differential Revision: https://developer.blender.org/D7385