From recent experience, turns out we often do want to use something else than basic
range of parallelized forloop as control parameter over threads usage, so now BLI func
only takes a boolean, and caller defines best check for its own case.
This mimics OpenMP's 'firstprivate' feature. It is sometimes handy to have some persistent local data during a whole chunk.
Reviewers: sergey
Reviewed By: sergey
Subscribers: campbellbarton
Differential Revision: https://developer.blender.org/D1635
With current code, in single-threaded context, a pool of task may never be executed
until one calls BLI_task_pool_work_and_wait() on it, this is not acceptable for
asynchronous tasks where you never want to actually lock the main thread.
This commits adds an extra thread in single-threaded case, and a new 'type' of pool,
such that one can create real background pools of tasks. See code for details.
Review: D1565
Allocate statistics array dynamically, so increasing max number of threads does
not increase sloppyness of the memory usage.
For the further cleanups: we can try alloca-ing this array, but it's also not
really safe because we can have quite huge number of threads in the future.
Plus statistics will allocate memory for each individual entry, so using alloca
is not going to give anything beneficial here.
This way we can have scheduler capable of scheduling tasks on all the CPUs
but in the same time we can limit tasks like baking (in the future) to use
no more than given number of threads.
It now supports different scheduling schemas: dynamic and static.
Static one is the default and it splits work into equal number of
range iterations.
Dynamic one allocates chunks of 32 iterations which then being
dynamically send to a thread which is currently idling.
This gives slightly better performance. Still some tricks are
possible to have. For example we can use some smarter static scheduling
when one thread might steal tasks from another threads when it runs
out of work to be done.
Also removed unneeded spin lock in the mesh deform evaluation,
on the first glance it seemed to be a reduction involved here but
int fact threads are just adding value to the original vertex
coordinates. No write access to the same element of vertexCos
happens from separate threads.
This commit switches meshdeform modifier to use threads to evaluate
the vertices positions using the central task scheduler.
SO now we've got an utility function to help splitting the for loop
into tasks using BLI_task module which is pretty straightforward to
use: it gets range (which is an integer lower and higher bounds) and
the function and userdata to be invoked for each of the iterations.
The only weak point for now is the passing the data to the callback,
this isn't so trivial to improve in pure C.
Reviewers: campbellbarton
Differential Revision: https://developer.blender.org/D838
Replaces ThreadedWorker and is gonna to be used
for threaded object update in the future and
some more upcoming changes.
But in general, it's to be used for any task
based subsystem in Blender.
Originally written by Brecht, with some fixes
and tweaks by self.