Allocate statistics array dynamically, so increasing max number of threads does
not increase sloppyness of the memory usage.
For the further cleanups: we can try alloca-ing this array, but it's also not
really safe because we can have quite huge number of threads in the future.
Plus statistics will allocate memory for each individual entry, so using alloca
is not going to give anything beneficial here.
This ensures proper values of currently running tasks in the pool
(previously difference between mutex locks when acquiring new job
and releasing it might in theory give wrong values).
This way we can have scheduler capable of scheduling tasks on all the CPUs
but in the same time we can limit tasks like baking (in the future) to use
no more than given number of threads.
It now supports different scheduling schemas: dynamic and static.
Static one is the default and it splits work into equal number of
range iterations.
Dynamic one allocates chunks of 32 iterations which then being
dynamically send to a thread which is currently idling.
This gives slightly better performance. Still some tricks are
possible to have. For example we can use some smarter static scheduling
when one thread might steal tasks from another threads when it runs
out of work to be done.
Also removed unneeded spin lock in the mesh deform evaluation,
on the first glance it seemed to be a reduction involved here but
int fact threads are just adding value to the original vertex
coordinates. No write access to the same element of vertexCos
happens from separate threads.
This commit switches meshdeform modifier to use threads to evaluate
the vertices positions using the central task scheduler.
SO now we've got an utility function to help splitting the for loop
into tasks using BLI_task module which is pretty straightforward to
use: it gets range (which is an integer lower and higher bounds) and
the function and userdata to be invoked for each of the iterations.
The only weak point for now is the passing the data to the callback,
this isn't so trivial to improve in pure C.
Reviewers: campbellbarton
Differential Revision: https://developer.blender.org/D838
Replaces ThreadedWorker and is gonna to be used
for threaded object update in the future and
some more upcoming changes.
But in general, it's to be used for any task
based subsystem in Blender.
Originally written by Brecht, with some fixes
and tweaks by self.