Commit Graph

50 Commits

Author SHA1 Message Date
b498db06eb Task scheduler: Cleanup, use BLI_assert() instead of assert() 2017-03-06 11:33:27 +01:00
2e8398c095 Get rid of BLI_task_pool_stop().
Comments said that function was supposed to 'stop worker threads', but
it absolutely did not do anything like that, was merely wiping out TODO
queue of tasks from given pool (kind of subset of what
`BLI_task_pool_cancel()` does).

Misleading, and currently useless, we can always add it back if we need
it some day, but for now we try to simplify that area.
2017-03-03 17:16:39 +01:00
18c2a44333 Fix ugly mistake in BLI_task - freeing while some tasks are still being processed.
Freeing pool was calling `BLI_task_pool_stop()`, which only clears
pool's tasks that are in TODO queue, whithout ensuring no more tasks
from that pool are being processed in worker threads.

This could lead to use-after-free random (and seldom) crashes.

Now use instead `BLI_task_pool_cancel()`, which does waits for all tasks
being processed to finish, before returning.
2017-03-03 17:12:03 +01:00
17cf423f30 Cleanup: Indentation 2017-03-03 15:53:55 +01:00
7fcae7ba60 Task scheduler: Remove query for the pool's number of threads
Not really happy of per-pool threads limit, need to find better
approach to that. But at least it's possible to get rid of half
of the nastyness here by removing getter which was only used in
an assert statement.

That piece of code was already well-tested and this code becomes
obsolete in the new depsgraph and does no longer exists in blender
2.8 branch.
2017-03-01 18:00:54 +01:00
f0cf15b5c6 Task scheduler: Remove counter of done tasks
This was only used for progress report, and it's wrong because:

- Pool might in theory be re-used by different tasks
- We should not make any decision based on scheduling stats

Proper way is to take care of progress by the task itself.
2017-03-01 12:45:51 +01:00
4ee08e9533 Atomics: Make naming more obvious about which value is being returned 2016-11-15 12:16:26 +01:00
0fe7446a30 BLI_task: fix case were some pool could work in more threads than allowed.
We were checking for number of tasks from given pool already active, and
then atomically increasing it if allowed - this is not correct, number
could be increased by another thread between check and atomic op!

Atomic primitives are nice, but you must be very careful with *how* you
use them... Now we atomically increase counter, check result, and if we
end up over max value, abort and decrease counter again.

Spotted by Sergey, thanks!
2016-10-08 14:51:33 +02:00
a31eca3fdd Fix T49251: moving smoke domain with additional resolution causes crash.
This is a bug in the multithreaded task manager in negative value range.

The problem here is that if previter is unsigned, the comparison in the
return statement is unsigned, and works incorrectly if stop < 0 &&
iter >= 0. This in turn can happen if stop is close to 0, because this
code is designed to overrun the stop by chunk_size*num_threads as
the threads terminate.

This probably should go into 2.78 as it prevents a crash.
2016-09-05 15:50:12 +03:00
0971749f4c Cleanup: spelling, indentation 2016-06-29 20:37:54 +10:00
2465bd90d5 Cleanup: style, whitespace, doxy filepaths 2016-06-19 06:33:29 +10:00
22ff9c5568 Fix T48497: Stupid typo in recent own BLI_task forloop work that broke non-parallelized case. 2016-05-22 18:35:44 +02:00
688858d3a8 BLI_task: Add new 'BLI_task_parallel_range_finalize()'.
Together with the extended loop callback and userdata_chunk, this allows to perform
cumulative tasks (like aggregation) in a lockfree way using local userdata_chunk to store temp data,
and once all workers have finished, to merge those userdata_chunks in the finalize callback
(from calling thread, so no need to lock here either).

Note that this changes how userdata_chunk is handled (now fully from 'main' thread,
which means a given worker thread will always get the same userdata_chunk, without
being re-initialized anymore to init value at start of each iter chunk).
2016-05-16 17:15:18 +02:00
5a7429c363 BLI_task: Add back lost 'push_from_thread' change to BLI_task_parallel_range() & co. 2016-05-16 17:00:15 +02:00
575d7a9666 BLI_task: make foreach loop index hleper lockfree, take II.
New code is actually much, much better than first version, using 'fetch_and_add' atomic op
here allows us to get rid of the loop etc.

The broken CAS issue remains on windows, to be investigated...
2016-05-16 15:57:19 +02:00
bb7da630ba Fix T48422: Revert "BLI_task: nano-optimizations to BLI_task_parallel_range feature."
There are some serious issues under windows, causing deadlocks somehow (not reproducible under linux so far).

Until further investigation over why this happens, better to revert to previous
spin-locked behavior.

This reverts commits a83bc4f597 and 98123ae916.
2016-05-15 21:14:40 +02:00
a83bc4f597 Fix an error in new lockfree parallel_range_next_iter_get() helper.
Reading the shared state->iter value after storing it in the 'reference' var could in theory
lead to a race condition setting state->iter value above state->stop, which would be 'deadly'.

This **may** be the cause of T48422, though I was not able to reproduce that issue so far.
2016-05-14 18:06:05 +02:00
868cfc5a4a BLI_task: add support for listbase parallelized for loops.
Code by @sergey, with small edits and doc by @mont29.
2016-05-13 12:06:15 +02:00
98123ae916 BLI_task: nano-optimizations to BLI_task_parallel_range feature.
This commit makes use of new taskpool feature (instead of allocating own tasks),
and removes the spinlock used to generate chunks (using atomic ops instead).

In best cases (dynamic scheduled loop with light processing func callback), we
get a few percents of speedup, in most cases there is no sensible enhancement.
2016-05-10 17:57:53 +02:00
335274192e Revert "Task scheduler: Avoid mutex lock in number manipulation functions"
Appears mutex was guarateeing number of tasks is not modified at moments
when it's not expected. Removing those mutexes resulted in some hard-to-catch
locks where worker thread were waiting for work by all the tasks were already
done.

This reverts commit a1d8fe052c.
2016-05-10 15:43:03 +02:00
a1d8fe052c Task scheduler: Avoid mutex lock in number manipulation functions
It seems using atomic operations here we can avoid having mute without
breaking anything.

Thanks Bastien for double-checking the changes!
2016-05-10 14:59:19 +02:00
fcc2175710 Fix own mistake in rBd617de965ea20e5d5 from late December 2015.
Brain melt here, intention was to reduce number of tasks in case we have not much chunks of data to loop over,
not to increase it!

Note that this only affected dynamic scheduling.
2016-05-10 13:10:21 +02:00
7efa34d078 Task scheduler: Add thread-aware task push routines
This commit implements new function BLI_task_pool_push_from_thread()
who's main goal is to have less parasitic load on the CPU bu avoiding
memory allocations as much as possible, making taks pushing cheaper.

This function expects thread ID, which must be 0 for the thread from
which pool is created from (and from which wait_work() is called) and
for other threads it mush be the ID which was sent to the thread working
function.

This reduces allocations quite a bit in the new dependency graph,
hopefully gaining some visible speedup on a fewzillion core machines
(on my own machine can only see benefit in profiler, which shows
significant reduce of time wasted in the memory allocation).
2016-05-10 10:01:24 +02:00
9ac35be63a Task scheduler: Don't calloc in performance-critical areas
Majority of the fields are being overwritten anyway, so calloc it
kinda waste of CPU ticks.
2016-05-09 14:54:24 +02:00
d5ddc52ae1 Cleanup: style 2016-01-19 04:54:39 +11:00
8acf14c55c Cleanup: BLI_task foreach looper API doc. 2016-01-16 16:06:27 +01:00
31d907fa0a Cleanup: BLI_task - API changes.
Based on usages so far:
- Split callback worker func in two, 'basic' and 'extended' versions. The former goes back
  to the simplest verion, while the later keeps the 'userdata_chunk', and gets the thread_id too.
- Add use_threading to simple BLI_task_parallel_range(), turns out we need this pretty much systematically,
  and allows to get rid of most usages of BLI_task_parallel_range_ex().
- Now BLI_task_parallel_range() expects 'basic' version of callback, while BLI_task_parallel_range_ex()
  expectes 'extended' version of the callback.

All in all, this should make common usage of BLI_task_parallel_range simpler (less verbose), and add
access to advanced callback to thread id, which is mandatory in some (future) cases.
2016-01-16 15:59:37 +01:00
82d88e42a5 BLI_task threaded looper: do not assert when start == stop.
This can happen quite often in forloops, and would be annoying to have to check for this
in caller code! So now, just return without doing anything in this case.
2016-01-04 19:17:09 +01:00
511e3c5d9d BLI_task: change BLI_task_parallel_range_ex() to just take a bool whether to use threading or not, instead of threshold.
From recent experience, turns out we often do want to use something else than basic
range of parallelized forloop as control parameter over threads usage, so now BLI func
only takes a boolean, and caller defines best check for its own case.
2015-12-30 20:39:56 +01:00
d617de965e Fix (unreported) broken BLI_task's forloop func in case we have less iterations that workers.
When called with very small range, `BLI_task_parallel_range_ex()` would generate a zero `chunk_size`,
leading to some infinite looping in `parallel_range_func` due to `parallel_range_next_iter_get` returning
true without actually increasing the counter!

So now, we ensure `chunk_size` and `num_tasks` are always at least 1 (and avoid generating too much tasks too).
2015-12-28 00:37:07 +01:00
dc98a3b0a7 Cleanup: style/spelling 2015-12-12 15:10:03 +11:00
0f609d5d04 BLI_task: BLI_task_parallel_range_ex: add some per-chunk userdata.
This mimics OpenMP's 'firstprivate' feature. It is sometimes handy to have some persistent local data during a whole chunk.

Reviewers: sergey

Reviewed By: sergey

Subscribers: campbellbarton

Differential Revision: https://developer.blender.org/D1635
2015-11-25 11:01:59 +01:00
7a09d15ade Cleanup: comments/style 2015-11-06 05:34:05 +11:00
d0d523d809 Better fix for pthread ID comparison crap on windows.
Suggested by Sergey, thanks!
2015-11-02 19:25:00 +01:00
d2fbbed462 Attempt to fix win32 compilation after own recent commits. 2015-11-02 18:48:14 +01:00
04ac8768ef BLI_task: add support for full-background taskpools.
With current code, in single-threaded context, a pool of task may never be executed
until one calls BLI_task_pool_work_and_wait() on it, this is not acceptable for
asynchronous tasks where you never want to actually lock the main thread.

This commits adds an extra thread in single-threaded case, and a new 'type' of pool,
such that one can create real background pools of tasks. See code for details.

Review: D1565
2015-11-02 16:57:48 +01:00
44774f8160 BLI_task: add freedata callback to tasks.
Useful in case one needs more complex handling of tasks data than a mere MEM_freeN().
2015-11-02 16:52:19 +01:00
952afbf916 BLI_task: Fix/enhance logic of exiting worker threads.
In previous code, worker would exit in case it gets awoken from a condition_wait() and
task queue is empty. However, there may be spurious wake up (either due to pthread itself,
or to some race condition between workers) that would lead to wrongly exiting a worker before
we actually exit the whole scheduler. See code for more details.
2015-11-02 16:42:39 +01:00
f56392f224 BLI_task: fix bad freeing of current task_thread in case POSIX thread creation fails.
Trying to MEM_free a single item of a whole MEM_calloc'ated array, tsst...
Luckily looks like POSIX thread creation does not fail often! :P
2015-10-18 14:39:37 +02:00
40091ff83a Correction to early output in the parallel range implementation
The used heuristic of checking the value prior to lock is not totally safe
because assignment is not atomic and check might not give proper result.
2015-05-18 16:40:51 +05:00
97ae0f22cd Depsgraph debug: Remove hardcoded array of BLENDER_MAX_THREADS elements
Allocate statistics array dynamically, so increasing max number of threads does
not increase sloppyness of the memory usage.

For the further cleanups: we can try alloca-ing this array, but it's also not
really safe because we can have quite huge number of threads in the future.
Plus statistics will allocate memory for each individual entry, so using alloca
is not going to give anything beneficial here.
2015-04-13 14:41:02 +05:00
e177c51430 Use atomic operations in task pool
This ensures proper values of currently running tasks in the pool
(previously difference between mutex locks when acquiring new job
and releasing it might in theory give wrong values).
2014-12-02 15:23:58 +05:00
7b0c529fe2 Task scheduler: Add an option to limit number of threads per pool
This way we can have scheduler capable of scheduling tasks on all the CPUs
but in the same time we can limit tasks like baking (in the future) to use
no more than given number of threads.
2014-11-21 11:31:30 +01:00
e43b74d87a Optimization of parallel range
It now supports different scheduling schemas: dynamic and static.
Static one is the default and it splits work into equal number of
range iterations.

Dynamic one allocates chunks of 32 iterations which then being
dynamically send to a thread which is currently idling.

This gives slightly better performance. Still some tricks are
possible to have. For example we can use some smarter static scheduling
when one thread might steal tasks from another threads when it runs
out of work to be done.

Also removed unneeded spin lock in the mesh deform evaluation,
on the first glance it seemed to be a reduction involved here but
int fact threads are just adding value to the original vertex
coordinates. No write access to the same element of  vertexCos
happens from separate threads.
2014-11-03 22:44:29 +05:00
eaaeae4699 Cleanup: spelling 2014-10-23 10:38:38 +02:00
dfc4de036e Meshdeform modifier: Use threaded evaluation
This commit switches meshdeform modifier to use threads to evaluate
the vertices positions using the central task scheduler.

SO now we've got an utility function to help splitting the for loop
into tasks using BLI_task module which is pretty straightforward to
use: it gets range (which is an integer lower and higher bounds) and
the function and userdata to be invoked for each of the iterations.

The only weak point for now is the passing the data to the callback,
this isn't so trivial to improve in pure C.

Reviewers: campbellbarton

Differential Revision: https://developer.blender.org/D838
2014-10-22 12:20:41 +02:00
b3afbcab8f ListBase API: add utility api funcs for clearing and checking empty 2014-02-08 06:24:05 +11:00
621bf47e91 Docs: doxygen file descriptions for BLF, GPU and WM 2014-01-19 23:15:25 +11:00
48c1e0c0fc spelling: use American spelling for canceled 2013-10-26 01:06:19 +00:00
f0dcff9aa9 Task scheduler ported form CYcles to C
Replaces ThreadedWorker and is gonna to be used
for threaded object update in the future and
some more upcoming changes.

But in general, it's to be used for any task
based subsystem in Blender.

Originally written by Brecht, with some fixes
and tweaks by self.
2013-10-12 14:08:59 +00:00