Commit Graph

35 Commits

Author SHA1 Message Date
de13d0a80c doxygen: add newline after \file
While \file doesn't need an argument, it can't have another doxygen
command directly after it on the same line.
2019-02-18 08:22:12 +11:00
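
For illustration, the header comment style these two doxygen cleanups converge on (the group name bli is just a placeholder here):

    /** \file
     * \ingroup bli
     *
     * Regular (non-doxygen) notes about the file go in a separate comment block.
     */
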
eef4077f18 Cleanup: remove redundant doxygen \file argument
Move \ingroup onto the same line as \file to be more compact and
make it clear the file is in the group.
2019-02-06 15:45:22 +11:00
65ec7ec524 Cleanup: remove redundant, invalid info from headers
BF-admins agree to remove header information that isn't useful,
to reduce noise.

- BEGIN/END license blocks

  Developers should add non-license comments as separate comment blocks.
  No need for separator text.

- Contributors

  This is often invalid, outdated or misleading,
  especially when splitting files.

  It's more useful to use git-blame to find out who developed the code.

See P901 for script to perform these edits.
2019-02-02 01:36:28 +11:00
88a80fcec8 Cleanup: commas at the end of enums
Without this, clang-format may wrap them onto a single line.
2019-01-16 00:03:03 +11:00
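
A small sketch of what this means in practice (the enum and flag names are made up):

    /* Without a trailing comma, clang-format may join the enumerators onto one line: */
    typedef enum eExampleFlag { FLAG_A = (1 << 0), FLAG_B = (1 << 1) } eExampleFlag;

    /* With a trailing comma after the last enumerator, it keeps one per line: */
    typedef enum eExampleFlag2 {
      FLAG2_A = (1 << 0),
      FLAG2_B = (1 << 1),
    } eExampleFlag2;
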
01581d4a1e BLI_task: fix queue in work_and_wait, and support resetting.
To make the pool more usable for running multiple stages of tasks,
fix local queue handling in BLI_task_pool_work_and_wait.

Specifically, after the wait loop the local queue should be empty,
or the wait part of the function contract isn't fulfilled. Instead,
check and run any tasks in queue before the wait loop.

Also, add a new function that resets the suspended state of the pool.
2018-12-04 14:08:50 +03:00
af36dd4664 Cleanup: trailing newlines 2018-06-29 08:02:49 +02:00
75fc1c3507 Cleanup: trailing whitespace (comment blocks)
Strip trailing whitespace in unindented comment blocks - mainly headers - to avoid conflicts.
2018-06-01 18:19:39 +02:00
a7dc5e12ac Cleanup: style 2018-01-21 11:41:52 +11:00
5614193745 Task scheduler: Use restrict pointer qualifier
Those pointers are never to be aliased, so let's be explicit about this and hope
the compiler saves some CPU ticks.
2018-01-10 12:49:51 +01:00
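
A small, generic sketch of the idea (not the actual scheduler code): marking pointers restrict promises the compiler that they don't alias, which can enable better code generation.

    #include <stddef.h>

    /* 'restrict' tells the compiler dst and src never point at the same memory,
     * so it is free to vectorize and reorder the loads and stores. */
    static void add_arrays(float *restrict dst, const float *restrict src, size_t n)
    {
      for (size_t i = 0; i < n; i++) {
        dst[i] += src[i];
      }
    }
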
932d448ae0 Task scheduler: Use const qualifiers in parallel range 2018-01-09 16:09:33 +01:00
c4e42d70a4 Task scheduler: Add minimum number of iterations per thread in parallel range
The idea is to support the following: allow doing a parallel for on a small range,
each iteration of which takes lots of compute power, but limit such a range to
a subset of threads.

For example, on a machine with 44 threads we can occupy 4 threads to handle
a range of 64 elements, 16 elements per thread, where each block of 16 elements
is very complex to compute.

The idea should be to use this setting instead of the global use_threading flag,
which is only based on the size of the array. Proper use of the new setting will improve
threadability.

This commit only contains internal task scheduler changes; the setting is not
yet used by any area.
2018-01-09 16:09:33 +01:00
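
The arithmetic in the example above: with a minimum of 16 iterations per thread, a range of 64 elements is capped at 64 / 16 = 4 worker threads, however many threads the machine has. A standalone sketch of that clamping logic (names are hypothetical, not the actual scheduler code):

    #include <stdio.h>

    /* Cap the number of worker threads so each one gets at least
     * 'min_iter_per_thread' iterations of the range. */
    static int threads_for_range(int range_len, int num_threads, int min_iter_per_thread)
    {
      int max_useful = range_len / min_iter_per_thread;
      if (max_useful < 1) {
        max_useful = 1;
      }
      return (max_useful < num_threads) ? max_useful : num_threads;
    }

    int main(void)
    {
      /* 44 hardware threads, 64 elements, at least 16 iterations per thread -> 4 threads. */
      printf("%d\n", threads_for_range(64, 44, 16));
      return 0;
    }
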
4c4a7e84c6 Task scheduler: Use single parallel range function with more flexible function
Now all the fine-tuning happens via the parallel range settings structure,
which avoids passing long lists of arguments, allows extending the fine-tuning further,
and avoids having lots of various functions which basically do the same thing.
2018-01-09 16:09:33 +01:00
d2708b0f73 Task scheduler: Get rid of extended version of parallel range callback
Wrap all arguments into a TLS-type argument. This avoids some branching and also
makes it easier to extend things in the future.
2018-01-09 16:09:33 +01:00
efb86b712d Add a new parallel looper for MemPool items to BLI_task.
It merely uses the new thread-safe iterator system of mempool; quite
straightforward.

Note that to avoid possible confusion with two void pointers as
parameters of the callback, a dummy opaque struct pointer is used
instead for the second parameter (the pointer generated by iteration over
the mempool); callback functions must explicitly convert it to the expected
real type.

Also added a basic gtest for this new feature.
2017-11-23 21:14:43 +01:00
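
A hedged sketch of the callback shape the message describes (the type and function names are assumptions, not verified signatures): the second parameter is an opaque pointer to the mempool item, which the callback casts back to the real element type.

    /* Hypothetical element type stored in the mempool. */
    typedef struct MyElem {
      int value;
    } MyElem;

    /* Opaque stand-in for "pointer generated by iteration over mempool";
     * using a distinct type keeps the two callback parameters from being mixed up. */
    typedef struct OpaqueMempoolItem OpaqueMempoolItem;

    /* Callback: userdata first, opaque item pointer second. Each invocation only
     * touches its own item, so no locking is needed. */
    static void clamp_cb(void *userdata, OpaqueMempoolItem *item)
    {
      const int *max_value = userdata;
      MyElem *elem = (MyElem *)item; /* explicit cast back to the real element type */
      if (elem->value > *max_value) {
        elem->value = *max_value;
      }
    }
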
a481908232 Task scheduler: Optimize subsequent pushing bunch of tasks
The idea is to accumulate all new tasks in a thread local queue
first without doing any thread synchronization (i.e. locks and
condition variables) and move those tasks to a scheduler queue
once they are all ready. This way we avoid per-task-pool lock
and only have one lock per bunch of tasks.

This is particularly handy when scheduling new dependency graph
node children. Brings FPS of the cached simulation in the file
linked below from ~30 to ~50.

See documentation for BLI_task_pool_delayed_push_{begin, end}
and for TaskThreadLocalStorage::do_delayed_push.

Fixes T50027: Rigidbody playback and simulation performance regression with new depsgraph

Thanks Bastien for the review!
2017-05-31 15:44:08 +02:00
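
The idea in a generic, standalone sketch (not the actual BLI implementation): accumulate pushes in a local buffer and take the shared scheduler lock only once per batch.

    #include <pthread.h>
    #include <string.h>

    #define BATCH_SIZE 64

    typedef struct SharedQueue {
      pthread_mutex_t lock; /* assumed initialized with pthread_mutex_init() */
      void *items[1024];
      int len;
    } SharedQueue;

    typedef struct LocalBatch {
      void *items[BATCH_SIZE];
      int len;
    } LocalBatch;

    /* Accumulate without touching the shared lock (bounds checks omitted for brevity). */
    static void batch_push(LocalBatch *batch, void *item)
    {
      batch->items[batch->len++] = item;
    }

    /* One lock acquisition for the whole batch instead of one per task. */
    static void batch_flush(SharedQueue *queue, LocalBatch *batch)
    {
      pthread_mutex_lock(&queue->lock);
      memcpy(queue->items + queue->len, batch->items, sizeof(void *) * (size_t)batch->len);
      queue->len += batch->len;
      pthread_mutex_unlock(&queue->lock);
      batch->len = 0;
    }
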
9e566b06e3 Task scheduler: Add concept of suspended pools
Suspended pools allow pushing a huge amount of initial tasks
without any threading synchronization, and hence without its overhead.

This gives a ~50% speedup of the cached rigid body simulation with the file from
T50027 and seems to have no negative effect on other scenes
here.
2017-03-07 17:32:01 +01:00
9522f8acf0 Task scheduler: Remove per-pool threads limit
This feature added extra complexity to task scheduling,
requiring yet more variables to worry about and modify in an
atomic manner, which resulted in the following issues:

- More complex code to maintain, which increases risks of
  something going wrong when we modify the code.

- Extra barriers and/or locks during task scheduling, which
  causes extra threading overhead.

- Unable to use some other implementation (such as TBB) even for
  the comparison tests.

Notes about other changes.

There are two places where we really had to use that limit.

One of them is the single threaded dependency graph. This will
now construct a single-threaded scheduler at evaluation time.
This shouldn't be a problem because it only happens when using
debugging command line arguments, and the code simply doesn't
run in regular Blender operation.

The code seems a bit duplicated here across the old and new
depsgraph, but I think it's OK since the old depsgraph is already
gone in the 2.8 branch and I don't see where else we might want
to use such a single-threaded scheduler.

When/if we'll want to do so, we can move it to a centralized
single-threaded scheduler in threads.c.

OpenGL render was a bit more tricky to port, but basically we
are using condition variables to wait for the background thread to
do all the work.
2017-03-07 17:32:01 +01:00
7b92b64742 Fix own previous commit, sorry about that :( 2017-03-03 17:23:22 +01:00
7fcae7ba60 Task scheduler: Remove query for the pool's number of threads
Not really happy with the per-pool threads limit; need to find a better
approach to that. But at least it's possible to get rid of half
of the nastiness here by removing a getter which was only used in
an assert statement.

That piece of code was already well tested, and it becomes
obsolete in the new depsgraph and no longer exists in the Blender
2.8 branch.
2017-03-01 18:00:54 +01:00
f0cf15b5c6 Task scheduler: Remove counter of done tasks
This was only used for progress reporting, and it's wrong because:

- Pool might in theory be re-used by different tasks
- We should not make any decision based on scheduling stats

The proper way is for the task itself to take care of progress reporting.
2017-03-01 12:45:51 +01:00
688858d3a8 BLI_task: Add new 'BLI_task_parallel_range_finalize()'.
Together with the extended loop callback and userdata_chunk, this allows performing
cumulative tasks (like aggregation) in a lock-free way, using the local userdata_chunk to store temp data
and, once all workers have finished, merging those userdata_chunks in the finalize callback
(from the calling thread, so no need to lock there either).

Note that this changes how userdata_chunk is handled (it is now managed fully from the 'main' thread,
which means a given worker thread will always get the same userdata_chunk, no longer
re-initialized to the init value at the start of each iteration chunk).
2016-05-16 17:15:18 +02:00
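
The aggregation pattern described above, as a standalone sketch (not the real BLI_task API): each worker accumulates into its own copy of the chunk, and a finalize step merges the copies on the calling thread, so no locking is needed.

    typedef struct SumChunk {
      double sum; /* per-worker partial result, "firstprivate"-style */
    } SumChunk;

    /* Worker body: touches only its own chunk, hence lock-free. */
    static void accumulate(SumChunk *chunk, const double *values, int start, int end)
    {
      for (int i = start; i < end; i++) {
        chunk->sum += values[i];
      }
    }

    /* Finalize: runs on the calling thread once all workers are done. */
    static void finalize(SumChunk *total, const SumChunk *worker_chunks, int num_workers)
    {
      for (int i = 0; i < num_workers; i++) {
        total->sum += worker_chunks[i].sum;
      }
    }
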
868cfc5a4a BLI_task: add support for listbase parallelized for loops.
Code by @sergey, with small edits and doc by @mont29.
2016-05-13 12:06:15 +02:00
7efa34d078 Task scheduler: Add thread-aware task push routines
This commit implements a new function, BLI_task_pool_push_from_thread(),
whose main goal is to have less parasitic load on the CPU by avoiding
memory allocations as much as possible, making task pushing cheaper.

This function expects a thread ID, which must be 0 for the thread the
pool was created from (and from which wait_work() is called); for
other threads it must be the ID which was passed to the thread's
working function.

This reduces allocations quite a bit in the new dependency graph,
hopefully gaining some visible speedup on fewzillion-core machines
(on my own machine I can only see the benefit in the profiler, which shows
a significant reduction of time wasted in memory allocation).
2016-05-10 10:01:24 +02:00
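
A generic sketch of why the thread ID matters here (not the actual implementation): with an ID, each thread can reuse pre-allocated task slots from its own bucket without calling the allocator or taking a lock.

    #define MAX_THREADS 64
    #define TASKS_PER_THREAD 256

    typedef struct Task {
      void (*run)(void *taskdata);
      void *taskdata;
    } Task;

    typedef struct PerThreadTasks {
      Task tasks[TASKS_PER_THREAD];
      int used;
    } PerThreadTasks;

    static PerThreadTasks task_buffers[MAX_THREADS];

    /* thread_id must be 0 for the thread that created the pool, or the worker's own
     * ID otherwise -- only one thread ever touches a given bucket, which is what
     * makes this reuse safe without synchronization. */
    static Task *task_alloc_from_thread(int thread_id)
    {
      PerThreadTasks *buf = &task_buffers[thread_id];
      if (buf->used < TASKS_PER_THREAD) {
        return &buf->tasks[buf->used++];
      }
      return NULL; /* the real code would fall back to a regular heap allocation */
    }
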
31d907fa0a Cleanup: BLI_task - API changes.
Based on usages so far:
- Split the callback worker func in two, 'basic' and 'extended' versions. The former goes back
  to the simplest version, while the latter keeps the 'userdata_chunk' and gets the thread_id too.
- Add use_threading to the simple BLI_task_parallel_range(); turns out we need this pretty much systematically,
  and it allows getting rid of most usages of BLI_task_parallel_range_ex().
- Now BLI_task_parallel_range() expects the 'basic' version of the callback, while BLI_task_parallel_range_ex()
  expects the 'extended' version.

All in all, this should make common usage of BLI_task_parallel_range simpler (less verbose), and give
the advanced callback access to the thread id, which is mandatory in some (future) cases.
2016-01-16 15:59:37 +01:00
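
Roughly, the two callback shapes being distinguished (approximate typedefs for illustration only, not the exact declarations):

    /* 'Basic' callback: just the shared userdata and the iteration index. */
    typedef void (*ParallelRangeFunc)(void *userdata, int iter);

    /* 'Extended' callback: additionally gets the per-thread userdata_chunk and
     * the id of the worker thread running this iteration. */
    typedef void (*ParallelRangeFuncEx)(void *userdata, void *userdata_chunk,
                                        int iter, int thread_id);
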
511e3c5d9d BLI_task: change BLI_task_parallel_range_ex() to just take a bool whether to use threading or not, instead of threshold.
From recent experience, it turns out we often want to use something other than the basic
range of the parallelized for loop as the control parameter over thread usage, so now the BLI func
only takes a boolean, and the caller defines the best check for its own case.
2015-12-30 20:39:56 +01:00
0f609d5d04 BLI_task: BLI_task_parallel_range_ex: add some per-chunk userdata.
This mimics OpenMP's 'firstprivate' feature. It is sometimes handy to have some persistent local data during a whole chunk.

Reviewers: sergey

Reviewed By: sergey

Subscribers: campbellbarton

Differential Revision: https://developer.blender.org/D1635
2015-11-25 11:01:59 +01:00
04ac8768ef BLI_task: add support for full-background taskpools.
With the current code, in a single-threaded context, a pool of tasks may never be executed
until one calls BLI_task_pool_work_and_wait() on it; this is not acceptable for
asynchronous tasks where you never want to actually block the main thread.

This commit adds an extra thread in the single-threaded case, and a new 'type' of pool,
such that one can create real background pools of tasks. See the code for details.

Review: D1565
2015-11-02 16:57:48 +01:00
44774f8160 BLI_task: add freedata callback to tasks.
Useful in case one needs more complex handling of task data than a mere MEM_freeN().
2015-11-02 16:52:19 +01:00
97ae0f22cd Depsgraph debug: Remove hardcoded array of BLENDER_MAX_THREADS elements
Allocate the statistics array dynamically, so increasing the max number of threads does
not increase sloppiness of the memory usage.

For further cleanups: we could try alloca-ing this array, but that's also not
really safe because we could have quite a huge number of threads in the future.
Plus the statistics allocate memory for each individual entry, so using alloca
is not going to give anything beneficial here.
2015-04-13 14:41:02 +05:00
77c926933b cleanup: warnings 2015-01-06 19:09:56 +11:00
7b0c529fe2 Task scheduler: Add an option to limit number of threads per pool
This way we can have a scheduler capable of scheduling tasks on all the CPUs,
while at the same time limiting tasks like baking (in the future) to use
no more than a given number of threads.
2014-11-21 11:31:30 +01:00
e43b74d87a Optimization of parallel range
It now supports different scheduling schemes: dynamic and static.
The static one is the default; it splits the work into an equal number of
range iterations per thread.

The dynamic one allocates chunks of 32 iterations which are then
dynamically sent to a thread which is currently idling.

This gives slightly better performance. Some tricks are still
possible; for example, we could use smarter static scheduling
where one thread steals tasks from other threads when it runs
out of work to be done.

Also removed an unneeded spin lock in the mesh deform evaluation;
at first glance it seemed there was a reduction involved here, but
in fact threads are just adding values to the original vertex
coordinates. No write access to the same element of vertexCos
happens from separate threads.
2014-11-03 22:44:29 +05:00
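
A standalone sketch of the dynamic scheme described above (not the actual scheduler code): an atomic cursor hands out chunks of 32 iterations to whichever thread is currently idle.

    #include <stdatomic.h>

    #define CHUNK_SIZE 32

    typedef struct RangeState {
      atomic_int next; /* next un-claimed iteration */
      int total;       /* one past the last iteration */
    } RangeState;

    /* Each idle worker grabs the next chunk; returns 0 when the range is exhausted. */
    static int grab_chunk(RangeState *state, int *r_start, int *r_end)
    {
      int start = atomic_fetch_add(&state->next, CHUNK_SIZE);
      if (start >= state->total) {
        return 0;
      }
      *r_start = start;
      *r_end = (start + CHUNK_SIZE < state->total) ? start + CHUNK_SIZE : state->total;
      return 1;
    }
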
dfc4de036e Meshdeform modifier: Use threaded evaluation
This commit switches the meshdeform modifier to use threads to evaluate
the vertex positions using the central task scheduler.

So now we've got a utility function to help split a for loop
into tasks using the BLI_task module, which is pretty straightforward
to use: it gets a range (integer lower and upper bounds) and the
function and userdata to be invoked for each of the iterations.

The only weak point for now is passing the data to the callback;
this isn't so trivial to improve in pure C.

Reviewers: campbellbarton

Differential Revision: https://developer.blender.org/D838
2014-10-22 12:20:41 +02:00
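
The shape of the pattern the message describes, as a hedged sketch (the struct, field and callback names are hypothetical, and the exact BLI_task_parallel_range argument list of that era is not reproduced here): the caller hands the scheduler a [start, end) range, a userdata pointer, and a callback invoked once per iteration.

    typedef struct DeformData {
      const float (*orig_cos)[3]; /* hypothetical input vertex coordinates */
      float (*vertex_cos)[3];     /* hypothetical output vertex coordinates */
    } DeformData;

    /* Callback invoked once per vertex index; it writes only to its own element,
     * so no locking is needed. */
    static void deform_vert_cb(void *userdata, int vert_index)
    {
      DeformData *data = userdata;
      /* ... compute the deformed position of vertex 'vert_index' ... */
      data->vertex_cos[vert_index][0] = data->orig_cos[vert_index][0];
      data->vertex_cos[vert_index][1] = data->orig_cos[vert_index][1];
      data->vertex_cos[vert_index][2] = data->orig_cos[vert_index][2];
    }

    /* Usage would then look roughly like:
     *   BLI_task_parallel_range(0, numVerts, &data, deform_vert_cb);
     * with the exact argument list depending on the BLI_task version. */
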
48c1e0c0fc spelling: use American spelling for canceled 2013-10-26 01:06:19 +00:00
f0dcff9aa9 Task scheduler ported from Cycles to C
Replaces ThreadedWorker and is going to be used
for threaded object updates in the future and
some more upcoming changes.

But in general, it's to be used for any task
based subsystem in Blender.

Originally written by Brecht, with some fixes
and tweaks by myself.
2013-10-12 14:08:59 +00:00