blender-archive

Archived

Author	SHA1	Message	Date
Bastien Montagne	497e2b3dfa	Cleanup: use signed atomic ops when needed.	2017-11-23 16:24:34 +01:00
Campbell Barton	00c4f49a6d	Cleanup: indentation, long lines	2017-06-12 13:38:21 +10:00
Sergey Sharybin	a481908232	Task scheduler: Optimize subsequent pushing bunch of tasks The idea is to accumulate all new tasks in a thread local queue first without doing any thread synchronization (aka, locks and conditional variables) and move those tasks to a scheduler queue once they are all ready. This way we avoid per-task-pool lock and only have one lock per bunch of tasks. This is particularly handy when scheduling new dependency graph node children. Brings FPS of cached simulation from the linked below file from ~30 to ~50. See documentation for BLI_task_pool_delayed_push_{begin, end} and for TaskThreadLocalStorage::do_delayed_push. Fixes T50027: Rigidbody playback and simulation performance regression with new depsgraph Thanks Bastien for the review!	2017-05-31 15:44:08 +02:00
Sergey Sharybin	2ae6973936	Cleanup: Easier to read constant name	2017-05-31 14:52:45 +02:00
Sergey Sharybin	575d6415fb	Task scheduler: Fix typo in TLS for pools created from non-main thread Did a mistake which started to use same TLS for all threads for such pools. Also added some extra asserts to help catching the bugs.	2017-04-13 13:34:07 +02:00
Sergey Sharybin	ed5c3121f5	Task scheduler: Prevent race condition for the pools created from non-main thread We can not re-use anything for such pools, because we will know nothing about whether the main thread is sleeping or not. So we identify such threads as 0, but we don't use main thread's TLS. This fixes dead-locks and crashes reported by Luca when doing playblasts.	2017-04-12 18:20:17 +02:00
Sergey Sharybin	ca5ccf5cd4	Task: Remove non-atomic pool suspended flag assignment This was done some lines above by atomic fetch and and.	2017-04-04 12:32:15 +02:00
Campbell Barton	bcc8c04db4	Cleanup: code style & cmake	2017-03-12 02:47:53 +11:00
Sergey Sharybin	a095611eb8	Fix T50886: Blender crashes on render Was a mistake in one of the previous TLS commits. See comment in the pool_create to see some details why it was crashing.	2017-03-08 09:41:38 +01:00
Sergey Sharybin	9e566b06e3	Task scheduler: Add concept of suspended pools Suspended pools allows to push huge amount of initial tasks without any threading synchronization and hence overhead. This gives ~50% speedup of cached rigid body with file from T50027 and seems to have no negative affect in other scenes here.	2017-03-07 17:32:01 +01:00
Sergey Sharybin	55c2cd85f0	Task scheduler: Initial implementation of local tasks queues The idea is to allow some amount of tasks to be pushed from working thread to it's local queue, so we can acquire some work without doing whole mutex lock. This should allow us to remove some hacks from depsgraph which was added there to keep threads alive.	2017-03-07 17:32:01 +01:00
Sergey Sharybin	2f722f1a49	Task scheduler: Use real pthread's TLS to access active thread's data This allows us to avoid TLS stored in pool which gives us advantage of using pre-allocated tasks pool for the pools created from non-main thread. Even on systems with slow pthread TLS it should not be a problem because we access it once at a pool construction time. If we want to use this more often (for example, to get rid of push_from_thread) we'll have to do much more accurate benchmark.	2017-03-07 17:32:01 +01:00
Sergey Sharybin	a07ad02156	Task scheduler: Refactor the way we store thread-spedific data Basically move all thread-specific data (currently it's only task memory pool) from a dedicated array of taskScheduler to TaskThread. This way we can add more thread-specific data in the future with less of a hassle.	2017-03-07 17:32:01 +01:00
Sergey Sharybin	9522f8acf0	Task scheduler: Remove per-pool threads limit This feature was adding extra complexity to task scheduling which required yet extra variables to be worried about to be modified in atomic manner, which resulted in following issues: - More complex code to maintain, which increases risks of something going wrong when we modify the code. - Extra barriers and/or locks during task scheduling, which causes extra threading overhead. - Unable to use some other implementation (such as TBB) even for the comparison tests. Notes about other changes. There are two places where we really had to use that limit. One of them is the single threaded dependency graph. This will now construct a single-threaded scheduler at evaluation time. This shouldn't be a problem because it only happens when using debugging command line arguments and the code simply don't run in regular Blender operation. The code seems a bit duplicated here across old and new depsgraph, but think it's OK since the old depsgraph is already gone in 2.8 branch and i don't see where else we might want to use such a single-threaded scheduler. When/if we'll want to do so, we can move it to a centralized single-threaded scheduler in threads.c. OpenGL render was a bit more tricky to port, but basically we are using conditional variables to wait background thread to do all the job.	2017-03-07 17:32:01 +01:00
Sergey Sharybin	b498db06eb	Task scheduler: Cleanup, use BLI_assert() instead of assert()	2017-03-06 11:33:27 +01:00
Bastien Montagne	2e8398c095	Get rid of `BLI_task_pool_stop()`. Comments said that function was supposed to 'stop worker threads', but it absolutely did not do anything like that, was merely wiping out TODO queue of tasks from given pool (kind of subset of what `BLI_task_pool_cancel()` does). Misleading, and currently useless, we can always add it back if we need it some day, but for now we try to simplify that area.	2017-03-03 17:16:39 +01:00
Bastien Montagne	18c2a44333	Fix ugly mistake in BLI_task - freeing while some tasks are still being processed. Freeing pool was calling `BLI_task_pool_stop()`, which only clears pool's tasks that are in TODO queue, whithout ensuring no more tasks from that pool are being processed in worker threads. This could lead to use-after-free random (and seldom) crashes. Now use instead `BLI_task_pool_cancel()`, which does waits for all tasks being processed to finish, before returning.	2017-03-03 17:12:03 +01:00
Sergey Sharybin	17cf423f30	Cleanup: Indentation	2017-03-03 15:53:55 +01:00
Sergey Sharybin	7fcae7ba60	Task scheduler: Remove query for the pool's number of threads Not really happy of per-pool threads limit, need to find better approach to that. But at least it's possible to get rid of half of the nastyness here by removing getter which was only used in an assert statement. That piece of code was already well-tested and this code becomes obsolete in the new depsgraph and does no longer exists in blender 2.8 branch.	2017-03-01 18:00:54 +01:00
Sergey Sharybin	f0cf15b5c6	Task scheduler: Remove counter of done tasks This was only used for progress report, and it's wrong because: - Pool might in theory be re-used by different tasks - We should not make any decision based on scheduling stats Proper way is to take care of progress by the task itself.	2017-03-01 12:45:51 +01:00
Sergey Sharybin	4ee08e9533	Atomics: Make naming more obvious about which value is being returned	2016-11-15 12:16:26 +01:00
Bastien Montagne	0fe7446a30	BLI_task: fix case were some pool could work in more threads than allowed. We were checking for number of tasks from given pool already active, and then atomically increasing it if allowed - this is not correct, number could be increased by another thread between check and atomic op! Atomic primitives are nice, but you must be very careful with how you use them... Now we atomically increase counter, check result, and if we end up over max value, abort and decrease counter again. Spotted by Sergey, thanks!	2016-10-08 14:51:33 +02:00
Alexander Gavrilov	a31eca3fdd	Fix T49251: moving smoke domain with additional resolution causes crash. This is a bug in the multithreaded task manager in negative value range. The problem here is that if previter is unsigned, the comparison in the return statement is unsigned, and works incorrectly if stop < 0 && iter >= 0. This in turn can happen if stop is close to 0, because this code is designed to overrun the stop by chunk_size*num_threads as the threads terminate. This probably should go into 2.78 as it prevents a crash.	2016-09-05 15:50:12 +03:00
Campbell Barton	0971749f4c	Cleanup: spelling, indentation	2016-06-29 20:37:54 +10:00
Campbell Barton	2465bd90d5	Cleanup: style, whitespace, doxy filepaths	2016-06-19 06:33:29 +10:00
Bastien Montagne	22ff9c5568	Fix T48497: Stupid typo in recent own BLI_task forloop work that broke non-parallelized case.	2016-05-22 18:35:44 +02:00
Bastien Montagne	688858d3a8	BLI_task: Add new 'BLI_task_parallel_range_finalize()'. Together with the extended loop callback and userdata_chunk, this allows to perform cumulative tasks (like aggregation) in a lockfree way using local userdata_chunk to store temp data, and once all workers have finished, to merge those userdata_chunks in the finalize callback (from calling thread, so no need to lock here either). Note that this changes how userdata_chunk is handled (now fully from 'main' thread, which means a given worker thread will always get the same userdata_chunk, without being re-initialized anymore to init value at start of each iter chunk).	2016-05-16 17:15:18 +02:00
Bastien Montagne	5a7429c363	BLI_task: Add back lost 'push_from_thread' change to BLI_task_parallel_range() & co.	2016-05-16 17:00:15 +02:00
Bastien Montagne	575d7a9666	BLI_task: make foreach loop index hleper lockfree, take II. New code is actually much, much better than first version, using 'fetch_and_add' atomic op here allows us to get rid of the loop etc. The broken CAS issue remains on windows, to be investigated...	2016-05-16 15:57:19 +02:00
Bastien Montagne	bb7da630ba	Fix T48422: Revert "BLI_task: nano-optimizations to BLI_task_parallel_range feature." There are some serious issues under windows, causing deadlocks somehow (not reproducible under linux so far). Until further investigation over why this happens, better to revert to previous spin-locked behavior. This reverts commits `a83bc4f597` and `98123ae916`.	2016-05-15 21:14:40 +02:00
Bastien Montagne	a83bc4f597	Fix an error in new lockfree parallel_range_next_iter_get() helper. Reading the shared state->iter value after storing it in the 'reference' var could in theory lead to a race condition setting state->iter value above state->stop, which would be 'deadly'. This may be the cause of T48422, though I was not able to reproduce that issue so far.	2016-05-14 18:06:05 +02:00
Bastien Montagne	868cfc5a4a	BLI_task: add support for listbase parallelized for loops. Code by @sergey, with small edits and doc by @mont29.	2016-05-13 12:06:15 +02:00
Bastien Montagne	98123ae916	BLI_task: nano-optimizations to BLI_task_parallel_range feature. This commit makes use of new taskpool feature (instead of allocating own tasks), and removes the spinlock used to generate chunks (using atomic ops instead). In best cases (dynamic scheduled loop with light processing func callback), we get a few percents of speedup, in most cases there is no sensible enhancement.	2016-05-10 17:57:53 +02:00
Sergey Sharybin	335274192e	Revert "Task scheduler: Avoid mutex lock in number manipulation functions" Appears mutex was guarateeing number of tasks is not modified at moments when it's not expected. Removing those mutexes resulted in some hard-to-catch locks where worker thread were waiting for work by all the tasks were already done. This reverts commit `a1d8fe052c`.	2016-05-10 15:43:03 +02:00
Sergey Sharybin	a1d8fe052c	Task scheduler: Avoid mutex lock in number manipulation functions It seems using atomic operations here we can avoid having mute without breaking anything. Thanks Bastien for double-checking the changes!	2016-05-10 14:59:19 +02:00
Bastien Montagne	fcc2175710	Fix own mistake in rBd617de965ea20e5d5 from late December 2015. Brain melt here, intention was to reduce number of tasks in case we have not much chunks of data to loop over, not to increase it! Note that this only affected dynamic scheduling.	2016-05-10 13:10:21 +02:00
Sergey Sharybin	7efa34d078	Task scheduler: Add thread-aware task push routines This commit implements new function BLI_task_pool_push_from_thread() who's main goal is to have less parasitic load on the CPU bu avoiding memory allocations as much as possible, making taks pushing cheaper. This function expects thread ID, which must be 0 for the thread from which pool is created from (and from which wait_work() is called) and for other threads it mush be the ID which was sent to the thread working function. This reduces allocations quite a bit in the new dependency graph, hopefully gaining some visible speedup on a fewzillion core machines (on my own machine can only see benefit in profiler, which shows significant reduce of time wasted in the memory allocation).	2016-05-10 10:01:24 +02:00
Sergey Sharybin	9ac35be63a	Task scheduler: Don't calloc in performance-critical areas Majority of the fields are being overwritten anyway, so calloc it kinda waste of CPU ticks.	2016-05-09 14:54:24 +02:00
Campbell Barton	d5ddc52ae1	Cleanup: style	2016-01-19 04:54:39 +11:00
Bastien Montagne	8acf14c55c	Cleanup: BLI_task foreach looper API doc.	2016-01-16 16:06:27 +01:00
Bastien Montagne	31d907fa0a	Cleanup: BLI_task - API changes. Based on usages so far: - Split callback worker func in two, 'basic' and 'extended' versions. The former goes back to the simplest verion, while the later keeps the 'userdata_chunk', and gets the thread_id too. - Add use_threading to simple BLI_task_parallel_range(), turns out we need this pretty much systematically, and allows to get rid of most usages of BLI_task_parallel_range_ex(). - Now BLI_task_parallel_range() expects 'basic' version of callback, while BLI_task_parallel_range_ex() expectes 'extended' version of the callback. All in all, this should make common usage of BLI_task_parallel_range simpler (less verbose), and add access to advanced callback to thread id, which is mandatory in some (future) cases.	2016-01-16 15:59:37 +01:00
Bastien Montagne	82d88e42a5	BLI_task threaded looper: do not assert when start == stop. This can happen quite often in forloops, and would be annoying to have to check for this in caller code! So now, just return without doing anything in this case.	2016-01-04 19:17:09 +01:00
Bastien Montagne	511e3c5d9d	BLI_task: change BLI_task_parallel_range_ex() to just take a bool whether to use threading or not, instead of threshold. From recent experience, turns out we often do want to use something else than basic range of parallelized forloop as control parameter over threads usage, so now BLI func only takes a boolean, and caller defines best check for its own case.	2015-12-30 20:39:56 +01:00
Bastien Montagne	d617de965e	Fix (unreported) broken BLI_task's forloop func in case we have less iterations that workers. When called with very small range, `BLI_task_parallel_range_ex()` would generate a zero `chunk_size`, leading to some infinite looping in `parallel_range_func` due to `parallel_range_next_iter_get` returning true without actually increasing the counter! So now, we ensure `chunk_size` and `num_tasks` are always at least 1 (and avoid generating too much tasks too).	2015-12-28 00:37:07 +01:00
Campbell Barton	dc98a3b0a7	Cleanup: style/spelling	2015-12-12 15:10:03 +11:00
Bastien Montagne	0f609d5d04	BLI_task: BLI_task_parallel_range_ex: add some per-chunk userdata. This mimics OpenMP's 'firstprivate' feature. It is sometimes handy to have some persistent local data during a whole chunk. Reviewers: sergey Reviewed By: sergey Subscribers: campbellbarton Differential Revision: https://developer.blender.org/D1635	2015-11-25 11:01:59 +01:00
Campbell Barton	7a09d15ade	Cleanup: comments/style	2015-11-06 05:34:05 +11:00
Bastien Montagne	d0d523d809	Better fix for pthread ID comparison crap on windows. Suggested by Sergey, thanks!	2015-11-02 19:25:00 +01:00
Bastien Montagne	d2fbbed462	Attempt to fix win32 compilation after own recent commits.	2015-11-02 18:48:14 +01:00
Bastien Montagne	04ac8768ef	BLI_task: add support for full-background taskpools. With current code, in single-threaded context, a pool of task may never be executed until one calls BLI_task_pool_work_and_wait() on it, this is not acceptable for asynchronous tasks where you never want to actually lock the main thread. This commits adds an extra thread in single-threaded case, and a new 'type' of pool, such that one can create real background pools of tasks. See code for details. Review: D1565	2015-11-02 16:57:48 +01:00

1 2

Download

What's New

Blender Studio

Manual

Developers Blog

Documentation

Benchmark

Blender Conference

Development Fund

One-time Donations

64 Commits