- When returning the number of items in a collection use BLI_*_len()
- Keep _size() for size in bytes.
- Keep _count() for data structures that don't store length
(hint this isn't a simple getter).
See P611 to apply instead of manually resolving conflicts.
Helps in cases of not very complex scenes and lots of system threads available.
A bit hard to measure change on it's own, it works best with the upcoming
changes and gives measurable improvements.
Now all the fine-tuning is happening using parallel range settings structure,
which avoid passing long lists of arguments, allows extend fine-tuning further,
avoid having lots of various functions which basically does the same thing.
This statistics is only collected when debug_value is different from 0.
Stored in depsgraph node itself, so we can always have access to average data
and other stats which requires persistent storage. This way we also don't waste
time trying to find stats from a separately stored hash map.
Not only this helps merges form master to the branch, but also:
- Allows us to production-check changes as soon as possible.
- Avoids some unnecessary editors update about ID changes.
- Adds small optimization on queue size by always keeping one of the pointers
outside of the queue.
Currently this is a no-visible-changes change, but the idea is to use this
dedicated flag to tell which exact components of ID changed, make it more
granular than just OBJECT and OBJECT_DATA. Allow setting this field based
on what components new dependency graph flushed on evaluation.
Was never actually used and implementation seems to be slow: we shouldn't be
doing per-node evaluation hash lookups, adds too much overhead. We can instead
store statistics in the node itself, and maybe even group them somehow.
Ideally such a statistics should be user-friendly so riggers and animators
can see exactly what's happening.
Was rather weird and only used for time source. It is simpler to make depsgraph
to keep track of time source directly.
No need to introduce extra entitites without actual need.
The idea is to accumulate all new tasks in a thread local queue
first without doing any thread synchronization (aka, locks and
conditional variables) and move those tasks to a scheduler queue
once they are all ready. This way we avoid per-task-pool lock
and only have one lock per bunch of tasks.
This is particularly handy when scheduling new dependency graph
node children. Brings FPS of cached simulation from the linked
below file from ~30 to ~50.
See documentation for BLI_task_pool_delayed_push_{begin, end}
and for TaskThreadLocalStorage::do_delayed_push.
Fixes T50027: Rigidbody playback and simulation performance regression with new depsgraph
Thanks Bastien for the review!
Suspended pools allows to push huge amount of initial tasks
without any threading synchronization and hence overhead.
This gives ~50% speedup of cached rigid body with file from
T50027 and seems to have no negative affect in other scenes
here.
This feature was adding extra complexity to task scheduling
which required yet extra variables to be worried about to be
modified in atomic manner, which resulted in following issues:
- More complex code to maintain, which increases risks of
something going wrong when we modify the code.
- Extra barriers and/or locks during task scheduling, which
causes extra threading overhead.
- Unable to use some other implementation (such as TBB) even for
the comparison tests.
Notes about other changes.
There are two places where we really had to use that limit.
One of them is the single threaded dependency graph. This will
now construct a single-threaded scheduler at evaluation time.
This shouldn't be a problem because it only happens when using
debugging command line arguments and the code simply don't
run in regular Blender operation.
The code seems a bit duplicated here across old and new
depsgraph, but think it's OK since the old depsgraph is already
gone in 2.8 branch and i don't see where else we might want
to use such a single-threaded scheduler.
When/if we'll want to do so, we can move it to a centralized
single-threaded scheduler in threads.c.
OpenGL render was a bit more tricky to port, but basically we
are using conditional variables to wait background thread to
do all the job.
This kind of keeps threads "warmer" and should in theory give better
cache coherency bringing some %% of speedup. It was already tested
few months ago and it gave few % speedup in barber shop, but was
reverted due to some bone popping. The popping is now fixed so it
should be fine to use new scheduling policy.
This reverts commit 9444cd56db.
This commit caused some flickering in the bones when swapping IK to Fk.
While it's unclear why such change caused any regressions, let's revert
it to unlock the studio.
The idea of the change is to avoid queue growing too long
and handle all the operations as quick as possible.
Gives about 3% speedup on one of the barber shots here.
This reverts commit 47d0d9cca4.
Reverting the commit. Not only it did not solve all the cases of proxy popping,
but also broke real cases with single proxy involved.
Makes behavior of proxy_from backlink working similar to the old dependency graph.
it's nasty, but needed here in the studio to get proxies fixes ASAP.