Namely, it caused nodes to be added to the pool multiple times.
Brought the spinlock back, but only use it when the node
valency is zero. So now valency is decremented without any
locks; only when it reaches zero is the spinlock taken and
the node color (which indicates whether the node is
scheduled or not) updated. Actual new task creation happens
outside of any locks.
This might sound a bit complicated, but it's straightforward
code which is free from any thread synchronization latency.
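The scheme can be sketched roughly like this (a minimal sketch:
`Node`, `node_parent_finished` and the use of a plain mutex in
place of the actual spinlock are illustrative assumptions, not
the real structs):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical node; names are illustrative only. */
typedef struct Node {
    atomic_int valency;  /* number of parents still to be evaluated */
    int scheduled;       /* "node color": already pushed to the pool? */
} Node;

/* A plain mutex stands in for the spinlock used in the real code. */
static pthread_mutex_t color_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called when one parent of `node` finishes evaluating.
 * Returns 1 if the caller should create a task for this node. */
static int node_parent_finished(Node *node)
{
    int push = 0;

    /* Lock-free decrement: only the thread that brings valency
     * down to zero enters the locked section at all. */
    if (atomic_fetch_sub(&node->valency, 1) == 1) {
        pthread_mutex_lock(&color_lock);
        if (!node->scheduled) {
            node->scheduled = 1;
            push = 1; /* actual task creation happens outside the lock */
        }
        pthread_mutex_unlock(&color_lock);
    }
    return push;
}
```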
Instead of allocating the scheduler and starting threads
on every object_update call, make the scheduler global
so its threads are always ready to run.
This way we could avoid a quite hacky thing, which is
counting how many objects need to be updated before
starting the threaded update.
It'll also allow using the same scheduler for all sorts
of tasks, not only object updates. This is nice from a
load balancing point of view.
A couple of changes were needed in the task scheduler
itself:
- Free the task before sending the notifier.
- Free TaskThread structures outside of the thread.
This is needed to make it possible to use begin/end
threaded malloc from the main thread before/after
running the pool. Without this change it was possible
that allocation would switch to non-threaded mode
while a thread was still freeing its task.
This required storing the TaskThread array in the
Scheduler, but that's actually not so bad, since it
also reduces the memory overhead caused by per-thread
allocation.
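The shape of the change can be sketched like this (the name
`task_scheduler_ensure` and the struct layout are made up for
illustration; the real code lives in Blender's task scheduler):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the "global scheduler" idea: the thread
 * pool is created once and reused, instead of being allocated and
 * torn down on every object_update call. */
typedef struct TaskScheduler {
    int num_threads;
    int initialized;
    /* the real code would also own the TaskThread array here, so
     * the structures are freed from the main thread, not workers */
} TaskScheduler;

static TaskScheduler g_scheduler = {0, 0};

static TaskScheduler *task_scheduler_ensure(int num_threads)
{
    if (!g_scheduler.initialized) {
        g_scheduler.num_threads = num_threads;
        g_scheduler.initialized = 1;
        /* real code would spawn worker threads here, ready to run */
    }
    return &g_scheduler;
}
```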
- Reading BMP images was failing (needed to increase the size of the header to 64 bytes).
- The drag-and-drop image was being incorrectly checked (it always returned true, even when none was used).
Now the viewport rendering thread will lock the main thread while it is exporting
objects to render data. This is not ideal for big scenes, since it might block
the UI, but Cycles does the same, and it's fairly quick because the same evaluated
mesh can be reused for viewport drawing. It's the only way to get things stable
until the thread-safe dependency graph is here.
This adds a mechanism to the job system for jobs to lock the main thread, using a
new 'ticket mutex lock', which is a mutex that gives priority to the first
thread that tries to lock it.
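A classic spinning ticket lock illustrates the FIFO-priority idea
(this sketch is illustrative only; the actual implementation may
be built differently, e.g. on a mutex and condition variable):

```c
#include <assert.h>
#include <stdatomic.h>

/* Each thread takes a ticket; the lock serves tickets in the
 * order they were taken, so the first thread that asked for
 * the lock is the first one to get it. */
typedef struct TicketMutex {
    atomic_uint queued;   /* next ticket number to hand out */
    atomic_uint serving;  /* ticket number currently allowed in */
} TicketMutex;

static void ticket_lock(TicketMutex *m)
{
    unsigned int ticket = atomic_fetch_add(&m->queued, 1u);
    while (atomic_load(&m->serving) != ticket) {
        /* spin; a real implementation would block instead */
    }
}

static void ticket_unlock(TicketMutex *m)
{
    atomic_fetch_add(&m->serving, 1u);
}
```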
Still to solve: undo/redo crashes.
Initially I wanted to have some really simple and basic
threading scheduler, and wrote one based on traversing
the depsgraph in advance. But it ran into issues with
the single-pass traversal I did, which didn't actually
gather all the dependencies.
That was solvable for sure, but it turned out to be
time consuming, and with the huge help of Brecht's
patch it was faster to just write proper balancing.
But it's again a really basic thing, which could
easily be changed depending on feedback and design
decisions from Joshua.
So for now it works in the following way:
- Currently DagNode is used for threaded evaluation,
meaning traversal actually happens over DagNodes.
This is easier than converting the DAG to a graph where
only objects are stored, but required adding one int
field to DagNode for faster runtime checks.
We could change this later when it's clear how and
where we'll store evaluation data, but for now
it works pretty OK.
- The new field is called "valency", and it's basically
the number of parent nodes which need to be evaluated
before the node itself can be evaluated.
- Node valencies are initialized before threading starts,
and when a node finishes updating, the valency of each
of its children is decreased by one. If a child's
valency becomes zero, it is added to the task pool.
- There's a thread lock around the valency update; it'll
be replaced with a spinlock in the near future.
- Another piece of runtime update data is the node color.
White nodes represent objects, gray ones non-objects.
Currently it's needed to distinguish whether we need to
call object_handle_update on node->ob or not. In the
future it could be replaced with node->type to support
granularity, meaning we could then update object data
separately from the object itself.
- Needed to add some public depsgraph functions to make
it possible to traverse the depsgraph without including
the private depsgraph header in other files.
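Put together, the valency scheme amounts to a topological
traversal. A minimal serial sketch (struct and function names
are illustrative; the real code processes the queue from
multiple worker threads):

```c
#include <assert.h>

#define MAX_NODES 8

/* Illustrative stand-in for DagNode with the added valency field. */
typedef struct DagNode {
    int valency;             /* parents not yet evaluated */
    int num_children;
    int children[MAX_NODES]; /* indices of child nodes */
} DagNode;

/* Evaluates all nodes, writing the evaluation order into `order`;
 * returns how many nodes were evaluated. */
static int evaluate_graph(DagNode *nodes, int num_nodes, int *order)
{
    int queue[MAX_NODES];
    int head = 0, tail = 0, done = 0;

    /* Nodes with zero valency have no parents and can start at once. */
    for (int i = 0; i < num_nodes; i++) {
        if (nodes[i].valency == 0) {
            queue[tail++] = i;
        }
    }
    while (head < tail) {
        int n = queue[head++];
        order[done++] = n;
        /* "Node finished": decrease each child's valency, pushing
         * the child once its valency reaches zero. */
        for (int c = 0; c < nodes[n].num_children; c++) {
            int child = nodes[n].children[c];
            if (--nodes[child].valency == 0) {
                queue[tail++] = child;
            }
        }
    }
    return done;
}
```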
This change doesn't make the code any more stable by
itself, but it solves update order issues noticed while
working on fixing the underlying bugs.
Threaded update is still ifdef-ed until curves and
armatures are considered thread-safe, which is the next
step to be done.
This commit contains changes related to running the function
BKE_object_handle_update_ex from multiple threads in order
to reduce scene update time when having multiple
independent groups of objects.
Currently this required changes to two areas:
- scene.c, where scene_update_tagged_recursive is now using
threads for updating the objects.
There are some tricks to prevent threads from being spawned
when they're not needed:
* Threading will only happen if there's more than one CPU
core.
* Threading will only happen if there's more than a single
object which needs to be updated.
There's currently one crappy part of the change, which is
freeing object caches (derivedFinal, derivedDeform and so on)
from the main thread. This is because, in case VBOs are
used, freeing the DM is not thread-safe, since DrawObject
uses a global array. Will look into the possibility of
making that code safe later.
There's also currently some ifdef-ed debug-only code, which
helps a lot in troubleshooting whether everything is working
fine. This code looks a bit ugly now; will either drop it
later or make it cleaner.
And one more thing: threaded update is CURRENTLY DISABLED.
This is because of some thread-unsafe issues discovered
while working on this patch. Namely:
* I once had a crash in the Curve module. I wasn't able
to reproduce the crash, but can think of some unsafe
code there.
* The virtual modifier list is not thread-safe (it uses
static variables).
* The armature modifier also doesn't seem to be thread-safe,
because it stores some temporary runtime data in the
actual armature.
All these issues are to be solved next.
- depsgraph.c, where I've added a function which gives a
list of groups; each group contains objects, and
dependencies are only allowed between objects inside
one group.
This is needed to make scheduling of objects easier:
update threads will operate on groups and handle objects
one-by-one inside a group. Different threads will
operate on different groups.
Currently such groups are generated on every update.
Actually, on every run of scene_update_objects_threaded,
which only happens if there are objects marked for update.
In the future we could consider storing such groups in the
graph itself, which would help saving CPU power on building
them. But this is something to be discussed with Joshua first.
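Such groups are just the connected components of the dependency
relation. A union-find sketch of the idea (all names here are
illustrative, not the actual depsgraph.c code):

```c
#include <assert.h>

#define MAX_OBJECTS 64

/* parent_of[i] == i means object i is currently a group root. */
static int parent_of[MAX_OBJECTS];

static void groups_init(int num_objects)
{
    for (int i = 0; i < num_objects; i++) {
        parent_of[i] = i;
    }
}

static int group_root(int i)
{
    while (parent_of[i] != i) {
        parent_of[i] = parent_of[parent_of[i]]; /* path halving */
        i = parent_of[i];
    }
    return i;
}

/* A dependency between two objects merges their groups. */
static void groups_join(int a, int b)
{
    parent_of[group_root(a)] = group_root(b);
}

static int groups_count(int num_objects)
{
    int count = 0;
    for (int i = 0; i < num_objects; i++) {
        if (group_root(i) == i) {
            count++;
        }
    }
    return count;
}
```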
P.S. If you really want to test threaded update, you'll
need to replace:
#undef USE_THREADED_UPDATE
with:
#define USE_THREADED_UPDATE
Now it's possible to mark an operator as safe to use
in locked interface mode by adding the OPTYPE_ALLOW_LOCKED
bit to the operator template flags.
This bit is handled entirely by wm_event_system, not
by the operator run routines, so it's still possible to
run operators from drivers and handlers.
Currently image editor navigation and zooming are allowed.
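The event-system check boils down to a flag test. A minimal
sketch (the bit value, struct layout and function name are
illustrative assumptions):

```c
#include <assert.h>

#define OPTYPE_ALLOW_LOCKED (1 << 6) /* illustrative bit value */

typedef struct wmOperatorType {
    int flag;
} wmOperatorType;

/* The event system would call this before dispatching an
 * operator while the interface is locked. */
static int wm_operator_allowed_when_locked(const wmOperatorType *ot)
{
    return (ot->flag & OPTYPE_ALLOW_LOCKED) != 0;
}
```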
Added a function called WM_set_locked_interface which does
two things:
- Prevents the event queue from being handled, so no
operators can run and no values can change. This prevents
any kind of "destructive" action performed by the user
while rendering.
- Locks interface refresh for regions which have the lock
set to true in their template. Currently it's just the 3D
viewport, but in the future more regions could be considered
unsafe, or we could want to lock different parts of the
interface when doing different jobs.
This is needed because the 3D viewport could be using or
changing the same data as the renderer currently uses,
leading to threading conflicts.
Notifiers are still handled, so render progress is seen
on the screen, but this needs a double-check, since some
notifiers could be changing the data.
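The two behaviours can be sketched like this (struct and
function names are illustrative, not the actual
window-manager code):

```c
#include <assert.h>
#include <stdbool.h>

typedef struct WindowManager {
    bool is_interface_locked;
    int events_handled;
    int notifiers_handled;
} WindowManager;

static void wm_set_locked_interface(WindowManager *wm, bool lock)
{
    wm->is_interface_locked = lock;
}

static void wm_handle_event(WindowManager *wm)
{
    /* While locked, the event queue is simply not processed, so
     * no operator can touch data the render thread is using. */
    if (wm->is_interface_locked) {
        return;
    }
    wm->events_handled++;
}

static void wm_handle_notifier(WindowManager *wm)
{
    /* Notifiers stay enabled so render progress still shows up. */
    wm->notifiers_handled++;
}
```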
For now interface locking happens for the render job only,
and only if the "Lock Interface" checkbox is enabled.
Currently this option only makes rendering thread-safe, but
in the future more benefits are possible to gain from it.
Namely, if we make the renderer use its own graph, this
option would allow freeing the memory used by the 3D
viewport graph, which would help keep memory usage low
(or would even allow the renderer not to copy anything
in this case).
Initially I thought this change would also allow freeing
DMs used by the viewport, but we couldn't actually do this.
This is because of modifiers which use other objects (like
boolean): they're in fact using the viewport DM. This is
bad for a few reasons.
We currently need to have the viewport DM when rendering.
And even in background renders the viewport DMs are being
calculated. This sounds like 2x computation is needed: one
for the viewport DM and one for the render DM.
If we have local graphs, we'll be able to compute render
DMs only and store them in the graph. This would require a
bit more memory, but would solve current issues with the
viewport DM being used for modifier operands while
rendering, and it should give a quite noticeable speedup.
Other tools like baking would also benefit from this
option, but I'd rather get approval of the current way of
locking first.
Area and region listener callbacks now get the screen and area pointers passed, so
they can do more fine-grained checks to see if a redraw is really needed, for
example depending on the 3D view drawtype.