Drivers on hide_viewport and hide_render are unreliable, and throw warnings on linked-but-not-proxied objects. #73593
Labels
No Label
Priority
High
Priority
Low
Priority
Normal
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Content
Type
Design
Type
Report
Type
To Do
Type
Web Development
No Milestone
No project
No Assignees
8 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: studio/blender-studio#73593
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
System Information
Operating system: Linux-5.3.0-7625-generic-x86_64-with-debian-buster-sid 64 Bits
Graphics card: GeForce RTX 2080/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 440.44
Blender Version
Broken: version: 2.83 (sub 1), branch: HEAD, commit date: 2020-01-29 14:09, hash:
blender/blender@9cb7ecefce
Worked: Maybe a month or two ago?
Short description of error
When a proxy armature's custom property drives an object's hide_viewport or hide_render property, the console spams this warning:
When the armature is not a proxy, the issue is still present, but the above warning is not.
In the case of a simple file, this doesn't seem to cause an issue, but as the file gets more complex, the driver ends up becoming unpredictable/unreliable, and can even result in it locking a mesh as visible or invisible until the driver is simply removed.
Exact steps for others to reproduce the error
{F8321251}
This functionality will become important for our current project fairly soon, and I don't know of any workarounds.
Added subscriber: @Mets
blender/blender#70350 was marked as duplicate of this issue
Console spamms add_relation() warnings and causes drivers to misbehaveto Object visibility drivers misbehave when driven by proxy armatureObject visibility drivers misbehave when driven by proxy armatureto Drivers on hide_viewport and hide_render are unreliable, and throw warnings on proxies.Added subscriber: @mano-wii
Changed status from 'Needs Triage' to: 'Confirmed'
I can confirm the problem.
This does not appear to be a regression.
I tested previous versions and the same warnings appeared in blender 2.80rc1
It's been pointed out to me this may be a duplicate of blender/blender#56635, but that one was closed as resolved. Maybe this should be merged into that one, and that one be re-opened? I'm not 100% sure it's the same issue, but definitely seems very similar.
Added subscriber: @Phigon
Added subscriber: @dr.sybren
So is that 2.81? 2.82? When did it work?
What is "the issue" exactly, if it's not the warning being shown? You describe some unreliability, but the steps to reproduce don't let me reproduce that unreliability.
Here is a fairly simplified file and video:
hide_viewport_driver_bug_hailey.blend
driver_bug_simplescene.mp4
I didn't want to simplify it too much further since it feels like there is a correlation between scene complexity and how easy it is to reproduce the bug. (the more complex the scene, the more often the bug occurs) - even at this level of complexity, it seems that changing the values "fast"(multiple times per second) is necessary to trigger the bug.
And just in case, here's the full un-simplified character file, where reproducing the bug should be easiest:
hailey.blend
There is an enum in the rig's custom UI that is responsible for applying presets to the outfit. It does this by simply changing the driving custom properties with an update callback function, so triggering the bug is the easiest with that enum:
driver_bug_complexscene.mp4
As for a driver being completely stuck in one state, that only happened in a production file with a full set and several character rigs linked and proxied, so really an extremely complex scene. This was some time ago, and we didn't save it for a bug report - will try to in the future. But I'm hoping the above will be helpful enough. :x
Drivers on hide_viewport and hide_render are unreliable, and throw warnings on proxies.to Drivers on hide_viewport and hide_render are unreliable, and throw warnings on linked-but-not-proxied objects.Added subscribers: @Sergey, @brecht
After looking into this with @dr.sybren, we think the random failures are most likely due to a threading issue, where two threads are evaluating the viewport and render visibility driver for the same object in parallel. These two properties are stored as bitflags on the same integer, so writing to them at the same time is not thread-safe.
The solution we are thinking of is to enure such drivers are never evaluated in parallel. Instead we can group driver evaluation, so that all drivers for a single object or bone are evaluated serially. This could be done by checking the RNA paths, so that properties are grouped together for evaluation when they are in the same RNA struct.
Performance is likely not going to be significantly affected either way, evaluation like this should be granular enough. Creating fewer depsgraph nodes and relations may reduce overhead a little even.
@Sergey, I think we had another report about (or at least we discussed it), but I can't find it, maybe you remember.
Added subscriber: @mont29
Further testing confirmed this.
@mont29 had the insight that this issue will also happen when keyframe-animating such properties. He's looking into making the RNA access atomic so that all of those issues can be resolved in one go.
I'm not sure what it means to make the RNA access atomic, how would that be implemented, manually for each property? If so, there are many of them and taking that into account for every new property sounds fragile. I'm also not sure to what extent writing a
char
orshort
could conflict with an adjacent variable.If we can solve this in a generic way by grouping animation and driver evaluation, to me that still sounds better.
Also, using atomic instructions are slow and should be avoided when possible for performance reasons. If there is no other solution it may be fine, but I really want to avoid every RNA set function using atomics.
I think in general in our API design we should aim to make it safe for multiple threads to modify different datablocks, but we can't make such guarantees in the general case for multiple threads modifying properties of a single datablock. And the animation/driver system should be compatible with the general case, where the RNA set function may be user defined and modify multiple variables.
@brecht issue with bitflags in RNA is that from RNA side they look like different properties, but they actually affect the exact same value. So idea would be that default RNA setter would set those bitflags with atomic operations yes...
Here is a very quick and dirty first attempt, does not seem to work really though... Just for sake of doc:
P1262: (An Untitled Masterwork)
I extended my comment above a bit, to explain why I think this is the wrong level to add thread safety.
Not sure performances is a real issue here (atomics are slower, yes, but RNA also adds a fair amount of overhead, and all in all, I think I’d expect being able to evaluate tens of drivers/animation curves in parallel, even if those involve some atomic ops, would be far more efficient than evaluating all of them in a single thread, without atomic ops at all).
short
/char
are not an issue I think, there are atomic primitives for those as well, and DNA is aware of the type (see my patch attempt above).But I actually agree with @brecht for his last point - we have no control over custom set functions. So yes, I guess enforcing depsgraph eval of all animdata in a single datablock to happen in a single thread is the only generic safe way to go. But that will have a cost over performances I think, especially in e.g. production armatures, which can have thousands of drivers/anim curves on a single ID. :/
I expect it can improve performance slightly in production (but not enough to really matter). There is overhead to having more depsgraph nodes and relations. The only way you would have that many drivers on an ID in practice is bones, and we can group those together (and also have to for drivers to work between bones at all).
Please stay away from atomics in RNA. There is indeed some overhead in RNA system, but it does not affect how scalable code is with the number of threads. Once you add atomic the scalability goes to nothing.
Please also stop considering atomics a magic solution. In many places of Blender they are used wrong, killing scabaility.
There might be an open report about same issue, but I can not find quickly find it. However, similar issue was solved for drivers/animation on an array properties (such as location). It was similar type of an error where multiple threads tried to write individual array elements, which isn't supported by RNA (RNA writes an entire array). The code which deals with this can be found in
DepsgraphRelationBuilder::build_animdata_drivers()
. Keep this in mind.Unfortunately, Brecht's simple idea can't be implemented as-is: we allow interleaved driver evaluation, where you can drive one custom property with another and have as many IDs involved as one which, and the can of dependencies can be as long as you wish. In practice this means you need to be careful of what you are grouping with what and in which order.
Proposed solution:
Some implementation notes:
DepsgraphBuilder::cache_
,DepsgraphBuilderCache
. I would imagine that to implement the solution we would need to resolve quite a bit of RNA paths, potentially more than once. Need to benefit from the cache as much as possible.A -> B -> C
then evaluation happens from a "hot" thread: no re-scheduling is done, the active thread immediately jumps from node A to B to C. In practice this means that for "simple" cases those extra relations are likely to not cause any measurable overhead.build_copy_on_write_relations
is done). Just something to keep in mind when implementing fix for this issue.This issue was referenced by blender/blender@94e180bd80
Changed status from 'Confirmed' to: 'Resolved'
Changed status from 'Resolved' to: 'Confirmed'
This issue was referenced by blender/blender@4c30dc3431
Changed status from 'Confirmed' to: 'Resolved'
Resolved by committing 4c30dc343165a643e2173a055ed2aca3137991a5
I think I can still reproduce this issue, eg. changing the "Dress" property in hailey.blend above. :( Do I re-open?
Changed status from 'Resolved' to: 'Confirmed'
I can confirm that update is still not always working as expected, also with file from blender/blender#70350. No more warning though, afaict.
Added subscribers: @bunny, @ConradDueck, @JacquesLucke
This issue was referenced by blender/blender@7192a1bca3
Changed status from 'Confirmed' to: 'Resolved'