Drivers on hide_viewport and hide_render are unreliable, and throw warnings on linked-but-not-proxied objects. #73593

Closed
opened 2020-02-04 19:41:32 +01:00 by Demeter Dzadik · 33 comments
Member

System Information
Operating system: Linux-5.3.0-7625-generic-x86_64-with-debian-buster-sid 64 Bits
Graphics card: GeForce RTX 2080/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 440.44

Blender Version
Broken: version: 2.83 (sub 1), branch: HEAD, commit date: 2020-01-29 14:09, hash: blender/blender@9cb7ecefce
Worked: Maybe a month or two ago?

Short description of error
When a proxy armature's custom property drives an object's hide_viewport or hide_render property, the console spams this warning:

add_relation(Driver -> Driven Property) - Could not find op_to (RnaPathKey(id: OBCube, prop: 'hide_render'))

When the armature is not a proxy, the issue is still present, but the above warning is not.

In the case of a simple file, this doesn't seem to cause an issue, but as the file gets more complex, the driver ends up becoming unpredictable/unreliable, and can even result in it locking a mesh as visible or invisible until the driver is simply removed.

Exact steps for others to reproduce the error
{F8321251}

  • Download attached file
  • Link "Collection" collection to a fresh file
  • Make proxy->"Armature"
  • Pose mode->change the bone's custom properties. "prop" drives object visibility, "prop1" drives a shape key.
  • The console spams the warnings only regarding the former.

This functionality will become important for our current project fairly soon, and I don't know of any workarounds.

**System Information** Operating system: Linux-5.3.0-7625-generic-x86_64-with-debian-buster-sid 64 Bits Graphics card: GeForce RTX 2080/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 440.44 **Blender Version** Broken: version: 2.83 (sub 1), branch: HEAD, commit date: 2020-01-29 14:09, hash: `blender/blender@9cb7ecefce` Worked: Maybe a month or two ago? **Short description of error** When a proxy armature's custom property drives an object's hide_viewport or hide_render property, the console spams this warning: ```add_relation(Driver -> Driven Property) - Failed, but op_from (OperationKey(type: PARAMETERS, component name: '', operation code: DRIVER, 'hide_render')) was ok add_relation(Driver -> Driven Property) - Could not find op_to (RnaPathKey(id: OBCube, prop: 'hide_render')) ``` When the armature is **not a proxy**, the issue is still present, but the above warning is not. In the case of a simple file, this doesn't seem to cause an issue, but as the file gets more complex, the driver ends up becoming unpredictable/unreliable, and can even result in it locking a mesh as visible or invisible until the driver is simply removed. **Exact steps for others to reproduce the error** {F8321251} - Download attached file - Link "Collection" collection to a fresh file - Make proxy->"Armature" - Pose mode->change the bone's custom properties. "prop" drives object visibility, "prop1" drives a shape key. - The console spams the warnings only regarding the former. This functionality will become important for our current project fairly soon, and I don't know of any workarounds.
Author
Member

Added subscriber: @Mets

Added subscriber: @Mets

blender/blender#70350 was marked as duplicate of this issue

blender/blender#70350 was marked as duplicate of this issue
Demeter Dzadik changed title from Console spamms add_relation() warnings and causes drivers to misbehave to Object visibility drivers misbehave when driven by proxy armature 2020-02-04 19:44:48 +01:00
Demeter Dzadik changed title from Object visibility drivers misbehave when driven by proxy armature to Drivers on hide_viewport and hide_render are unreliable, and throw warnings on proxies. 2020-02-07 17:18:22 +01:00

Added subscriber: @mano-wii

Added subscriber: @mano-wii

Changed status from 'Needs Triage' to: 'Confirmed'

Changed status from 'Needs Triage' to: 'Confirmed'

I can confirm the problem.

This does not appear to be a regression.
I tested previous versions and the same warnings appeared in blender 2.80rc1

I can confirm the problem. This does not appear to be a regression. I tested previous versions and the same warnings appeared in blender 2.80rc1
Author
Member

It's been pointed out to me this may be a duplicate of blender/blender#56635, but that one was closed as resolved. Maybe this should be merged into that one, and that one be re-opened? I'm not 100% sure it's the same issue, but definitely seems very similar.

It's been pointed out to me this may be a duplicate of blender/blender#56635, but that one was closed as resolved. Maybe this should be merged into that one, and that one be re-opened? I'm not 100% sure it's the same issue, but definitely seems very similar.

Added subscriber: @Phigon

Added subscriber: @Phigon

Added subscriber: @dr.sybren

Added subscriber: @dr.sybren

Worked: Maybe a month or two ago?

So is that 2.81? 2.82? When did it work?

> Worked: Maybe a month or two ago? So is that 2.81? 2.82? When did it work?
Sybren A. Stüvel self-assigned this 2020-02-17 14:32:23 +01:00

When the armature is not a proxy, the issue is still present, but the above warning is not.

What is "the issue" exactly, if it's not the warning being shown? You describe some unreliability, but the steps to reproduce don't let me reproduce that unreliability.

> When the armature is not a proxy, the issue is still present, but the above warning is not. What is "the issue" exactly, if it's not the warning being shown? You describe some unreliability, but the steps to reproduce don't let me reproduce that unreliability.
Author
Member

Here is a fairly simplified file and video:
hide_viewport_driver_bug_hailey.blend
driver_bug_simplescene.mp4
I didn't want to simplify it too much further since it feels like there is a correlation between scene complexity and how easy it is to reproduce the bug. (the more complex the scene, the more often the bug occurs) - even at this level of complexity, it seems that changing the values "fast"(multiple times per second) is necessary to trigger the bug.

And just in case, here's the full un-simplified character file, where reproducing the bug should be easiest:
hailey.blend
There is an enum in the rig's custom UI that is responsible for applying presets to the outfit. It does this by simply changing the driving custom properties with an update callback function, so triggering the bug is the easiest with that enum:
driver_bug_complexscene.mp4

As for a driver being completely stuck in one state, that only happened in a production file with a full set and several character rigs linked and proxied, so really an extremely complex scene. This was some time ago, and we didn't save it for a bug report - will try to in the future. But I'm hoping the above will be helpful enough. :x

Here is a fairly simplified file and video: [hide_viewport_driver_bug_hailey.blend](https://archive.blender.org/developer/F8347482/hide_viewport_driver_bug_hailey.blend) [driver_bug_simplescene.mp4](https://archive.blender.org/developer/F8347503/driver_bug_simplescene.mp4) I didn't want to simplify it too much further since it feels like there is a correlation between scene complexity and how easy it is to reproduce the bug. (the more complex the scene, the more often the bug occurs) - even at this level of complexity, it seems that changing the values "fast"(multiple times per second) is necessary to trigger the bug. And just in case, here's the full un-simplified character file, where reproducing the bug should be easiest: [hailey.blend](https://archive.blender.org/developer/F8347506/hailey.blend) There is an enum in the rig's custom UI that is responsible for applying presets to the outfit. It does this by simply changing the driving custom properties with an update callback function, so triggering the bug is the easiest with that enum: [driver_bug_complexscene.mp4](https://archive.blender.org/developer/F8347528/driver_bug_complexscene.mp4) As for a driver being completely stuck in one state, that only happened in a production file with a full set and several character rigs linked and proxied, so really an extremely complex scene. This was some time ago, and we didn't save it for a bug report - will try to in the future. But I'm hoping the above will be helpful enough. :x
Sybren A. Stüvel changed title from Drivers on hide_viewport and hide_render are unreliable, and throw warnings on proxies. to Drivers on hide_viewport and hide_render are unreliable, and throw warnings on linked-but-not-proxied objects. 2020-02-18 15:06:48 +01:00

Added subscribers: @Sergey, @brecht

Added subscribers: @Sergey, @brecht

After looking into this with @dr.sybren, we think the random failures are most likely due to a threading issue, where two threads are evaluating the viewport and render visibility driver for the same object in parallel. These two properties are stored as bitflags on the same integer, so writing to them at the same time is not thread-safe.

The solution we are thinking of is to enure such drivers are never evaluated in parallel. Instead we can group driver evaluation, so that all drivers for a single object or bone are evaluated serially. This could be done by checking the RNA paths, so that properties are grouped together for evaluation when they are in the same RNA struct.

Performance is likely not going to be significantly affected either way, evaluation like this should be granular enough. Creating fewer depsgraph nodes and relations may reduce overhead a little even.

@Sergey, I think we had another report about (or at least we discussed it), but I can't find it, maybe you remember.

After looking into this with @dr.sybren, we think the random failures are most likely due to a threading issue, where two threads are evaluating the viewport and render visibility driver for the same object in parallel. These two properties are stored as bitflags on the same integer, so writing to them at the same time is not thread-safe. The solution we are thinking of is to enure such drivers are never evaluated in parallel. Instead we can group driver evaluation, so that all drivers for a single object or bone are evaluated serially. This could be done by checking the RNA paths, so that properties are grouped together for evaluation when they are in the same RNA struct. Performance is likely not going to be significantly affected either way, evaluation like this should be granular enough. Creating fewer depsgraph nodes and relations may reduce overhead a little even. @Sergey, I think we had another report about (or at least we discussed it), but I can't find it, maybe you remember.

Added subscriber: @mont29

Added subscriber: @mont29

In #73593#875302, @brecht wrote:
After looking into this with @dr.sybren, we think the random failures are most likely due to a threading issue

Further testing confirmed this.

@mont29 had the insight that this issue will also happen when keyframe-animating such properties. He's looking into making the RNA access atomic so that all of those issues can be resolved in one go.

> In #73593#875302, @brecht wrote: > After looking into this with @dr.sybren, we think the random failures are most likely due to a threading issue Further testing confirmed this. @mont29 had the insight that this issue will also happen when keyframe-animating such properties. He's looking into making the RNA access atomic so that all of those issues can be resolved in one go.

I'm not sure what it means to make the RNA access atomic, how would that be implemented, manually for each property? If so, there are many of them and taking that into account for every new property sounds fragile. I'm also not sure to what extent writing a char or short could conflict with an adjacent variable.

If we can solve this in a generic way by grouping animation and driver evaluation, to me that still sounds better.

Also, using atomic instructions are slow and should be avoided when possible for performance reasons. If there is no other solution it may be fine, but I really want to avoid every RNA set function using atomics.

I think in general in our API design we should aim to make it safe for multiple threads to modify different datablocks, but we can't make such guarantees in the general case for multiple threads modifying properties of a single datablock. And the animation/driver system should be compatible with the general case, where the RNA set function may be user defined and modify multiple variables.

I'm not sure what it means to make the RNA access atomic, how would that be implemented, manually for each property? If so, there are many of them and taking that into account for every new property sounds fragile. I'm also not sure to what extent writing a `char` or `short` could conflict with an adjacent variable. If we can solve this in a generic way by grouping animation and driver evaluation, to me that still sounds better. Also, using atomic instructions are slow and should be avoided when possible for performance reasons. If there is no other solution it may be fine, but I really want to avoid every RNA set function using atomics. I think in general in our API design we should aim to make it safe for multiple threads to modify different datablocks, but we can't make such guarantees in the general case for multiple threads modifying properties of a single datablock. And the animation/driver system should be compatible with the general case, where the RNA set function may be user defined and modify multiple variables.

@brecht issue with bitflags in RNA is that from RNA side they look like different properties, but they actually affect the exact same value. So idea would be that default RNA setter would set those bitflags with atomic operations yes...

Here is a very quick and dirty first attempt, does not seem to work really though... Just for sake of doc:

P1262: (An Untitled Masterwork)

diff --git a/intern/atomic/atomic_ops.h b/intern/atomic/atomic_ops.h
index 106e19567da..82d609f8a56 100644
--- a/intern/atomic/atomic_ops.h
+++ b/intern/atomic/atomic_ops.h
@@ -99,6 +99,9 @@ ATOMIC_INLINE int32_t atomic_fetch_and_add_int32(int32_t *p, int32_t x);
 ATOMIC_INLINE int32_t atomic_fetch_and_or_int32(int32_t *p, int32_t x);
 ATOMIC_INLINE int32_t atomic_fetch_and_and_int32(int32_t *p, int32_t x);
 
+ATOMIC_INLINE int16_t atomic_fetch_and_or_int16(int16_t *p, int16_t x);
+ATOMIC_INLINE int16_t atomic_fetch_and_and_int16(int16_t *p, int16_t x);
+
 ATOMIC_INLINE uint8_t atomic_fetch_and_or_uint8(uint8_t *p, uint8_t b);
 ATOMIC_INLINE uint8_t atomic_fetch_and_and_uint8(uint8_t *p, uint8_t b);
 
diff --git a/intern/atomic/intern/atomic_ops_msvc.h b/intern/atomic/intern/atomic_ops_msvc.h
index d9655defa81..af08802cea1 100644
--- a/intern/atomic/intern/atomic_ops_msvc.h
+++ b/intern/atomic/intern/atomic_ops_msvc.h
@@ -164,6 +164,30 @@ ATOMIC_INLINE int32_t atomic_fetch_and_and_int32(int32_t *p, int32_t x)
   return InterlockedAnd((long *)p, x);
 }
 
+/******************************************************************************/
+/* 16-bit operations. */
+
+/* Signed */
+#pragma intrinsic(_InterlockedAnd16)
+ATOMIC_INLINE int16_t atomic_fetch_and_and_int16(int16_t *p, int16_t b)
+{
+#if (LG_SIZEOF_PTR == 8 || LG_SIZEOF_INT == 8)
+  return InterlockedAnd16((short *)p, (short)b);
+#else
+  return _InterlockedAnd16((short *)p, (short)b);
+#endif
+}
+
+#pragma intrinsic(_InterlockedOr16)
+ATOMIC_INLINE int16_t atomic_fetch_and_or_int16(int16_t *p, int16_t b)
+{
+#if (LG_SIZEOF_PTR == 8 || LG_SIZEOF_INT == 8)
+  return InterlockedOr16((short *)p, (short)b);
+#else
+  return _InterlockedOr16((short *)p, (short)b);
+#endif
+}
+
 /******************************************************************************/
 /* 8-bit operations. */
 
diff --git a/intern/atomic/intern/atomic_ops_unix.h b/intern/atomic/intern/atomic_ops_unix.h
index e1126cab0c2..17f5d7f98c1 100644
--- a/intern/atomic/intern/atomic_ops_unix.h
+++ b/intern/atomic/intern/atomic_ops_unix.h
@@ -317,6 +317,23 @@ ATOMIC_INLINE int32_t atomic_fetch_and_and_int32(int32_t *p, int32_t x)
 #  error "Missing implementation for 32-bit atomic operations"
 #endif
 
+/******************************************************************************/
+/* 16-bit operations. */
+#if (defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2) || defined(JE_FORCE_SYNC_COMPARE_AND_SWAP_2))
+/* Signed */
+ATOMIC_INLINE int16_t atomic_fetch_and_and_int16(int16_t *p, int16_t b)
+{
+  return __sync_fetch_and_and(p, b);
+}
+ATOMIC_INLINE int16_t atomic_fetch_and_or_int16(int16_t *p, int16_t b)
+{
+  return __sync_fetch_and_or(p, b);
+}
+
+#else
+#  error "Missing implementation for 16-bit atomic operations"
+#endif
+
 /******************************************************************************/
 /* 8-bit operations. */
 #if (defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_1) || defined(JE_FORCE_SYNC_COMPARE_AND_SWAP_1))
diff --git a/source/blender/makesrna/intern/makesrna.c b/source/blender/makesrna/intern/makesrna.c
index 2f5d9ae7a50..a0e44670753 100644
--- a/source/blender/makesrna/intern/makesrna.c
+++ b/source/blender/makesrna/intern/makesrna.c
@@ -974,6 +974,24 @@ static void rna_clamp_value(FILE *f, PropertyRNA *prop, int array)
   }
 }
 
+static void rna_def_property_bitflag_default_set_func(FILE *f,
+                                                      PropertyDefRNA *dp,
+                                                      const char *atomic_or_name,
+                                                      const char *atomic_and_name)
+{
+  fprintf(f,
+          "        if (%svalues- [x]) %s(&(data->%s), (%du << i));\n",
+          (dp->booleannegative) ? "!" : "",
+          atomic_or_name,
+          dp->dnaname,
+          dp->booleanbit);
+  fprintf(f,
+          "        else %s(&(data->%s), ~(%du << i));\n",
+          atomic_and_name,
+          dp->dnaname,
+          dp->booleanbit);
+}
+
 static char *rna_def_property_set_func(
     FILE *f, StructRNA *srna, PropertyRNA *prop, PropertyDefRNA *dp, const char *manualfunc)
 {
@@ -1110,12 +1128,20 @@ static char *rna_def_property_set_func(
 
           if (dp->dnaarraylength == 1) {
             if (prop->type == PROP_BOOLEAN && dp->booleanbit) {
-              fprintf(f,
-                      "        if (%svalues- [x]) data->%s |= (%du << i);\n",
-                      (dp->booleannegative) ? "!" : "",
-                      dp->dnaname,
-                      dp->booleanbit);
-              fprintf(f, "        else data->%s &= ~(%du << i);\n", dp->dnaname, dp->booleanbit);
+              if (dp->dnatype) {
+                if (STREQ(dp->dnatype, "char")) {
+                  rna_def_property_bitflag_default_set_func(
+                      f, dp, "atomic_fetch_and_or_int8", "atomic_fetch_and_and_int8");
+                }
+                if (STREQ(dp->dnatype, "short")) {
+                  rna_def_property_bitflag_default_set_func(
+                      f, dp, "atomic_fetch_and_or_int16", "atomic_fetch_and_and_int16");
+                }
+                if (STREQ(dp->dnatype, "int")) {
+                  rna_def_property_bitflag_default_set_func(
+                      f, dp, "atomic_fetch_and_or_int32", "atomic_fetch_and_and_int32");
+                }
+              }
             }
             else {
               fprintf(
@@ -4302,6 +4328,8 @@ static void rna_generate(BlenderRNA *brna, FILE *f, const char *filename, const
 
   fprintf(f, "#include \"MEM_guardedalloc.h\"\n\n");
 
+  fprintf(f, "#include \"atomic_ops.h\"\n\n");
+
   fprintf(f, "#include \"DNA_ID.h\"\n");
   fprintf(f, "#include \"DNA_scene_types.h\"\n");
 

@brecht issue with bitflags in RNA is that from RNA side they look like different properties, but they actually affect the exact same value. So idea would be that default RNA setter would set those bitflags with atomic operations yes... Here is a very quick and dirty first attempt, does not seem to work really though... Just for sake of doc: [P1262: (An Untitled Masterwork)](https://archive.blender.org/developer/P1262.txt) ``` diff --git a/intern/atomic/atomic_ops.h b/intern/atomic/atomic_ops.h index 106e19567da..82d609f8a56 100644 --- a/intern/atomic/atomic_ops.h +++ b/intern/atomic/atomic_ops.h @@ -99,6 +99,9 @@ ATOMIC_INLINE int32_t atomic_fetch_and_add_int32(int32_t *p, int32_t x); ATOMIC_INLINE int32_t atomic_fetch_and_or_int32(int32_t *p, int32_t x); ATOMIC_INLINE int32_t atomic_fetch_and_and_int32(int32_t *p, int32_t x); +ATOMIC_INLINE int16_t atomic_fetch_and_or_int16(int16_t *p, int16_t x); +ATOMIC_INLINE int16_t atomic_fetch_and_and_int16(int16_t *p, int16_t x); + ATOMIC_INLINE uint8_t atomic_fetch_and_or_uint8(uint8_t *p, uint8_t b); ATOMIC_INLINE uint8_t atomic_fetch_and_and_uint8(uint8_t *p, uint8_t b); diff --git a/intern/atomic/intern/atomic_ops_msvc.h b/intern/atomic/intern/atomic_ops_msvc.h index d9655defa81..af08802cea1 100644 --- a/intern/atomic/intern/atomic_ops_msvc.h +++ b/intern/atomic/intern/atomic_ops_msvc.h @@ -164,6 +164,30 @@ ATOMIC_INLINE int32_t atomic_fetch_and_and_int32(int32_t *p, int32_t x) return InterlockedAnd((long *)p, x); } +/******************************************************************************/ +/* 16-bit operations. */ + +/* Signed */ +#pragma intrinsic(_InterlockedAnd16) +ATOMIC_INLINE int16_t atomic_fetch_and_and_int16(int16_t *p, int16_t b) +{ +#if (LG_SIZEOF_PTR == 8 || LG_SIZEOF_INT == 8) + return InterlockedAnd16((short *)p, (short)b); +#else + return _InterlockedAnd16((short *)p, (short)b); +#endif +} + +#pragma intrinsic(_InterlockedOr16) +ATOMIC_INLINE int16_t atomic_fetch_and_or_int16(int16_t *p, int16_t b) +{ +#if (LG_SIZEOF_PTR == 8 || LG_SIZEOF_INT == 8) + return InterlockedOr16((short *)p, (short)b); +#else + return _InterlockedOr16((short *)p, (short)b); +#endif +} + /******************************************************************************/ /* 8-bit operations. */ diff --git a/intern/atomic/intern/atomic_ops_unix.h b/intern/atomic/intern/atomic_ops_unix.h index e1126cab0c2..17f5d7f98c1 100644 --- a/intern/atomic/intern/atomic_ops_unix.h +++ b/intern/atomic/intern/atomic_ops_unix.h @@ -317,6 +317,23 @@ ATOMIC_INLINE int32_t atomic_fetch_and_and_int32(int32_t *p, int32_t x) # error "Missing implementation for 32-bit atomic operations" #endif +/******************************************************************************/ +/* 16-bit operations. */ +#if (defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2) || defined(JE_FORCE_SYNC_COMPARE_AND_SWAP_2)) +/* Signed */ +ATOMIC_INLINE int16_t atomic_fetch_and_and_int16(int16_t *p, int16_t b) +{ + return __sync_fetch_and_and(p, b); +} +ATOMIC_INLINE int16_t atomic_fetch_and_or_int16(int16_t *p, int16_t b) +{ + return __sync_fetch_and_or(p, b); +} + +#else +# error "Missing implementation for 16-bit atomic operations" +#endif + /******************************************************************************/ /* 8-bit operations. */ #if (defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_1) || defined(JE_FORCE_SYNC_COMPARE_AND_SWAP_1)) diff --git a/source/blender/makesrna/intern/makesrna.c b/source/blender/makesrna/intern/makesrna.c index 2f5d9ae7a50..a0e44670753 100644 --- a/source/blender/makesrna/intern/makesrna.c +++ b/source/blender/makesrna/intern/makesrna.c @@ -974,6 +974,24 @@ static void rna_clamp_value(FILE *f, PropertyRNA *prop, int array) } } +static void rna_def_property_bitflag_default_set_func(FILE *f, + PropertyDefRNA *dp, + const char *atomic_or_name, + const char *atomic_and_name) +{ + fprintf(f, + " if (%svalues- [x]) %s(&(data->%s), (%du << i));\n", + (dp->booleannegative) ? "!" : "", + atomic_or_name, + dp->dnaname, + dp->booleanbit); + fprintf(f, + " else %s(&(data->%s), ~(%du << i));\n", + atomic_and_name, + dp->dnaname, + dp->booleanbit); +} + static char *rna_def_property_set_func( FILE *f, StructRNA *srna, PropertyRNA *prop, PropertyDefRNA *dp, const char *manualfunc) { @@ -1110,12 +1128,20 @@ static char *rna_def_property_set_func( if (dp->dnaarraylength == 1) { if (prop->type == PROP_BOOLEAN && dp->booleanbit) { - fprintf(f, - " if (%svalues- [x]) data->%s |= (%du << i);\n", - (dp->booleannegative) ? "!" : "", - dp->dnaname, - dp->booleanbit); - fprintf(f, " else data->%s &= ~(%du << i);\n", dp->dnaname, dp->booleanbit); + if (dp->dnatype) { + if (STREQ(dp->dnatype, "char")) { + rna_def_property_bitflag_default_set_func( + f, dp, "atomic_fetch_and_or_int8", "atomic_fetch_and_and_int8"); + } + if (STREQ(dp->dnatype, "short")) { + rna_def_property_bitflag_default_set_func( + f, dp, "atomic_fetch_and_or_int16", "atomic_fetch_and_and_int16"); + } + if (STREQ(dp->dnatype, "int")) { + rna_def_property_bitflag_default_set_func( + f, dp, "atomic_fetch_and_or_int32", "atomic_fetch_and_and_int32"); + } + } } else { fprintf( @@ -4302,6 +4328,8 @@ static void rna_generate(BlenderRNA *brna, FILE *f, const char *filename, const fprintf(f, "#include \"MEM_guardedalloc.h\"\n\n"); + fprintf(f, "#include \"atomic_ops.h\"\n\n"); + fprintf(f, "#include \"DNA_ID.h\"\n"); fprintf(f, "#include \"DNA_scene_types.h\"\n"); ```

I extended my comment above a bit, to explain why I think this is the wrong level to add thread safety.

I extended my comment above a bit, to explain why I think this is the wrong level to add thread safety.

Not sure performances is a real issue here (atomics are slower, yes, but RNA also adds a fair amount of overhead, and all in all, I think I’d expect being able to evaluate tens of drivers/animation curves in parallel, even if those involve some atomic ops, would be far more efficient than evaluating all of them in a single thread, without atomic ops at all).

short/char are not an issue I think, there are atomic primitives for those as well, and DNA is aware of the type (see my patch attempt above).

But I actually agree with @brecht for his last point - we have no control over custom set functions. So yes, I guess enforcing depsgraph eval of all animdata in a single datablock to happen in a single thread is the only generic safe way to go. But that will have a cost over performances I think, especially in e.g. production armatures, which can have thousands of drivers/anim curves on a single ID. :/

Not sure performances is a real issue here (atomics are slower, yes, but RNA also adds a fair amount of overhead, and all in all, I think I’d expect being able to evaluate tens of drivers/animation curves in parallel, even if those involve some atomic ops, would be far more efficient than evaluating all of them in a single thread, without atomic ops at all). `short`/`char` are not an issue I think, there are atomic primitives for those as well, and DNA is aware of the type (see my patch attempt above). But I actually agree with @brecht for his last point - we have no control over custom set functions. So yes, I guess enforcing depsgraph eval of all animdata in a single datablock to happen in a single thread is the only generic safe way to go. But that will have a cost over performances I think, especially in e.g. production armatures, which can have thousands of drivers/anim curves on a single ID. :/

I expect it can improve performance slightly in production (but not enough to really matter). There is overhead to having more depsgraph nodes and relations. The only way you would have that many drivers on an ID in practice is bones, and we can group those together (and also have to for drivers to work between bones at all).

I expect it can improve performance slightly in production (but not enough to really matter). There is overhead to having more depsgraph nodes and relations. The only way you would have that many drivers on an ID in practice is bones, and we can group those together (and also have to for drivers to work between bones at all).

Please stay away from atomics in RNA. There is indeed some overhead in RNA system, but it does not affect how scalable code is with the number of threads. Once you add atomic the scalability goes to nothing.

Please also stop considering atomics a magic solution. In many places of Blender they are used wrong, killing scabaility.

There might be an open report about same issue, but I can not find quickly find it. However, similar issue was solved for drivers/animation on an array properties (such as location). It was similar type of an error where multiple threads tried to write individual array elements, which isn't supported by RNA (RNA writes an entire array). The code which deals with this can be found in DepsgraphRelationBuilder::build_animdata_drivers(). Keep this in mind.

Unfortunately, Brecht's simple idea can't be implemented as-is: we allow interleaved driver evaluation, where you can drive one custom property with another and have as many IDs involved as one which, and the can of dependencies can be as long as you wish. In practice this means you need to be careful of what you are grouping with what and in which order.

Proposed solution:

  • Force same RNA Struct properties to be evaluated in a single thread.
  • Use depsgraph relations to ensure order (and hence, single-threadness of evaluation).
  • Use Brecht's idea of single-threading same-datablock-properties as a base, but with extra trickery: force order of drivers which drive property of the same StructRNA unless there is a transitive dependency between them already.

Some implementation notes:

  • We have some base for an "animation property cache" in DepsgraphBuilder::cache_, DepsgraphBuilderCache. I would imagine that to implement the solution we would need to resolve quite a bit of RNA paths, potentially more than once. Need to benefit from the cache as much as possible.
  • Adding extra relations is technically introducing evaluation overhead, but we are not by any means perfect in the number of relations already. Unless there will be a measurable slowdown I will not worry about extra relations added for this fix.
  • Those extra relations can be addressed in the future by giving dependency graph a "cleanup" pass: remove all transitive relations, remove all duplicated relations. And, possibly, group sequential nodes into a single evaluation node.
  • Dependency graph evaluation is written in a way that if there is a linear nodes setup like A -> B -> C then evaluation happens from a "hot" thread: no re-scheduling is done, the active thread immediately jumps from node A to B to C. In practice this means that for "simple" cases those extra relations are likely to not cause any measurable overhead.
  • In order to have all transitive relations properly taken into account the make-drivers-singlethread relations must be added at the end of the relations builder, once all other relations are known. This is not a problem from what builder supports (this is already how build_copy_on_write_relations is done). Just something to keep in mind when implementing fix for this issue.
Please stay away from atomics in RNA. There is indeed some overhead in RNA system, but it does not affect how scalable code is with the number of threads. Once you add atomic the scalability goes to nothing. Please also stop considering atomics a magic solution. In many places of Blender they are used wrong, killing scabaility. There might be an open report about same issue, but I can not find quickly find it. However, similar issue was solved for drivers/animation on an array properties (such as location). It was similar type of an error where multiple threads tried to write individual array elements, which isn't supported by RNA (RNA writes an entire array). The code which deals with this can be found in `DepsgraphRelationBuilder::build_animdata_drivers()`. Keep this in mind. Unfortunately, Brecht's simple idea can't be implemented as-is: we allow interleaved driver evaluation, where you can drive one custom property with another and have as many IDs involved as one which, and the can of dependencies can be as long as you wish. In practice this means you need to be careful of what you are grouping with what and in which order. Proposed solution: - Force same RNA Struct properties to be evaluated in a single thread. - Use depsgraph relations to ensure order (and hence, single-threadness of evaluation). - Use Brecht's idea of single-threading same-datablock-properties as a base, but with extra trickery: force order of drivers which drive property of the same StructRNA unless there is a transitive dependency between them already. Some implementation notes: - We have some base for an "animation property cache" in `DepsgraphBuilder::cache_`, `DepsgraphBuilderCache`. I would imagine that to implement the solution we would need to resolve quite a bit of RNA paths, potentially more than once. Need to benefit from the cache as much as possible. - Adding extra relations is technically introducing evaluation overhead, but we are not by any means perfect in the number of relations already. Unless there will be a measurable slowdown I will not worry about extra relations added for this fix. - Those extra relations can be addressed in the future by giving dependency graph a "cleanup" pass: remove all transitive relations, remove all duplicated relations. And, possibly, group sequential nodes into a single evaluation node. - Dependency graph evaluation is written in a way that if there is a linear nodes setup like `A -> B -> C` then evaluation happens from a "hot" thread: no re-scheduling is done, the active thread immediately jumps from node A to B to C. In practice this means that for "simple" cases those extra relations are likely to not cause any measurable overhead. - In order to have all transitive relations properly taken into account the make-drivers-singlethread relations must be added at the end of the relations builder, once all other relations are known. This is not a problem from what builder supports (this is already how `build_copy_on_write_relations` is done). Just something to keep in mind when implementing fix for this issue.

This issue was referenced by blender/blender@94e180bd80

This issue was referenced by blender/blender@94e180bd806ab3b21db07afe4962de666b407217

Changed status from 'Confirmed' to: 'Resolved'

Changed status from 'Confirmed' to: 'Resolved'

Changed status from 'Resolved' to: 'Confirmed'

Changed status from 'Resolved' to: 'Confirmed'

This issue was referenced by blender/blender@4c30dc3431

This issue was referenced by blender/blender@4c30dc343165a643e2173a055ed2aca3137991a5

Changed status from 'Confirmed' to: 'Resolved'

Changed status from 'Confirmed' to: 'Resolved'

Resolved by committing 4c30dc343165a643e2173a055ed2aca3137991a5

Resolved by committing 4c30dc343165a643e2173a055ed2aca3137991a5
Author
Member

I think I can still reproduce this issue, eg. changing the "Dress" property in hailey.blend above. :( Do I re-open?

I think I can still reproduce this issue, eg. changing the "Dress" property in hailey.blend above. :( Do I re-open?

Changed status from 'Resolved' to: 'Confirmed'

Changed status from 'Resolved' to: 'Confirmed'

I can confirm that update is still not always working as expected, also with file from blender/blender#70350. No more warning though, afaict.

I can confirm that update is still not always working as expected, also with file from blender/blender#70350. No more warning though, afaict.

Added subscribers: @bunny, @ConradDueck, @JacquesLucke

Added subscribers: @bunny, @ConradDueck, @JacquesLucke

This issue was referenced by blender/blender@7192a1bca3

This issue was referenced by blender/blender@7192a1bca315485b53214ba8dd5be22d00a10a90

Changed status from 'Confirmed' to: 'Resolved'

Changed status from 'Confirmed' to: 'Resolved'
Sign in to join this conversation.
No Milestone
No project
No Assignees
8 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: studio/blender-studio#73593
No description provided.