Assert/crash during COW phase of depsgraph evaluation due to mismatch mask info in code updating mesh data from multires subdiv ccg runtime data. #84084

Open
opened 2020-12-23 16:00:27 +01:00 by Bastien Montagne · 6 comments

Found after fixing #84002 (Sculpt: Masking operations crash if multires is in play).

BLI_assert(grid_element.mask != NULL) in multires_reshape_assign_final_coords_from_ccg() will fail after undoing from Sculpt mode to Object mode, since in Object mode mesh data does not have paint mask cdlayers yet (it is added when switching to Sculpt mode), while runtime subdiv_ccg coming from sculpt undo step does.

Will commit a temp workaround to make Blender usable, but not sure at all if this is actually valid/expected situation, or if there are more layers of worms in that giant barrel. Need @Sergey's enlightenment here, both on depsgraph side and multires side of the question.

Steps to reproduce:

  • Default startup cube
  • Add multires, subdivide at least once.
  • Switch to sculpt mode, draw a sculpt stroke.
  • Undo twice to object mode

And it asserts:

1   __GI_raise                                                                                                                                                                                                                                                                                                  raise.c                           50   0x7fae6ac53c81 
2   __GI_abort                                                                                                                                                                                                                                                                                                  abort.c                           79   0x7fae6ac3d537 
3   _BLI_assert_abort                                                                                                                                                                                                                                                                                           BLI_assert.c                      50   0x290e6d84     
4   multires_reshape_assign_final_coords_from_ccg                                                                                                                                                                                                                                                               multires_reshape_ccg.c            78   0x1091bc37     
5   multiresModifier_reshapeFromCCG                                                                                                                                                                                                                                                                             multires_reshape.c                158  0x1091613b     
6   object_update_from_subsurf_ccg                                                                                                                                                                                                                                                                              object.c                          1476 0x109bcc4d     
7   BKE_object_free_derived_caches                                                                                                                                                                                                                                                                              object.c                          1557 0x109bd60d     
8   blender::deg::ObjectRuntimeBackup::restore_to_object                                                                                                                                                                                                                                                        deg_eval_runtime_backup_object.cc 112  0x28d41ddc     
9   blender::deg::RuntimeBackup::restore_to_id                                                                                                                                                                                                                                                                  deg_eval_runtime_backup.cc        97   0x28d3584f     
10  blender::deg::deg_update_copy_on_write_datablock                                                                                                                                                                                                                                                            deg_eval_copy_on_write.cc         954  0x28c8f38f     
11  blender::deg::deg_evaluate_copy_on_write                                                                                                                                                                                                                                                                    deg_eval_copy_on_write.cc         1088 0x28c902bf     
12  std::__invoke_impl<void, void ( *&)(Depsgraph *, blender::deg::IDNode const *), Depsgraph *, blender::deg::IDNode *&>                                                                                                                                                                                       invoke.h                          60   0x28d19f6d     
13  std::__invoke<void ( *&)(Depsgraph *, blender::deg::IDNode const *), Depsgraph *, blender::deg::IDNode *&>                                                                                                                                                                                                  invoke.h                          95   0x28d153cf     
14  std::_Bind<void ( *(std::_Placeholder<1>, blender::deg::IDNode *))(Depsgraph *, blender::deg::IDNode const *)>::__call<void, Depsgraph *&&, 0ul, 1ul>(std::tuple<Depsgraph *&&>&&, std::_Index_tuple<0ul, 1ul>)                                                                                             functional                        416  0x28d0cea4     
15  std::_Bind<void ( *(std::_Placeholder<1>, blender::deg::IDNode *))(Depsgraph *, blender::deg::IDNode const *)>::operator()<Depsgraph *, void>(Depsgraph *&&)                                                                                                                                                functional                        499  0x28d05809     
16  std::__invoke_impl<void, std::_Bind<void ( *(std::_Placeholder<1>, blender::deg::IDNode *))(Depsgraph *, blender::deg::IDNode const *)>&, Depsgraph *>(std::__invoke_other, std::_Bind<void ( *(std::_Placeholder<1>, blender::deg::IDNode *))(Depsgraph *, blender::deg::IDNode const *)>&, Depsgraph *&&) invoke.h                          60   0x28cfca8c     
17  std::__invoke_r<void, std::_Bind<void ( *(std::_Placeholder<1>, blender::deg::IDNode *))(Depsgraph *, blender::deg::IDNode const *)>&, Depsgraph *>(std::_Bind<void ( *(std::_Placeholder<1>, blender::deg::IDNode *))(Depsgraph *, blender::deg::IDNode const *)>&, Depsgraph *&&)                         invoke.h                          110  0x28cf1cca     
18  std::_Function_handler<void (Depsgraph *), std::_Bind<void ( *(std::_Placeholder<1>, blender::deg::IDNode *))(Depsgraph *, blender::deg::IDNode const *)>>::_M_invoke(std::_Any_data const&, Depsgraph *&&)                                                                                                 std_function.h                    291  0x28ce9b7c     
19  std::function<void (Depsgraph *)>::operator()(Depsgraph *) const                                                                                                                                                                                                                                            std_function.h                    622  0x28e2c749     
20  blender::deg::(anonymous namespace)::evaluate_node                                                                                                                                                                                                                                                          deg_eval.cc                       113  0x28e27e58     
... <More>                                                                                                                                                                                                                                                                                                                                                            
Found after fixing #84002 (Sculpt: Masking operations crash if multires is in play). `BLI_assert(grid_element.mask != NULL)` in `multires_reshape_assign_final_coords_from_ccg()` will fail after undoing from Sculpt mode to Object mode, since in Object mode mesh data does not have paint mask cdlayers yet (it is added when switching to Sculpt mode), while runtime `subdiv_ccg` coming from sculpt undo step does. Will commit a temp workaround to make Blender usable, but not sure at all if this is actually valid/expected situation, or if there are more layers of worms in that giant barrel. Need @Sergey's enlightenment here, both on depsgraph side and multires side of the question. Steps to reproduce: - Default startup cube - Add multires, subdivide at least once. - Switch to sculpt mode, draw a sculpt stroke. - Undo twice to object mode And it asserts: ``` 1 __GI_raise raise.c 50 0x7fae6ac53c81 2 __GI_abort abort.c 79 0x7fae6ac3d537 3 _BLI_assert_abort BLI_assert.c 50 0x290e6d84 4 multires_reshape_assign_final_coords_from_ccg multires_reshape_ccg.c 78 0x1091bc37 5 multiresModifier_reshapeFromCCG multires_reshape.c 158 0x1091613b 6 object_update_from_subsurf_ccg object.c 1476 0x109bcc4d 7 BKE_object_free_derived_caches object.c 1557 0x109bd60d 8 blender::deg::ObjectRuntimeBackup::restore_to_object deg_eval_runtime_backup_object.cc 112 0x28d41ddc 9 blender::deg::RuntimeBackup::restore_to_id deg_eval_runtime_backup.cc 97 0x28d3584f 10 blender::deg::deg_update_copy_on_write_datablock deg_eval_copy_on_write.cc 954 0x28c8f38f 11 blender::deg::deg_evaluate_copy_on_write deg_eval_copy_on_write.cc 1088 0x28c902bf 12 std::__invoke_impl<void, void ( *&)(Depsgraph *, blender::deg::IDNode const *), Depsgraph *, blender::deg::IDNode *&> invoke.h 60 0x28d19f6d 13 std::__invoke<void ( *&)(Depsgraph *, blender::deg::IDNode const *), Depsgraph *, blender::deg::IDNode *&> invoke.h 95 0x28d153cf 14 std::_Bind<void ( *(std::_Placeholder<1>, blender::deg::IDNode *))(Depsgraph *, blender::deg::IDNode const *)>::__call<void, Depsgraph *&&, 0ul, 1ul>(std::tuple<Depsgraph *&&>&&, std::_Index_tuple<0ul, 1ul>) functional 416 0x28d0cea4 15 std::_Bind<void ( *(std::_Placeholder<1>, blender::deg::IDNode *))(Depsgraph *, blender::deg::IDNode const *)>::operator()<Depsgraph *, void>(Depsgraph *&&) functional 499 0x28d05809 16 std::__invoke_impl<void, std::_Bind<void ( *(std::_Placeholder<1>, blender::deg::IDNode *))(Depsgraph *, blender::deg::IDNode const *)>&, Depsgraph *>(std::__invoke_other, std::_Bind<void ( *(std::_Placeholder<1>, blender::deg::IDNode *))(Depsgraph *, blender::deg::IDNode const *)>&, Depsgraph *&&) invoke.h 60 0x28cfca8c 17 std::__invoke_r<void, std::_Bind<void ( *(std::_Placeholder<1>, blender::deg::IDNode *))(Depsgraph *, blender::deg::IDNode const *)>&, Depsgraph *>(std::_Bind<void ( *(std::_Placeholder<1>, blender::deg::IDNode *))(Depsgraph *, blender::deg::IDNode const *)>&, Depsgraph *&&) invoke.h 110 0x28cf1cca 18 std::_Function_handler<void (Depsgraph *), std::_Bind<void ( *(std::_Placeholder<1>, blender::deg::IDNode *))(Depsgraph *, blender::deg::IDNode const *)>>::_M_invoke(std::_Any_data const&, Depsgraph *&&) std_function.h 291 0x28ce9b7c 19 std::function<void (Depsgraph *)>::operator()(Depsgraph *) const std_function.h 622 0x28e2c749 20 blender::deg::(anonymous namespace)::evaluate_node deg_eval.cc 113 0x28e27e58 ... <More> ```
Sergey Sharybin was assigned by Bastien Montagne 2020-12-23 16:00:27 +01:00
Author
Owner

Changed status from 'Needs Triage' to: 'Confirmed'

Changed status from 'Needs Triage' to: 'Confirmed'
Author
Owner

Added subscribers: @Sergey, @mont29

Added subscribers: @Sergey, @mont29

Based on the fact that reasonable amount of time was spent on looking into this issue and into the original report, few thinks which will make development process more efficient:

  • Always explicitly include steps to reproduce the issue. Including steps related on modifying the code to trigger the assert/crash again.
  • Include all findings gathered during investigation. For example, why/where is it exactly something went out of sync.

Doing this will avoid doing a duplicate work on digging into roots of the issue, and often will even help seeing other usecases which might be relevant but not immediately clear for one developer.

I am confused about priority of the report. Ignoring the state of the code, there is no crash/assert. If there is still something crashes for people, the information about it is to be added to the description. If it's purely code-keeping, then I do not think it should be considered a high priority.

Another confusing aspect, is why a workaround is done in the first place. If this is a regression caused by expected changes somewhere else this information is essential to be known. If it's not a recent regression, then it is much better to not hide code which catches invalid usages.


Now, to the actual solution here.

To me the root cause seems to be sculpt mode abusing the undo system:

  • It (sculpt mode) does has local undo stack.
  • Entering sculpt mode can modify object data (via BKE_sculpt_mask_layers_ensure), without creating undo step for the modification.
  • Sculpt undo node always marks SubdivCCG as modified, even when it is returned to a non-modified state.
  • Sculpt mode never updates SubdivCCG other than via DEG_ID_RECALC_GEOMETRY
  • As a result of 3 above, leaving sculpt mode using undo will have SubdivCCG un-intuitively marked as modified, containing both coordinate and mask data, and will try to update object data with modifications from SubdivCCG.

The proper fix would be the sculpt mode to be more sane about the undo stack, and ensure state is always consistent rather than relying on fragile workarounds in other areas of code. This is something I would expect the sculpt team to take care of.

Since the proper fix is likely to take some time, i would consider the committed workaround good enough for the time being. I would, however, update the comment to state actual root of the problem, something like this P1856.

This, of course, is only applicable if there are no known user-level issues with the workaround you've committed.

Based on the fact that reasonable amount of time was spent on looking into this issue and into the original report, few thinks which will make development process more efficient: * Always explicitly include steps to reproduce the issue. Including steps related on modifying the code to trigger the assert/crash again. * Include all findings gathered during investigation. For example, why/where is it exactly something went out of sync. Doing this will avoid doing a duplicate work on digging into roots of the issue, and often will even help seeing other usecases which might be relevant but not immediately clear for one developer. I am confused about priority of the report. Ignoring the state of the code, there is no crash/assert. If there is still something crashes for people, the information about it is to be added to the description. If it's purely code-keeping, then I do not think it should be considered a high priority. Another confusing aspect, is why a workaround is done in the first place. If this is a regression caused by expected changes somewhere else this information is essential to be known. If it's not a recent regression, then it is much better to not hide code which catches invalid usages. --- Now, to the actual solution here. To me the root cause seems to be sculpt mode abusing the undo system: - It (sculpt mode) does has local undo stack. - Entering sculpt mode can modify object data (via `BKE_sculpt_mask_layers_ensure`), without creating undo step for the modification. - Sculpt undo node always marks `SubdivCCG` as modified, even when it is returned to a non-modified state. - Sculpt mode never updates `SubdivCCG` other than via `DEG_ID_RECALC_GEOMETRY` - As a result of 3 above, leaving sculpt mode using undo will have `SubdivCCG` un-intuitively marked as modified, containing both coordinate and mask data, and will try to update object data with modifications from `SubdivCCG`. The proper fix would be the sculpt mode to be more sane about the undo stack, and ensure state is always consistent rather than relying on fragile workarounds in other areas of code. This is something I would expect the sculpt team to take care of. Since the proper fix is likely to take some time, i would consider the committed workaround good enough for the time being. I would, however, update the comment to state actual root of the problem, something like this [P1856](https://archive.blender.org/developer/P1856.txt). This, of course, is only applicable if there are no known user-level issues with the workaround you've committed.
Author
Owner

Committed suggested comment changes.

Also from @Sergey's comments this could actually be caused by first Sculpt undo step not being properly undone (which is one of the issues to be addressed by #83806 (Undo - Current Status and Important Fixes Needed)).

So might be worth waiting for #83806 to be addressed first, before checking again on this issue.

Committed suggested comment changes. Also from @Sergey's comments this could actually be caused by first Sculpt undo step not being properly undone (which is one of the issues to be addressed by #83806 (Undo - Current Status and Important Fixes Needed)). So might be worth waiting for #83806 to be addressed first, before checking again on this issue.
Bastien Montagne changed title from Assert/crash during COW phase of depsgraph evaluation due to mismtach mask info in code updating mesh data from multires subdiv ccg runtime data. to Assert/crash during COW phase of depsgraph evaluation due to mismatch mask info in code updating mesh data from multires subdiv ccg runtime data. 2020-12-28 14:25:36 +01:00

@mont29, Seems like this is a task assigned assigned to me, but I'm not sure how can I help here. Is there anything particular you expect me to take care of here?

@mont29, Seems like this is a task assigned assigned to me, but I'm not sure how can I help here. Is there anything particular you expect me to take care of here?
Sergey Sharybin was unassigned by Bastien Montagne 2021-01-11 16:37:34 +01:00
Author
Owner

Woops no sorry, was a left-over from initial own understanding of the issue. no reason for this to be assigned to you anymore indeed.

Woops no sorry, was a left-over from initial own understanding of the issue. no reason for this to be assigned to you anymore indeed.
Philipp Oeser removed the
Interest
Modeling
label 2023-02-09 15:28:45 +01:00
Sign in to join this conversation.
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#84084
No description provided.