Race condition during fluid simulation with IK-affected effectors #115636

Open
opened 2023-11-30 22:57:35 +01:00 by Eugene-Kuznetsov · 9 comments
Contributor

I'm observing a crash when running a fluid simulation in which some of the effectors are affected by an armature with IK.

The immediate cause of the crash is that two different threads attempt to execute iksolver_execute_tree (blender/source/blender/ikplugin/intern/iksolver_plugin.cc) for the same object and pchan_root, at the same time. They both attempt to allocate and free sub-objects inside pchan_root, resulting in a conflict and a segfault.

Stacks for the two calls look like so:
1.

#0  0x0000000007edf600 in iksolver_execute_tree ()
#1  0x000000000217e8e6 in BKE_pose_where_is(Depsgraph*, Scene*, Object*) ()
#2  0x000000000114aa49 in BKE_object_modifier_update_subframe(Depsgraph*, Scene*, Object*, bool, int, float, int) [clone .localalias] ()
#3  0x000000000114a5ff in BKE_object_modifier_update_subframe(Depsgraph*, Scene*, Object*, bool, int, float, int) [clone .localalias] ()
#4  0x0000000002407276 in update_flowsfluids(Depsgraph*, Scene*, Object*, FluidDomainSettings*, float, float, int, float) ()
#5  0x000000000241aa84 in BKE_fluid_modifier_do ()
#6  0x000000000392380f in fluid_modifier_do_isolated(void*) ()
#7  0x00007ffff7f54a55 in tbb::interface7::internal::isolate_within_arena(tbb::interface7::internal::delegate_base&, long) ()
#8  0x0000000002b591b5 in BLI_task_isolate ()

2.

#0  0x0000000007edf600 in iksolver_execute_tree ()
#1  0x0000000002fb675c in blender::deg::(anonymous namespace)::evaluate_node(blender::deg::(anonymous namespace)::DepsgraphEvalState const*, blender::deg::OperationNode*) [clone .isra.0] ()
#2  0x0000000002fb70f5 in blender::deg::(anonymous namespace)::deg_task_run_func(TaskPool*, void*) ()
#3  0x0000000002b4fac1 in tbb::internal::function_task<Task>::execute() ()

I can work around the issue (very crudely) by slapping a global mutex on the entire body of iksolver_execute_tree(). I'm not sure how to do it more cleanly.
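
For illustration only, the kind of workaround described above can be sketched as follows. This is not the actual patch and not Blender code: `FakePoseRoot`, `fake_execute_tree`, `run_two_threads` and `g_ik_mutex` are hypothetical stand-ins, and the allocate/free pair only mimics the "allocate and free sub-objects inside pchan_root" behaviour mentioned in the report.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-root state the solver allocates and frees.
struct FakePoseRoot {
  std::vector<int> *scratch = nullptr;
  int solve_count = 0;
};

// One lock shared by every caller. Crude: it also serializes unrelated roots.
static std::mutex g_ik_mutex;

// Hypothetical stand-in for iksolver_execute_tree(); not the real function.
void fake_execute_tree(FakePoseRoot &root)
{
  std::lock_guard<std::mutex> lock(g_ik_mutex);
  // With the lock held, no other call can be in the middle of a solve.
  assert(root.scratch == nullptr);
  root.scratch = new std::vector<int>(16, 0);  // allocate sub-objects
  root.solve_count++;
  delete root.scratch;  // free them again
  root.scratch = nullptr;
}

// Calls the stand-in from two threads on the same root and returns the
// total number of completed solves.
int run_two_threads(int iterations)
{
  FakePoseRoot root;
  auto worker = [&root, iterations]() {
    for (int i = 0; i < iterations; i++) {
      fake_execute_tree(root);
    }
  };
  std::thread a(worker);
  std::thread b(worker);
  a.join();
  b.join();
  return root.solve_count;
}
```

Calling `run_two_threads(1000)` completes 2000 solves without the two threads ever overlapping inside the function. Removing the `lock_guard` line turns the sketch into a data race on `scratch`, which is the situation the two stacks above suggest.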

Eugene-Kuznetsov added the Status: Needs Triage, Priority: Normal, and Type: Report labels 2023-11-30 22:57:36 +01:00

This report does not contain all the requested information, which is required for us to investigate the issue.

Please submit a new report and carefully follow the instructions. Be sure to provide system information, Blender version, the last Blender version which worked, and a .blend file with exact steps to reproduce the problem.

A guideline for making a good bug report can be found at https://wiki.blender.org/wiki/Process/Bug_Reports

Blender Bot added the Status: Archived label and removed the Status: Needs Triage label 2023-12-01 09:10:29 +01:00
Author
Contributor

Attaching a blend file that reproduces the issue. Go to "Fluid Domain" -> "Physics" and click on "Bake All". It may take a few tries but it will eventually crash.

I've observed the issue with head of trunk on Linux. I am now checking other tags and operating systems and will update the ticket with the results.

Blender Bot added the Status: Needs Triage label and removed the Status: Archived label 2023-12-01 11:35:41 +01:00

Not able to confirm.

Author
Contributor

Try this one

Member

Hi, thanks for the report. I'm unable to confirm this either. The entire simulation bakes and plays back without problems. If threading is involved here, could you check whether the crash also occurs with a single thread (`-t 1`)?
I'm not sure which Blender version you're using. Could you share that? Also, are you building from source? It would be good if you could check with a release build.

Pratik Borhade added the Status: Needs Information from User label and removed the Status: Needs Triage label 2023-12-07 11:32:12 +01:00
Author
Contributor

Threading is definitely involved. iksolver_execute_tree is executed twice per frame per IK bone from different threads, and the crash occurs if the two calls for the same bone happen to overlap.

It crashes during baking. Sometimes it takes a while for the right thread alignment to occur. Try "Bake All" -> "Free All" a few times.

I'm building it from source. I've reproduced the crash with both 4.0.0 and head of trunk. I've also been able to reproduce the crash with the "official" 4.0.2 build: https://www.blender.org/download/release/Blender4.0/blender-4.0.2-linux-x64.tar.xz/

It does not crash with -t 1.

Iliya Katushenock added the Interest: Animation & Rigging label 2023-12-07 16:04:37 +01:00
Member

> It does not crash with -t 1.

Thanks. AFAIK, there is no active mantaflow/physics developer. It looks like you already have a fix; in that case, please feel free to submit it.

Author
Contributor

Perhaps @ChrisLend or @dr.sybren have any thoughts?
My first approach would be to add a mutex that prevents iksolver_execute_tree from being executed simultaneously for the same bPoseChannel object in different threads. The mutex could live in https://projects.blender.org/blender/blender/src/branch/main/source/blender/makesdna/DNA_action_types.h#L222, but I'm unsure how to go about that, because bPoseChannel_Runtime is a C rather than a C++ structure.
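
One generic way to attach a C++ mutex to a struct that must stay C-compatible is to store it behind an opaque pointer that only C++ code allocates, locks and frees. The sketch below shows that pattern under stated assumptions: `FakeChannelRuntime`, `fake_runtime_init`, `fake_runtime_free` and `fake_with_channel_lock` are hypothetical names, not Blender's API, and whether this fits bPoseChannel_Runtime has not been checked.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical C-compatible runtime struct: C code only sees an opaque pointer.
extern "C" {
typedef struct FakeChannelRuntime {
  void *ik_mutex;  // owned by the C++ side; null until initialised
} FakeChannelRuntime;
}

// Allocates the per-channel mutex (C++ side only).
void fake_runtime_init(FakeChannelRuntime *rt)
{
  rt->ik_mutex = new std::mutex();
}

// Frees the per-channel mutex and clears the pointer.
void fake_runtime_free(FakeChannelRuntime *rt)
{
  delete static_cast<std::mutex *>(rt->ik_mutex);
  rt->ik_mutex = nullptr;
}

// Runs `solve` while holding this channel's lock only, so different
// channels can still be solved in parallel.
template<typename Fn> void fake_with_channel_lock(FakeChannelRuntime *rt, Fn &&solve)
{
  assert(rt->ik_mutex != nullptr);
  std::lock_guard<std::mutex> lock(*static_cast<std::mutex *>(rt->ik_mutex));
  solve();
}
```

A caveat of this pattern in general: a bytewise copy of the struct would share the pointer with the original, so every copy and free path has to handle the mutex explicitly.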


> My first approach would be to add a mutex that prevents iksolver_execute_tree from being executed simultaneously for the same bPoseChannel object in different threads

That doesn't look like the right approach to me. First order of business would be to investigate why there is a threading issue in the first place. It could be as "simple" as a missing relation in the dependency graph. The solution then would be to add that relation, which automatically makes the depsgraph evaluation avoid the parallel execution.
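
To illustrate why a relation removes the need for a lock, here is a toy scheduler, not the depsgraph's actual API. It assigns each operation to the earliest "wave" in which it may run; operations in the same wave may run in parallel. The function name `schedule_waves` and the node numbering are assumptions made for this sketch only.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns, for each node, the earliest wave in which it may run.
// Each relation {from, to} means "to" must run after "from".
// Assumes the relations form an acyclic graph.
std::vector<int> schedule_waves(int node_count,
                                const std::vector<std::pair<int, int>> &relations)
{
  assert(node_count >= 0);
  std::vector<int> wave(node_count, 0);
  // Longest-path relaxation: node_count passes are enough for an acyclic graph.
  for (int pass = 0; pass < node_count; pass++) {
    for (const std::pair<int, int> &rel : relations) {
      if (wave[rel.second] < wave[rel.first] + 1) {
        wave[rel.second] = wave[rel.first] + 1;
      }
    }
  }
  return wave;
}
```

With two nodes (say 0 for the pose evaluation and 1 for the fluid modifier) and no relations, both land in wave 0 and may overlap. Adding the relation `{0, 1}` moves node 1 to wave 1, so the two can no longer run at the same time. Note that stack 1 above reaches the solver through BKE_object_modifier_update_subframe rather than through a depsgraph task, so whether a relation alone is sufficient here is not established in this thread.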

Reference: blender/blender#115636