Fix #101877, rigidbodies & constraints causing frequent crashes. #108399

Open
himisa wants to merge 5 commits from himisa/blender:fix_rigid_crash into main

Contributor

Problem: See #101877.

Rigid body and constraint memory management is very likely to dereference invalid (freed) pointers and crash.

Scenario 1:

  • If rigid body objects A and B are constrained by C, and you press Space to play the animation and then delete one of A/B, the btTypedConstraint C still holds pointers to both A and B and tries to constrain them when the animation continues. Because one of the objects has been freed, Blender crashes.

  • Steps to reproduce: 1. open test1a.blend; 2. press Space to play the animation; 3. press Space again to pause; 4. delete one of the two cubes; 5. press Space to continue the animation, and wait for Blender to crash. Note: in a release build of Blender, the freed memory may still be accessible and its contents may still be valid, so Blender may survive this. You may need to try several times to produce a crash.

  • This is fixed by removing all the constraints related to a rigid body when the rigid body is deleted (see the change in btDiscreteDynamicsWorld.cpp).
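The idea behind the Scenario 1 fix can be sketched with a small toy model. This is not the code from the patch and does not use the real Bullet classes: `Body`, `Constraint`, `World`, and `demoDeleteBodyDetachesConstraint` are hypothetical stand-ins, and the `valid` member only mimics the role the description gives to `m_isValid`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins (hypothetical names, NOT the Bullet classes).
struct Body {
    int id;
};

struct Constraint {
    Body* a;
    Body* b;
    bool valid;  // mimics the role of the proposed m_isValid flag
    Constraint(Body* a_, Body* b_) : a(a_), b(b_), valid(true) {}
};

class World {
public:
    void addBody(Body* body) { bodies_.push_back(body); }
    void addConstraint(Constraint* c) { constraints_.push_back(c); }

    // Removing a body also detaches every constraint that refers to it,
    // so a later simulation step never follows a pointer to a freed body.
    void removeBody(Body* body) {
        auto refersToBody = [body](Constraint* c) {
            return c->a == body || c->b == body;
        };
        for (Constraint* c : constraints_) {
            if (refersToBody(c)) {
                c->valid = false;  // the owner can see that the bodies are gone
            }
        }
        constraints_.erase(
            std::remove_if(constraints_.begin(), constraints_.end(), refersToBody),
            constraints_.end());
        bodies_.erase(std::remove(bodies_.begin(), bodies_.end(), body),
                      bodies_.end());
    }

    std::size_t constraintCount() const { return constraints_.size(); }
    std::size_t bodyCount() const { return bodies_.size(); }

private:
    std::vector<Body*> bodies_;
    std::vector<Constraint*> constraints_;  // non-owning: the world does not delete them
};

// Mirrors the reproduction steps: two bodies, one constraint, delete one body.
bool demoDeleteBodyDetachesConstraint() {
    Body a{1};
    Body b{2};
    Constraint c(&a, &b);
    World world;
    world.addBody(&a);
    world.addBody(&b);
    world.addConstraint(&c);

    world.removeBody(&a);

    assert(world.constraintCount() == 0);
    return !c.valid && world.bodyCount() == 1;
}
```

In this sketch the world only drops its reference to the constraint; the constraint object itself stays alive because something else owns it, which is why the flag is needed.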

Scenario 2:

  • When performing an undo after the animation has been played, Blender frees all rigid bodies and constraints (before rebuilding them in a previous state). If a constraint is freed before its related rigid bodies, the rigid bodies still hold the freed pointer to the constraint. This causes a crash when the rigid bodies are freed afterwards, since the code tries to access the (already freed) constraint to remove it from m_constraintRefs of the rigid body.

  • Steps to reproduce: 1. open test1a.blend; 2. press Space to play the animation; 3. press Space again to pause; 4. delete one of the two cubes; 5. press Ctrl+Z twice, and wait for Blender to crash. This happens because after the first undo, the deleted cube is back at the end of the object list, so it is now guaranteed to be deleted after the constraint object when the object list is traversed to delete them.

  • This is fixed by removing the constraint from m_constraintRefs of its related rigid bodies when the constraint is being freed (see the change in rb_bullet_api.cpp).
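The ordering problem in Scenario 2 can be illustrated with another toy model. Again, this is a sketch under assumptions rather than the patch itself: `RefBody`, `RefConstraint`, `eraseRef`, `detachBeforeDelete`, and the demo function are hypothetical names, and `constraintRefs` merely plays the role of `m_constraintRefs`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct RefConstraint;

// Toy rigid body that keeps back-references to its constraints
// (hypothetical; constraintRefs plays the role of m_constraintRefs).
struct RefBody {
    std::vector<RefConstraint*> constraintRefs;
};

struct RefConstraint {
    RefBody* a;
    RefBody* b;
    bool valid;  // false once the bodies may no longer exist
    RefConstraint(RefBody* a_, RefBody* b_) : a(a_), b(b_), valid(true) {
        a->constraintRefs.push_back(this);
        b->constraintRefs.push_back(this);
    }
};

// Drop one back-reference from a body.
void eraseRef(RefBody* body, RefConstraint* c) {
    std::vector<RefConstraint*>& refs = body->constraintRefs;
    refs.erase(std::remove(refs.begin(), refs.end(), c), refs.end());
}

// Called by the constraint's owner just before freeing the constraint.
// If the bodies are already gone (valid == false) they must not be touched.
void detachBeforeDelete(RefConstraint* c) {
    if (!c->valid) {
        return;
    }
    eraseRef(c->a, c);
    eraseRef(c->b, c);
    c->valid = false;
}

// Frees the constraint first, then lets the bodies go out of scope,
// which is the order that crashed in Scenario 2.
bool demoFreeConstraintBeforeBodies() {
    RefBody a;
    RefBody b;
    RefConstraint* c = new RefConstraint(&a, &b);
    assert(a.constraintRefs.size() == 1);

    detachBeforeDelete(c);
    delete c;

    // No body is left holding a dangling constraint pointer.
    return a.constraintRefs.empty() && b.constraintRefs.empty();
}
```

The early return on `valid == false` is the part that depends on the flag: without it, a constraint whose bodies were deleted first would try to touch freed bodies here.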

Notice:

Note that we need a new flag (m_isValid) to record whether m_rbA and m_rbB point to valid objects or to deleted ones. When a rigid body is deleted, its related constraints are removed from btDiscreteDynamicsWorld but not deleted (they are owned by the constraint objects, so there is no delete here). When a constraint object later wants to remove the constraint again, it must be able to tell whether m_rbA and m_rbB are still present.

I don't know whether adding a bool m_isValid; member to btTypedConstraint has side effects (in serialization, undo, ...). It will not change sizeof(btTypedConstraint), because of struct padding. Invalidated constraints will also not be serialized, since they are removed from m_constraints of btDiscreteDynamicsWorld. If there are any other effects that I have missed, please let me know. Tests and comments are welcome.
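The padding argument can be illustrated with a pair of toy layouts. These structs are hypothetical and are not the real btTypedConstraint, whose member list is different; the sketch only shows how an extra bool can fit into tail padding on common ABIs where pointers are aligned to more than two bytes. Whether this holds for the real class would have to be confirmed by comparing sizeof before and after the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layouts, NOT the real btTypedConstraint.
struct LayoutBefore {
    void* rbA;
    void* rbB;
    bool needsFeedback;
};

struct LayoutAfter {
    void* rbA;
    void* rbB;
    bool needsFeedback;
    bool isValid;  // new flag placed right after an existing bool
};

// True when the extra bool fit into existing tail padding on this platform.
bool flagFitsInPadding() {
    return sizeof(LayoutBefore) == sizeof(LayoutAfter);
}

// Aborts (in a debug build) if the flag changed the struct size.
void checkLayout() {
    assert(flagFitsInPadding());
}
```

The result depends on where the new member is declared: a bool placed between two pointers would typically add a full pointer-sized slot instead of reusing padding.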

This should also fix #91369, #91959, #92915, #95603, #100613.

himisa added 3 commits 2023-05-29 21:01:03 +02:00
himisa changed title from WIP: Trying to fix #101877 to WIP: Trying to fix #101877, rigidbody & constraints causing frequent crashes 2023-05-29 22:19:43 +02:00
himisa added 1 commit 2023-05-30 08:53:15 +02:00
himisa added 1 commit 2023-05-30 08:55:40 +02:00
himisa changed title from WIP: Trying to fix #101877, rigidbody & constraints causing frequent crashes to Fix #101877, rigidbodies & constraints causing frequent crashes. 2023-05-30 09:03:44 +02:00
himisa requested review from Sonny Campbell 2023-06-03 11:34:44 +02:00
himisa requested review from YimingWu 2023-08-16 10:58:21 +02:00
Member

Hi @himisa, did you add me as a reviewer because of #109108? I'm actually not that familiar with the physics module, but I can test this patch.

Author
Contributor

@ChengduLittleA

Thanks!

I browsed recent issues and it looks like you have handled some rigid-body-related issues and have been active recently... I am an amateur and not familiar with the Blender developers. I hope someone can test this or notify the relevant member.

This issue was getting in my way when I was adjusting my character with both clothes and rigid bodies. Adjusting clothes requires frequent undo and playback, and rigid bodies crash Blender a lot.

I compiled a version of Blender 3.6 with this fix and it works fine for me now, so there is no urgency in merging this PR. However, I still hope this can be fixed in the 4.0 release, if possible.

Iliya Katushenock added the Module: Animation & Rigging and Interest: Nodes & Physics labels 2023-08-16 17:21:58 +02:00
Member

Ah I get it. You did quite a lot of work on getting it to work tho :)

@SonnyCampbell_Unity could you check whether the implementation in this PR satisfies the design expectations in #101877 and is complete?

Member

Hi, thanks for the PR. AFAICT, it indeed fixes the crash reported in #116013
I'm not sure who is working on Physics. @ZedDB ?


Thanks for bringing this to my attention, I had no clue that you had worked on this @himisa !
I really appreciate it!

I haven't had time to test it yet, but from looking at the code, I think this should be fine.
One thing you need to do though is to add this patch to extern/bullet2/patches so it is obvious that we have applied further changes from the upstream bullet version.

Have you tried running this code with ASAN enabled?
One of the problems we ran into when Sonny and I tried to fix this in the past was that we would get memory leaks. So we held off fixing the crashes as those fixes would instead introduce other issues.

Author
Contributor

@ZedDB

I am an amateur Blender user and not familiar with development. I don't know how to enable ASAN; maybe you can test this when you have time?

Also, I generated a .patch file using git diff and manually removed the diffs that modify intern code. Is this patch file OK?


> Also, I generated a .patch file using git diff and manually removed the diffs that modify intern code. Is this patch file OK?

The attached patch is what I had in mind, yes.
Include it when the rest of the code has been approved, so you don't need to continuously update it whenever you make changes.

> I don't know how to enable ASAN

If you are on Linux, you should only have to enable the WITH_COMPILER_ASAN CMake option.
Then, when you run Blender, it will report whether and where the program leaked.

I will probably only have time to test this myself next year. So if you can, it would be good if you can test it as well.

Also, feel free to poke me if you haven't heard from me in a while. Thank you for sticking around for this long, I really appreciate it. :)

Sebastian Parborg requested review from Sebastian Parborg 2024-02-23 14:37:18 +01:00

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u fix_rigid_crash:himisa-fix_rigid_crash
git checkout himisa-fix_rigid_crash
Reference: blender/blender#108399