Regression: Crash when user opens directory instead of file on USD import #105160

Closed
opened 2023-02-24 02:41:29 +01:00 by Matt McLin · 9 comments
Member

System Information
Operating system: macOS 13.3
Graphics card: M1 Max

Blender Version
Broken: 3.5, 1e6ed77896, main, 2023-02-13, debug build
Worked: 3.5, 3.4.x release builds

Short description of error
I am consistently hitting a crash in Blender on macOS if I (originally accidentally, and now intentionally) misuse the USD import dialog by attempting to import a directory instead of a .usd file. However, it seems I am only able to reproduce the issue with debug builds of Blender.

Exact steps for others to reproduce the error

  • start Blender
  • File->Import, Universal Scene Description
  • browse to any directory
  • click the Import USD button (without a file selected)

Analysis of bug
There is a specific sequence of events that appears to inevitably and logically lead to the crash:

  1. At conclusion of the failed USD import, the main thread is calling usd::import_endjob() from wm_job_end(), which is being executed in the context of iterating through a linked list of timers in wm->timers in wm_window_timer(), via LISTBASE_FOREACH_MUTABLE(wmTimer *, wt, &wm->timers). As part of this linked list iteration, it stores the next value to evaluate in wt_iter_next.

  2. Inside usd::import_endjob() we call WM_report() to report a helpful error message to the user.

  3. As part of WM_report_banner_show(), the report timer is reset and removed.

  4. WM_event_remove_timer() calls BLI_remlink() to remove the report timer from wm->timers.

  5. The timer that is removed is in fact the same timer that was already assigned to wt_iter_next in step 1.

  6. The memory for that timer is deallocated via MEM_freeN(wt);

  7. Very soon, still in the same main thread, we arrive back at the LISTBASE_FOREACH_MUTABLE loop, where the now deallocated wt_iter_next has an invalid value and is assigned as the next wt to evaluate. This macro attempts to dereference wt->next, which causes an access violation, and Blender crashes.

It is not yet clear to me how the release builds seem to get away with this bug without crashing, but my expectation is that this is likely due to the debug-build deallocator overwriting memory that the release-build deallocator does not touch. In that case the release build is just getting lucky.

See attached screenshot of crash. See also attached screenshot showing the problematic bit of code just prior to the crash.

**System Information** Operating system: macOS 13.3 Graphics card: M1 Max **Blender Version** Broken: 3.5, 1e6ed778969, main, 2023-02-13, debug build Worked: 3.5, 3.4.x release builds **Short description of error** I am consistently hitting a crash in Blender on macOS if I (originally accidentally, and now intentionally) misuse the USD import dialog by attempting to import a directory instead of a .usd file. However, it seems I am only able to reproduce the issue with debug builds of Blender. **Exact steps for others to reproduce the error** - start Blender - File->Import, Universal Scene Description - browse to any directory - click the Import USD button (without a file selected) **Analysis of bug** There is a specific sequence of events that appears to inevitably and logically lead to the crash: 1. At conclusion of the failed USD import, the main thread is calling `usd::import_endjob()` from `wm_job_end()`, which is being executed in the context of iterating through a linked list of timers in `wm->timers` in `wm_window_timer()`, via `LISTBASE_FOREACH_MUTABLE(wmTimer *, wt, &wm->timers)`. As part of this linked list iteration, it stores the next value to evaluate in `wt_iter_next`. 2. Inside `usd::import_endjob()` we call `WM_report()` to report a helpful error message to the user. 3. As part of `WM_report_banner_show()`, the report timer is reset and removed. 4. `WM_event_remove_timer()` calls `BLI_remlink()` to remove the report timer from `wm->timers`. 5. The timer that is removed is in fact the same timer that was already assigned to `wt_iter_next` in step 1. 6. The memory for that timer is deallocated via `MEM_freeN(wt);` 7. Very soon, still in the same main thread, we arrive back at the `LISTBASE_FOREACH_MUTABLE` loop, where the now deallocated `wt_iter_next` has an invalid value and is assigned as the next `wt` to evaluate. This macro attempts to dereference `wt->next`, which causes an access violation, and Blender crashes. It is not yet clear to me how the release builds seem to get away with this bug without crashing, but my expectation is that this is likely due to the debug-build deallocator overwriting memory that the release-build deallocator does not touch. In that case the release build is just getting lucky. See attached screenshot of crash. See also attached screenshot showing the problematic bit of code just prior to the crash.
Matt McLin added the
Severity
Normal
Status
Needs Triage
Type
Report
labels 2023-02-24 02:41:29 +01:00
Iliya Katushenock added the
Interest
USD
label 2023-02-24 02:44:59 +01:00

Thanks for the report and analysis, I can confirm the regression.
Something that might also help is detecting which commit introduced this problem.
I'll try to bisect.

Thanks for the report and analysis, I can confirm the regression. Something that might also help is detecting which commit introduced this problem. I'll try to bisect.
Germano Cavalcante changed title from Crash when user opens directory instead of file on USD import to Regression: Crash when user opens directory instead of file on USD import 2023-02-24 14:55:53 +01:00

This seems to be one of the cases that I can't replicate in a local build. So I can't bisect.

This seems to be one of the cases that I can't replicate in a local build. So I can't bisect.
Bastien Montagne added
Type
Bug
and removed
Type
Report
labels 2023-02-24 18:35:02 +01:00
Bastien Montagne added
Module
Core
Interest
Pipeline, Assets & IO
and removed
Module
Pipeline, Assets & IO
labels 2023-02-24 18:37:53 +01:00

Moving to Core module, since this seems to be an issue with the WM Jobs handling itself.

Moving to Core module, since this seems to be an issue with the WM Jobs handling itself.

This issue is already fairly bad, but it also raises another one: the whole WM_report API is called from the non-main worker thread, when it's absolutely not threadsafe at all.

In general, unless explicitly documented as such, the WM APIs should not be considered threadsafe, and therefore only ever called from the main thread.

Proper fix for both issues I think is to have own ReportList as part of the job's data, and directly using BKE_report API to populate it (with proper locking if needed).

Then calls to WM_report & co should only ever happen from the main thread, and outside of any wmTimer handling codepath, i.e. once the job itself is finished.

If we want to generate user-visible reports while jobs are running, then I think the wmJobs system needs some update (something like each job having their own ReportList, which can then be processed by the WM management code in a safe way, i.e. by accumulating all reports somewhere, and showing them in the UI outside of the Timers handling loop).

This issue is already fairly bad, but it also raises another one: the whole `WM_report` API is called from the non-main worker thread, when it's absolutely not threadsafe at all. In general, unless explicitly documented as such, the WM APIs should not be considered threadsafe, and therefore only ever called from the main thread. Proper fix for both issues I think is to have own `ReportList` as part of the job's data, and directly using BKE_report API to populate it (with proper locking if needed). Then calls to `WM_report` & co should only ever happen from the main thread, and outside of any wmTimer handling codepath, i.e. once the job itself is finished. If we want to generate user-visible reports while jobs are running, then I think the wmJobs system needs some update (something like each job having their own `ReportList`, which can then be processed by the WM management code in a safe way, i.e. by accumulating all reports somewhere, and showing them in the UI outside of the Timers handling loop).

(I will update the documentation regarding usage of WM_report API)

(I will update the documentation regarding usage of `WM_report` API)

Looking at existing code, a lot of other jobs are using the same callings to WM_report in their code... Wonder how such an issue has remained unknown for so long :|

Looking at existing code, a lot of other jobs are using the same callings to `WM_report` in their code... Wonder how such an issue has remained unknown for so long :|
Bastien Montagne added this to the 3.5 milestone 2023-02-28 18:39:49 +01:00

I would consider this issue as a blocking one for 3.5 release.

Will create a design task asap regarding how to fix this set of entangled issues (it involves at least wmTimer and WM_report handling).

I would consider this issue as a blocking one for 3.5 release. Will create a design task asap regarding how to fix this set of entangled issues (it involves at least `wmTimer` and `WM_report` handling).

NOTE: Even though the underlying issue leading to the crash is going to be fixed soon, think this kind of error (invalid path) should also be detected by the operator itself, instead of starting a job and immediately failing...

NOTE: Even though the underlying issue leading to the crash is going to be fixed soon, think this kind of error (invalid path) should also be detected by the operator itself, instead of starting a job and immediately failing...

This issue is fixed by d66672e17a.

I have created another report for the USD project, to not start import job with 'obviously' invalid input data (like invalid file path and such, #105404).

This issue is fixed by d66672e17a. I have created another report for the USD project, to not start import job with 'obviously' invalid input data (like invalid file path and such, #105404).
Blender Bot added
Status
Archived
and removed
Status
Confirmed
labels 2023-03-03 15:42:05 +01:00
Sign in to join this conversation.
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset System
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Code Documentation
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Viewport & EEVEE
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Asset Browser Project
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Module
Viewport & EEVEE
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Severity
High
Severity
Low
Severity
Normal
Severity
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No project
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#105160
No description provided.