Regression: Crash when user opens directory instead of file on USD import #105160
Labels
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset System
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Code Documentation
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Viewport & EEVEE
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Asset Browser Project
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Module
Viewport & EEVEE
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Severity
High
Severity
Low
Severity
Normal
Severity
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No project
No Assignees
3 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: blender/blender#105160
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
System Information
Operating system: macOS 13.3
Graphics card: M1 Max
Blender Version
Broken: 3.5,
1e6ed77896
, main, 2023-02-13, debug buildWorked: 3.5, 3.4.x release builds
Short description of error
I am consistently hitting a crash in Blender on macOS if I (originally accidentally, and now intentionally) misuse the USD import dialog by attempting to import a directory instead of a .usd file. However, it seems I am only able to reproduce the issue with debug builds of Blender.
Exact steps for others to reproduce the error
Analysis of bug
There is a specific sequence of events that appears to inevitably and logically lead to the crash:
At conclusion of the failed USD import, the main thread is calling
usd::import_endjob()
fromwm_job_end()
, which is being executed in the context of iterating through a linked list of timers inwm->timers
inwm_window_timer()
, viaLISTBASE_FOREACH_MUTABLE(wmTimer *, wt, &wm->timers)
. As part of this linked list iteration, it stores the next value to evaluate inwt_iter_next
.Inside
usd::import_endjob()
we callWM_report()
to report a helpful error message to the user.As part of
WM_report_banner_show()
, the report timer is reset and removed.WM_event_remove_timer()
callsBLI_remlink()
to remove the report timer fromwm->timers
.The timer that is removed is in fact the same timer that was already assigned to
wt_iter_next
in step 1.The memory for that timer is deallocated via
MEM_freeN(wt);
Very soon, still in the same main thread, we arrive back at the
LISTBASE_FOREACH_MUTABLE
loop, where the now deallocatedwt_iter_next
has an invalid value and is assigned as the nextwt
to evaluate. This macro attempts to dereferencewt->next
, which causes an access violation, and Blender crashes.It is not yet clear to me how the release builds seem to get away with this bug without crashing, but my expectation is that this is likely due to the debug-build deallocator overwriting memory that the release-build deallocator does not touch. In that case the release build is just getting lucky.
See attached screenshot of crash. See also attached screenshot showing the problematic bit of code just prior to the crash.
Thanks for the report and analysis, I can confirm the regression.
Something that might also help is detecting which commit introduced this problem.
I'll try to bisect.
Crash when user opens directory instead of file on USD importto Regression: Crash when user opens directory instead of file on USD importThis seems to be one of the cases that I can't replicate in a local build. So I can't bisect.
Moving to Core module, since this seems to be an issue with the WM Jobs handling itself.
This issue is already fairly bad, but it also raises another one: the whole
WM_report
API is called from the non-main worker thread, when it's absolutely not threadsafe at all.In general, unless explicitly documented as such, the WM APIs should not be considered threadsafe, and therefore only ever called from the main thread.
Proper fix for both issues I think is to have own
ReportList
as part of the job's data, and directly using BKE_report API to populate it (with proper locking if needed).Then calls to
WM_report
& co should only ever happen from the main thread, and outside of any wmTimer handling codepath, i.e. once the job itself is finished.If we want to generate user-visible reports while jobs are running, then I think the wmJobs system needs some update (something like each job having their own
ReportList
, which can then be processed by the WM management code in a safe way, i.e. by accumulating all reports somewhere, and showing them in the UI outside of the Timers handling loop).(I will update the documentation regarding usage of
WM_report
API)Looking at existing code, a lot of other jobs are using the same callings to
WM_report
in their code... Wonder how such an issue has remained unknown for so long :|I would consider this issue as a blocking one for 3.5 release.
Will create a design task asap regarding how to fix this set of entangled issues (it involves at least
wmTimer
andWM_report
handling).NOTE: Even though the underlying issue leading to the crash is going to be fixed soon, think this kind of error (invalid path) should also be detected by the operator itself, instead of starting a job and immediately failing...
This issue is fixed by
d66672e17a
.I have created another report for the USD project, to not start import job with 'obviously' invalid input data (like invalid file path and such, #105404).