Metal: Resolve race condition in memory manager #105254
Fix a race condition where several competing threads insert Metal
buffers into the MTLSafeFreeList simultaneously while a new list
chunk is being created.
Also raise the size limit of an MTLSafeFreeListChunk to optimize
for interactivity when releasing lots of memory simultaneously.
Authored by Apple: Michael Parkin-White
Related to #96261
The usage of the condition variable seems a bit strange. If this is the intended way of using it, a comment is needed explaining why it is valid.
From just reading the code it is not obvious how the condition variable ever gets woken up.
@ -419,0 +430,4 @@
* so if the resource is not ready, we can hold until it is available. */
std::condition_variable_any wait_cond;
std::unique_lock<std::recursive_mutex> cond_lock(lock_);
wait_cond.wait(cond_lock, [&] { return (next_list = next_.load()) != nullptr; });
Not really sure how this can ever finish: there is no notification sent to the condition variable.
Thanks for highlighting this. It was likely working due to an oversight: it would appear
wait(..)
can release early in unintended situations, but as the condition is satisfied when this happens, execution carries on. Though yes, I'll look at this in more detail and refactor with an explicit notification once the condition is satisfied. However, I am certainly open to a different, more appropriate approach if needed, as this setup does feel like it may add more complexity than is needed for this particular case.
While not at all neat, the original conditional lock was there to prevent the contending threads from spinning:
The canonical way of doing such synchronization would involve the following simple steps from the current state of the code:
- Move the std::condition_variable_any wait_cond; definition next to the lock_, so that the same condition variable is available from all threads.
- In the if (has_created_next_chunk == 0) { branch, after the lock_.unlock(); you do wait_cond_.notify_all().
You can see an example here (condition_variable_any is essentially the same as condition_variable, it just supports more types of mutexes): https://en.cppreference.com/w/cpp/thread/condition_variable
Thanks for the info on this, that's helpful!
It would appear that a caveat with using
condition_variable
, or synchronization primitives following this style, is that notify_all only applies to threads which are actively stalled within wait at the time it is called, whereas our desired intent is to stop any thread from waiting if the condition is already set. The notification could result in a situation where the notify happens before all threads have reached the wait; a thread that hits the wait after that would still stall and be subject to the same issues.
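For what it is worth, with the predicate overload of wait() the lost-wakeup window is narrower than it first appears: wait() evaluates the predicate before blocking, so a thread reaching the wait after the notify has already fired sees the set flag and returns immediately, as long as the flag is written under the same mutex. A tiny illustrative sketch (not Blender code; names are invented):

```cpp
// Illustration: publish() runs entirely before consume(), yet consume()
// does not block, because wait() checks the predicate before sleeping.
#include <condition_variable>
#include <mutex>

static std::mutex m;
static std::condition_variable cv;
static bool ready = false;

void publish()
{
  {
    std::lock_guard<std::mutex> guard(m);
    ready = true; /* State change happens under the mutex. */
  }
  cv.notify_all(); /* May fire before any thread is waiting. */
}

bool consume()
{
  std::unique_lock<std::mutex> lk(m);
  /* wait() returns immediately when the predicate is already true, even
   * if the notification happened in the past. */
  cv.wait(lk, [] { return ready; });
  return ready;
}
```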
As for this particular case, the object being waited on is only created once, so I'll give a solution using a
std::future
a try, as this feels more appropriate here.
Ok, I am slowly getting into understanding the actual algorithm and structure used here (before I was just looking at it from the synchronization-primitives point of view).
I am not really sure why we need anything more than just a double-checked lock here. The amount of waiting seems to be the same as with any of the proposed changes, but the code is much easier on many levels: no need to introduce new primitives, and you can get rid of
has_next_pool_
.
P.S. The
lock_
should actually be called mutex_.
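The double-checked lock being suggested could look roughly like this. It is a minimal sketch with illustrative names (PoolChunk, get_next_chunk, next_pool_ are not the actual patch identifiers):

```cpp
// Double-checked locking sketch: read the atomic pointer without the mutex
// first, and only the thread that still sees nullptr after locking
// performs the one-time allocation.
#include <atomic>
#include <mutex>

struct PoolChunk {
  int id;
};

static std::mutex mutex_; /* The patch's `lock_`, renamed as suggested. */
static std::atomic<PoolChunk *> next_pool_{nullptr};

PoolChunk *get_next_chunk()
{
  PoolChunk *chunk = next_pool_.load(std::memory_order_acquire);
  if (chunk == nullptr) {
    std::lock_guard<std::mutex> guard(mutex_);
    /* Re-check under the lock: another thread may have created the chunk
     * between our first load and acquiring the mutex. */
    chunk = next_pool_.load(std::memory_order_relaxed);
    if (chunk == nullptr) {
      chunk = new PoolChunk{42}; /* Leaked here for brevity. */
      next_pool_.store(chunk, std::memory_order_release);
    }
  }
  return chunk;
}
```

Every caller either sees the pointer immediately on the fast path or briefly serializes on the mutex while exactly one thread allocates, so no extra primitives are needed.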
Thanks for the feedback! Yeah, this looks far better; I had definitely started overcomplicating things once going down the rabbit hole.
This version more or less results in the same locking pattern as the first version, it seems: it still locked if next_list wasn't available and waited for the thread that had already entered the block.
But I agree that this is a cleaner approach. I will submit with this proposed change, remove the other has_next_pool_ counter, and refactor the other parts of the code which used it.
I will also change the clamp on current_list_index_ to clamp against INTMAX. We only need to clamp in the very rare case that this overflows, as it needs to be >= MAX_NUM_BUFFERS_. However, having the actual index counter in the root list also provides a useful bit of data: the total number of buffers in the entire chunked list.
Thanks for the update. Looks much cleaner and easier to understand.
I think there are a couple of include statements which are not needed anymore.
Also, did you consider updating the comment for
insert_buffer
in the header? It states
Performs a lockless list insert.
Not sure what the important point for the API is here. Maybe something like
Can be used from multiple threads. Performs insertion with the least amount of threading synchronization
?
@ -15,6 +15,8 @@
#include <Metal/Metal.h>
#include <QuartzCore/QuartzCore.h>
#include <future>
Is this include still needed for anything?
@ -8,6 +8,8 @@
#include "mtl_debug.hh"
#include "mtl_memory.hh"
#include <condition_variable>
Think this is also not needed anymore.
Changed title from "Metal: Resolve race condition in memory manager." to "Metal: Resolve race condition in memory manager"