Vulkan backend deadlocks on startup (Debian Testing + Nvidia GPU) #129160

Open
opened 2024-10-17 14:38:49 +02:00 by Bastien Montagne · 3 comments

System Information
Operating system: Debian Testing
Graphics card: Nvidia RTX A4500 (drivers 535.183).

Blender Version
Broken: main

Short description of error

Blender deadlocks when started with the Vulkan backend. The UI shows up but is completely unresponsive (only a few main loop iterations run).

Exact steps for others to reproduce the error

According to the debugger, the code is stuck in the infinite loop in VKCommandBufferWrapper::wait_for_cpu_synchronization:

1  poll                                                                                                               0x7f33f600a11f 
2  ??                                                                                                                 0x7f33927157f3 
3  ??                                                                                                                 0x7f3392b243b9 
4  ??                                                                                                                 0x7f3392a54b40 
5  blender::gpu::render_graph::VKCommandBufferWrapper::wait_for_cpu_synchronization vk_command_buffer_wrapper.cc 103  0x8315d50      
6  blender::gpu::VKContext::flush_render_graph                                      vk_context.cc                148  0x82faa9e      
7  blender::gpu::VKContext::deactivate                                              vk_context.cc                126  0x82faa9e      
8  GPU_context_active_set                                                           gpu_context.cc               144  0x82001d4      
9  GPU_viewport_bind                                                                gpu_viewport.cc              204  0x82cf133      
10 wm_draw_region_bind                                                              wm_draw.cc                   751  0x33e397e      
11 wm_draw_window_offscreen                                                         wm_draw.cc                   1005 0x33e397e      
12 wm_draw_window                                                                   wm_draw.cc                   1177 0x33e397e      
13 wm_draw_update                                                                   wm_draw.cc                   1581 0x33e397e      
14 WM_main                                                                          wm.cc                        644  0x33dee28      
15 main                                                                             creator.cc                   588  0x28bc0c5      

Also a hotspot capture:

[image: hotspot capture]
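
For readers following the stack trace above: frame 5 sits in a wait that never returns. A minimal sketch of a bounded alternative is below. It is not Blender's implementation. `wait_for_fence_bounded`, `WaitResult` and `SyncOutcome` are hypothetical names, and `wait_once` stands in for a call such as `vkWaitForFences` with a finite timeout, which can return `VK_SUCCESS`, `VK_TIMEOUT` or `VK_ERROR_DEVICE_LOST`. The stand-in enum keeps the sketch compilable without the Vulkan headers.

```cpp
#include <cstdint>
#include <functional>

// Stand-ins for VK_SUCCESS, VK_TIMEOUT and VK_ERROR_DEVICE_LOST (assumption:
// the real code would map the VkResult of vkWaitForFences onto these).
enum class WaitResult { Success, Timeout, DeviceLost };

// What the caller learns from the bounded wait.
enum class SyncOutcome { Signaled, DeviceLost, GaveUp };

// Hypothetical helper: waits in slices of `slice_ns` nanoseconds and gives up
// after `max_slices` timeouts instead of looping forever.
inline SyncOutcome wait_for_fence_bounded(
    const std::function<WaitResult(std::uint64_t)> &wait_once,
    std::uint64_t slice_ns,
    int max_slices)
{
  for (int i = 0; i < max_slices; i++) {
    switch (wait_once(slice_ns)) {
      case WaitResult::Success:
        return SyncOutcome::Signaled;
      case WaitResult::DeviceLost:
        // Retrying cannot help once the device is lost.
        return SyncOutcome::DeviceLost;
      case WaitResult::Timeout:
        break;  // Fence not signaled yet, try the next slice.
    }
  }
  return SyncOutcome::GaveUp;
}
```

With a helper like this, a hang would surface as `GaveUp` or `DeviceLost` at the call site, which can then be logged, rather than as an unresponsive UI.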

Member

A deadlock can happen when the device is lost (driver reset/restart), which can be the case on NVIDIA.
It would help to use vkconfig with the API dump layer to see which commands were executed before the hang.
Updating to the latest available Linux drivers could also solve some issues; I use 555.

Author
Owner

Here is the massive 11 MB dump, up to the point where the UI shows up and freezes forever:

blender.txt

blender.html

Member

Hmm, it seems the last command list sent to the GPU is empty. I experimented with removing these empty submissions in the past, but did not continue because it complicated the code. Perhaps that is what fails on this device.

Thread 0, Frame 4:
vkAllocateCommandBuffers(device, pAllocateInfo, pCommandBuffers) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 0x7f5,4ed,1f1,150
    pAllocateInfo:                  const VkCommandBufferAllocateInfo* = 0x7f5,4cc,e43,980:
        sType:                          VkStructureType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO (40)
        pNext:                          const void* = NULL
        commandPool:                    VkCommandPool = 0x7f5,4cd,30d,e50
        level:                          VkCommandBufferLevel = VK_COMMAND_BUFFER_LEVEL_PRIMARY (0)
        commandBufferCount:             uint32_t = 1
    pCommandBuffers:                VkCommandBuffer* = 0x7f5,4cc,e43,a28
        pCommandBuffers[0]:             VkCommandBuffer = 0x7f5,4c6,a36,850

Thread 0, Frame 4:
vkBeginCommandBuffer(commandBuffer, pBeginInfo) returns VkResult VK_SUCCESS (0):
    commandBuffer:                  VkCommandBuffer = 0x7f5,4c6,a36,850
    pBeginInfo:                     const VkCommandBufferBeginInfo* = 0x7f5,4cc,e43,9a0:
        sType:                          VkStructureType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO (42)
        pNext:                          const void* = NULL
        flags:                          VkCommandBufferUsageFlags = 1 (VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT)
        pInheritanceInfo:               const VkCommandBufferInheritanceInfo* = UNUSED

Thread 0, Frame 4:
vkEndCommandBuffer(commandBuffer) returns VkResult VK_SUCCESS (0):
    commandBuffer:                  VkCommandBuffer = 0x7f5,4c6,a36,850

Thread 0, Frame 4:
vkResetFences(device, fenceCount, pFences) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 0x7f5,4ed,1f1,150
    fenceCount:                     uint32_t = 1
    pFences:                        const VkFence* = 0x7ff,e5e,909,ae8
        pFences[0]:                     const VkFence = 0x7f5,4ee,1f3,450

Thread 0, Frame 4:
vkQueueSubmit(queue, submitCount, pSubmits, fence) returns VkResult VK_SUCCESS (0):
    queue:                          VkQueue = 0x7f5,4ee,1e9,c50
    submitCount:                    uint32_t = 1
    pSubmits:                       const VkSubmitInfo* = 0x7f5,4cc,e43,9d8
        pSubmits[0]:                    const VkSubmitInfo = 0x7f5,4cc,e43,9d8:
            sType:                          VkStructureType = VK_STRUCTURE_TYPE_SUBMIT_INFO (4)
            pNext:                          const void* = NULL
            waitSemaphoreCount:             uint32_t = 0
            pWaitSemaphores:                const VkSemaphore* = NULL
            pWaitDstStageMask:              const VkPipelineStageFlags* = NULL
            commandBufferCount:             uint32_t = 1
            pCommandBuffers:                const VkCommandBuffer* = 0x7f5,4cc,e43,a28
                pCommandBuffers[0]:             const VkCommandBuffer = 0x7f5,4c6,a36,850
            signalSemaphoreCount:           uint32_t = 0
            pSignalSemaphores:              const VkSemaphore* = NULL
    fence:                          VkFence = 0x7f5,4ee,1f3,450
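
The dump above shows a command buffer that is begun and ended with nothing recorded in between, then submitted with a freshly reset fence. A rough sketch of skipping such empty submissions follows. It is an illustration under assumptions, not the experiment mentioned above: `RecordedCommands`, `SubmitHooks` and `submit_and_wait_if_needed` are hypothetical, and the three hooks stand in for `vkResetFences`, `vkQueueSubmit` and the fence wait.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a recorded command buffer; not a Blender type.
struct RecordedCommands {
  std::vector<std::string> commands;
  bool empty() const { return commands.empty(); }
};

// Callbacks standing in for the Vulkan calls seen in the dump.
struct SubmitHooks {
  std::function<void()> reset_fence;                      // vkResetFences
  std::function<void(const RecordedCommands &)> submit;   // vkQueueSubmit
  std::function<void()> wait_fence;                       // fence wait
};

// Returns true when work was actually submitted and waited on.
inline bool submit_and_wait_if_needed(const RecordedCommands &recorded,
                                      const SubmitHooks &hooks)
{
  if (recorded.empty()) {
    // Nothing to execute. The fence is left untouched on purpose: resetting
    // it without a submit would leave it unsignaled, and a later wait on it
    // would never return.
    return false;
  }
  hooks.reset_fence();
  hooks.submit(recorded);
  hooks.wait_fence();
  return true;
}
```

The fence reset, the submit and the wait have to be skipped together, which hints at why removing empty submissions touches more code than the submit call alone.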
Reference: blender/blender#129160