WIP: Vulkan: Add functions for Debug to GPU Vulkan Module #105484

Closed
Kazashi Yoshioka wants to merge 28 commits from (deleted):vulkan-debuggingtools into main

When changing the target branch, be careful to rebase the branch in your fork to match. See documentation.
Contributor

Add functions for Debug to GPU Vulkan Module.

I would like to add Debug function to GPU/Vulkan.
The newly added items are roughly divided into three.

  1. Check errors.
  2. To manage the internal message of Vulkan.
  3. Vulkan Object labeling, command and queue framing.

Check errors

Overload the Vulkan API functions that returns VkResult and switch the check with the runtime.

To manage the internal message of Vulkan.

Operate debug messages using VkInstance's extension(vk_ext_debug_utils).
This requires the following steps:
  • Register Extension when generating Vkinstance.
  • Load the necessary functions from Vkinstance.
  • Register a callback function.

Vulkan Object labeling, command and queue framing.

Name Vulkan Object.
This effect can be considered:
  • we can know the name of Vulkan Object in the message callback function.
  • All labels are recognized when using external Debugger such as RenderDoc.
  • By framing and naming command buffers and queues, the structure of the runtime is clear.

Alternative proposals

  • As for the load of functions, it may be possible to use a third party such as VOLK.
  • With regard to overload functions, it is possible to define NO_PROTOTYPE and implement all Vulkan functions on our own when include vulkan.h.

Splitting this pull request

  • Policy on linking with vulkanSDK.[#105922 (comment)]

  • Implementation of DebugUtils.

  • Handling runtime errors.

  • Discussion and improvements to ContextVK initialization.

TODO

  • debugprintfEXT.
  • Debug messages are classified, especially to operate the performance flag message.
**Add functions for Debug to GPU Vulkan Module.** I would like to add Debug function to GPU/Vulkan. The newly added items are roughly divided into three. 1. Check errors. 2. To manage the internal message of Vulkan. 3. Vulkan Object labeling, command and queue framing. ___ #### Check errors > ##### Overload the Vulkan API functions that returns VkResult and switch the check with the runtime. --- #### To manage the internal message of Vulkan. > ##### Operate debug messages using VkInstance's extension(vk_ext_debug_utils). > ##### This requires the following steps: >> + ##### Register Extension when generating Vkinstance. >> + ##### Load the necessary functions from Vkinstance. >> + ##### Register a callback function. --- #### Vulkan Object labeling, command and queue framing. > ##### Name Vulkan Object. > ##### This effect can be considered: >> + ##### we can know the name of Vulkan Object in the message callback function. >> + ##### All labels are recognized when using external Debugger such as RenderDoc. >> + ##### By framing and naming command buffers and queues, the structure of the runtime is clear. #### Alternative proposals >> + ##### As for the load of functions, it may be possible to use a third party such as VOLK. >> + ##### With regard to overload functions, it is possible to define NO_PROTOTYPE and implement all Vulkan functions on our own when include vulkan.h. #### Splitting this pull request - [x] Policy on linking with vulkanSDK.[https://projects.blender.org/blender/blender/pulls/105922#issue-95143] - [x] Implementation of DebugUtils. - [ ] Handling runtime errors. - [x] Discussion and improvements to ContextVK initialization. #### TODO + ##### debugprintfEXT. + ##### Debug messages are classified, especially to operate the performance flag message.
Kazashi Yoshioka added 10 commits 2023-03-06 10:57:49 +01:00
Kazashi Yoshioka requested review from Jeroen Bakker 2023-03-06 10:58:21 +01:00
Kazashi Yoshioka changed title from Vulkan : Add functions for Debug to GPU Vulkan Module. to WIP: Vulkan : Add functions for Debug to GPU Vulkan Module. 2023-03-06 11:26:03 +01:00
Kazashi Yoshioka added 1 commit 2023-03-06 13:44:43 +01:00
Hans Goudey changed title from WIP: Vulkan : Add functions for Debug to GPU Vulkan Module. to WIP: Vulkan: Add functions for Debug to GPU Vulkan Module 2023-03-07 17:02:17 +01:00
Kazashi Yoshioka added 1 commit 2023-03-07 22:19:30 +01:00
Jeroen Bakker reviewed 2023-03-08 13:59:48 +01:00
@ -992,2 +992,4 @@
set(WITH_VULKAN_BACKEND OFF)
endif()
if(EXISTS ${LIBDIR}/shaderc)
Member

This part should be done in an additional PR. And should follow the same structure as platform_unix/apple.

if(WITH_VULKAN_BACKEND)
  find_package_wrapper(Vulkan REQUIRED)
  find_package_wrapper(ShaderC REQUIRED)
endif()
This part should be done in an additional PR. And should follow the same structure as platform_unix/apple. ``` if(WITH_VULKAN_BACKEND) find_package_wrapper(Vulkan REQUIRED) find_package_wrapper(ShaderC REQUIRED) endif() ```
Author
Contributor

Fixed by LazyDodo.

commit(b7679addd2c6)

Fixed by LazyDodo. > [commit(b7679addd2c6)](https://projects.blender.org/blender/blender/commit/b7679addd2c6808527d240ccd9be994bfe3bf99e)
vnapdv marked this conversation as resolved
Jeroen Bakker requested changes 2023-03-08 14:11:51 +01:00
Jeroen Bakker left a comment
Member

Did a global overview. think we should first move bits and pieces into the right location before going in more details.

Did a global overview. think we should first move bits and pieces into the right location before going in more details.
@ -0,0 +29,4 @@
/* Function pointer definitions for use only in this file.*/
#if defined(VK_EXT_debug_utils)
PFN_vkCreateDebugUtilsMessengerEXT pfvkCreateDebugUtilsMessengerEXT = nullptr;
Member

We should move these pointers to the VKContext. We might want to add a class there to put all the debug stuff in.

We should move these pointers to the VKContext. We might want to add a class there to put all the debug stuff in.
Author
Contributor

These functions are only used in this cc file. I should have VKContext load all of the extension functions in one place, but I can't think of any benefit from creating multiple instances.

These functions are only used in this cc file. I should have VKContext load all of the extension functions in one place, but I can't think of any benefit from creating multiple instances.
vnapdv marked this conversation as resolved
@ -0,0 +470,4 @@
/*
* TODO:: Comment
*/
void GHOST_VulkanInstanceLoad(void *m_instance)
Member

GPU module is allowed to call GHOST, but we don't allow the reverse.

The m_instance is part of the VKContext and might just need to call it here.
There we also use vk function pointers to setup the vma. We might want to incorporate it there.

GPU module is allowed to call GHOST, but we don't allow the reverse. The m_instance is part of the VKContext and might just need to call it here. There we also use vk function pointers to setup the vma. We might want to incorporate it there.
Author
Contributor

Here's what we want to do with this change:

Even though many vulkan objects are generated in the initializeDrawingContext function, debugutils can't keep track of them.
So, we should register debugutils after the VkInstance is generated but before the VkDevice is generated.

The benefits of this.
  • During logical device generation, you can get information about the dynamic library loading of extensions.
  • we can track the generation of vulkan objects related to swapchains.
#### Here's what we want to do with this change: Even though many vulkan objects are generated in the initializeDrawingContext function, debugutils can't keep track of them. So, we should register debugutils after the VkInstance is generated but before the VkDevice is generated. ##### The benefits of this. - During logical device generation, you can get information about the dynamic library loading of extensions. - we can track the generation of vulkan objects related to swapchains.
@ -0,0 +33,4 @@
template<typename T> void object_vk_label(VkDevice device, T obj, const std::string &name);
void object_vk_label(VkDevice device, VkObjectType objType, uint64_t obj, const std::string &name);
void pushMarker(VkCommandBuffer cmd, const std::string &name);
Member

We might want to add this to VKContext.debug_group_begin/end and hook it with the GPU_debug.h

We might want to add this to `VKContext.debug_group_begin/end` and hook it with the `GPU_debug.h`
Author
Contributor

It will be done by labeling for VkQueue.

It will be done by labeling for VkQueue.
vnapdv marked this conversation as resolved
@ -0,0 +45,4 @@
}
#define VK_ERROR_CHECK_ENABLE
Member

Why not use the validation layers. For OpenGL this was desired, but not sure we need it for Vulkan/Metal as it is already part of its SDK/Infrastructure.

Why not use the validation layers. For OpenGL this was desired, but not sure we need it for Vulkan/Metal as it is already part of its SDK/Infrastructure.
Author
Contributor
What I want to do in this part.
Error check using wrap function instead of VK_CHECK by #define.

This advantages.

> we don't have to write a define like VK_CHECK.

> we can also switch at runtime.

> it becomes easy to uncheck uniformly.


I thought about removing this check completely in release builds.
However, it's always a good idea to do runtime error checking.
--- **What I want to do in this part.** ``` Error check using wrap function instead of VK_CHECK by #define. ``` This advantages. > we don't have to write a define like VK_CHECK. > we can also switch at runtime. > it becomes easy to uncheck uniformly. I thought about removing this check completely in release builds. However, it's always a good idea to do runtime error checking.
Author
Contributor

I think I'm a little misunderstood.
What exactly does use the validation layers mean?

I think I'm a little misunderstood. What exactly does use the validation layers mean?
Author
Contributor

Again, as you say, we have to keep the functions as members of the structure.
I'm still verifying that it works.

Again, as you say, we have to keep the functions as members of the structure. I'm still verifying that it works.
Kazashi Yoshioka added 1 commit 2023-03-09 11:57:44 +01:00
0b2292b3de FIX:Register a new function in GHOST_C-api.
Divide the initialization part of ContextVK into an instantiation part and a logical device generation part, and register debugutils in between. This allows us to track the generation of all vulkan objects.
Kazashi Yoshioka added 1 commit 2023-03-09 12:32:08 +01:00
Kazashi Yoshioka added 1 commit 2023-03-09 12:46:00 +01:00
Kazashi Yoshioka added 1 commit 2023-03-09 14:56:25 +01:00
Kazashi Yoshioka added 2 commits 2023-03-09 17:14:03 +01:00
Kazashi Yoshioka added 1 commit 2023-03-09 18:33:55 +01:00
Kazashi Yoshioka added 1 commit 2023-03-10 03:19:42 +01:00
Kazashi Yoshioka added 1 commit 2023-03-10 03:35:08 +01:00
Kazashi Yoshioka requested review from Jeroen Bakker 2023-03-10 03:45:48 +01:00
Kazashi Yoshioka added 2 commits 2023-03-10 16:32:36 +01:00
Kazashi Yoshioka added 3 commits 2023-03-10 16:44:20 +01:00
Kazashi Yoshioka added 1 commit 2023-03-10 16:53:50 +01:00
Author
Contributor

I created a mode that defines all the vulkan functions our own, using the definition VK_NO_PROTOTYPES. Then check for errors.

As a premise, define VK_NO_PROTOTYPES before including vulkan.h.
Keep only what blender uses.
Here, LibraryFunc (LF) is a function that gets a library such as LoadlibraryA.
0.  Get LF1 per OS.
1.  Load vulkan dll(DLL1) from LF1.
2.  Retrieves the built-in functions and LF2 from DLL1(`vulkan_dynamic_load`)

3.  Load instance dll(DLL2) from LF2.
4.  Retrieves the instance extension functions which  must be enabled and LF3 from DLL2.(`vulkan_dynamic_load_instance`).

5.  Load device dll(DLL3) dll from LF3.
6.  Retrieves the device extension functions which  must be enabled from DLL3.(`vulkan_dynamic_load_device`).
### I created a mode that defines all the vulkan functions our own, using the definition VK_NO_PROTOTYPES. Then check for errors. ##### As a premise, define `VK_NO_PROTOTYPES` before including `vulkan.h`. ##### Keep only what blender uses. ##### Here, LibraryFunc (LF) is a function that gets a library such as LoadlibraryA. 0. Get LF1 per OS. 1. Load vulkan dll(DLL1) from LF1. 2. Retrieves the built-in functions and LF2 from DLL1(`vulkan_dynamic_load`) 3. Load instance dll(DLL2) from LF2. 4. Retrieves the instance extension functions which must be enabled and LF3 from DLL2.(`vulkan_dynamic_load_instance`). 5. Load device dll(DLL3) dll from LF3. 6. Retrieves the device extension functions which must be enabled from DLL3.(`vulkan_dynamic_load_device`).
Kazashi Yoshioka added 1 commit 2023-03-11 05:06:23 +01:00
Author
Contributor

above part


how to load

Inside the GPU module, all vulkan functions are compiled as nullptrs and loaded outside the main program.
An externally defined vulkan function wraps them. This technique is thanks to extern inline.

about speed

In the test, VkInstance was created and destroyed 1000 times. here!

With or without definition = VK_NO_PROTOTYPES(If you remove the definition, the built-in function will remain as it is.),there is almost no difference. In a few alternating runs, the fastest result was the one that dynamically loaded, but on average it seems to have a bit more overhead.


here is the definition
VkInstance 1000 creation intern callee 4.646438 extern callee 4.589588

as it is
VkInstance 1000 creation intern callee 4.708100 extern callee 5.095500

[above part](https://projects.blender.org/vnapdv/blender/src/commit/9703088d038648346c00912b306d182e4a8db31b/source/blender/gpu/vulkan/vk_context.hh#L177) ----- ## how to load Inside the GPU module, all vulkan functions are compiled as nullptrs and loaded outside the main program. An externally defined vulkan function wraps them. This technique is thanks to [extern inline.](https://projects.blender.org/vnapdv/blender/src/commit/9703088d038648346c00912b306d182e4a8db31b/source/blender/gpu/vulkan/vk_context.hh#L229) ## about speed In the test, VkInstance was created and destroyed 1000 times. [here!](https://projects.blender.org/vnapdv/blender/src/commit/9703088d038648346c00912b306d182e4a8db31b/source/blender/gpu/tests/gpu_shader_test.cc#L38) With or without definition [= VK_NO_PROTOTYPES](https://projects.blender.org/vnapdv/blender/src/commit/9703088d038648346c00912b306d182e4a8db31b/source/blender/gpu/vulkan/vk_common.hh#L11)(If you remove the definition, the built-in function will remain as it is.),there is almost no difference. In a few alternating runs, the fastest result was the one that dynamically loaded, but on average it seems to have a bit more overhead. --- ###### here is the definition ##### `VkInstance 1000 creation intern callee 4.646438 extern callee 4.589588` --- ###### as it is ##### `VkInstance 1000 creation intern callee 4.708100 extern callee 5.095500` ---
Author
Contributor

The print of debugUtils looks like this!

The print of debugUtils looks like this!
Jeroen Bakker requested changes 2023-03-14 09:37:47 +01:00
Jeroen Bakker left a comment
Member

This PR feels like it contains multiple changes in a single PR. It is better to split this into multiple PR's explain for each PR the reasoning. I am not a Vulkan expert and perhaps the changes are needed, but we should discuss them one by one, explain and document. Decisions should be clearly made and documented.

For example:

  • Why should we do dynamic loading/linking? If we do dynamic loading, why should it be part of the GPU module.
  • Should we use vulkan validation layers or wrap each function ourselves? Until now the validation layers are sufficient.
  • Debug names + Split of object creation in ghost.
This PR feels like it contains multiple changes in a single PR. It is better to split this into multiple PR's explain for each PR the reasoning. I am not a Vulkan expert and perhaps the changes are needed, but we should discuss them one by one, explain and document. Decisions should be clearly made and documented. For example: - Why should we do dynamic loading/linking? If we do dynamic loading, why should it be part of the GPU module. - Should we use vulkan validation layers or wrap each function ourselves? Until now the validation layers are sufficient. - Debug names + Split of object creation in ghost.
@ -103,0 +174,4 @@
extern VKWrapper vk_wrapper;
# endif
#endif
class VKFunctionsLoader {
Member

Unclear to me why we need to load all functions lazy. What are we trying to solve and if we want to solve something, we should describe that and perhaps create a different pull request for that.

Unclear to me why we need to load all functions lazy. What are we trying to solve and if we want to solve something, we should describe that and perhaps create a different pull request for that.
@ -0,0 +45,4 @@
}
}
/* clang-format off */
Member

When having VK_LAYER_KHRONOS_validation available on your system and starting blender with --debug-gpu you should already get the same results.

It might be that the validation layer misses some calls, but what I have found so far this makes it really easy to enable/disable validation at runtime without recompiling blender.

When having `VK_LAYER_KHRONOS_validation` available on your system and starting blender with `--debug-gpu` you should already get the same results. It might be that the validation layer misses some calls, but what I have found so far this makes it really easy to enable/disable validation at runtime without recompiling blender.
Author
Contributor

When I was testing the linux build, it seems that the warning due to undefined function of extern inline cannot be suppressed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66918
Should have investigated first.
Thank you for your kind comments.
Since Vulkan functions are global, they are easier to understand and run faster than static definitions, so that's what I did.
However, it seems that Linux does not allow immaterial use without warning.
Write programs that do one thing and do it well.

When I was testing the linux build, it seems that the warning due to undefined function of extern inline cannot be suppressed. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66918 Should have investigated first. Thank you for your kind comments. Since Vulkan functions are global, they are easier to understand and run faster than static definitions, so that's what I did. However, it seems that Linux does not allow immaterial use without warning. `Write programs that do one thing and do it well.`
Author
Contributor

However, I think it's debatable when it comes to initializing ContextVK.
As you suggested, I'm going to break it down into several PRs.

However, I think it's debatable when it comes to initializing ContextVK. As you suggested, I'm going to break it down into several PRs.
Kazashi Yoshioka reviewed 2023-03-16 02:30:18 +01:00
@ -0,0 +45,4 @@
}
}
/* clang-format off */
Author
Contributor

There is no direct relationship between VK_LAYER_KHRONOS_validation being enabled and error(VkResult) handling for vulkan functions.
I wrote it because I thought it would be easier to operate if the function can be called as is, without the execution speed overhead of wrapping the function.And within gl_debug.hh, to align with such an implementation. 9703088d03/source/blender/gpu/opengl/gl_debug.hh (L106)

Would you like to use something like Vk_CHECK to handle VkResult in the same way as ContextVk?

There is no direct relationship between VK_LAYER_KHRONOS_validation being enabled and error(VkResult) handling for vulkan functions. I wrote it because I thought it would be easier to operate if the function can be called as is, without the execution speed overhead of wrapping the function.And within gl_debug.hh, to align with such an implementation. https://projects.blender.org/vnapdv/blender/src/commit/9703088d038648346c00912b306d182e4a8db31b/source/blender/gpu/opengl/gl_debug.hh#L106 Would you like to use something like Vk_CHECK to handle VkResult in the same way as ContextVk?
Jeroen Bakker reviewed 2023-03-19 08:12:02 +01:00
@ -0,0 +45,4 @@
}
}
/* clang-format off */
Member

Although this can be done in a similar way as OpenGL. I expect that developers working on the vulkan backend most likely have the validation layer enabled.

For end users I can see the additional value as it can help with finding out what the specific platform works. In that case I would use a similar approach as OpenGL. Note that the downside is that during development you might receive the message twice, but that is expected.

We should extract this in a separate PR another implementation would be to provide a validation layer that will report this. This way the overhead/code will be isolated.

Although this can be done in a similar way as OpenGL. I expect that developers working on the vulkan backend most likely have the validation layer enabled. For end users I can see the additional value as it can help with finding out what the specific platform works. In that case I would use a similar approach as OpenGL. Note that the downside is that during development you might receive the message twice, but that is expected. We should extract this in a separate PR another implementation would be to provide a validation layer that will report this. This way the overhead/code will be isolated.
Kazashi Yoshioka reviewed 2023-03-20 10:30:28 +01:00
@ -0,0 +45,4 @@
}
}
/* clang-format off */
Author
Contributor

In order for developers to use the validation layer, we will use vulkanSDK, but first I would like to make a PR about how to load it. In particular we have to decide whether to do it Explicit or Implicit. Do you have an opinion on this point?

windows

linux

In order for developers to use the validation layer, we will use vulkanSDK, but first I would like to make a PR about how to load it. In particular we have to decide whether to do it Explicit or Implicit. Do you have an opinion on this point? [windows](https://vulkan.lunarg.com/doc/sdk/1.3.239.0/windows/layer_configuration.html) [linux](https://vulkan.lunarg.com/doc/sdk/1.3.239.0/linux/layer_configuration.html)
vnapdv marked this conversation as resolved
Kazashi Yoshioka reviewed 2023-03-20 10:56:57 +01:00
@ -0,0 +45,4 @@
}
}
/* clang-format off */
Author
Contributor

Do this explicitly

We mentioned setting the environment variable VK_LAYER_PATH.

Do this implicitily

Pre-installing the SDK in a path that normally requires permissions.


Is there a problem setting the environment variables? Obviously you don't want to change variables like LD_LIBRARY_PATH, but not setting VK_LAYER_PATH is an implicit way of asking for permissions directly.

> Do this explicitly We mentioned setting the environment variable VK_LAYER_PATH. > Do this implicitily Pre-installing the SDK in a path that normally requires permissions. ----- Is there a problem setting the environment variables? Obviously you don't want to change variables like LD_LIBRARY_PATH, but not setting VK_LAYER_PATH is an implicit way of asking for permissions directly.
vnapdv marked this conversation as resolved
Kazashi Yoshioka closed this pull request 2023-11-02 21:01:28 +01:00

Pull request closed

Sign in to join this conversation.
No reviewers
No Label
Interest
Alembic
Interest
Animation & Rigging
Interest
Asset Browser
Interest
Asset Browser Project Overview
Interest
Audio
Interest
Automated Testing
Interest
Blender Asset Bundle
Interest
BlendFile
Interest
Collada
Interest
Compatibility
Interest
Compositing
Interest
Core
Interest
Cycles
Interest
Dependency Graph
Interest
Development Management
Interest
EEVEE
Interest
EEVEE & Viewport
Interest
Freestyle
Interest
Geometry Nodes
Interest
Grease Pencil
Interest
ID Management
Interest
Images & Movies
Interest
Import Export
Interest
Line Art
Interest
Masking
Interest
Metal
Interest
Modeling
Interest
Modifiers
Interest
Motion Tracking
Interest
Nodes & Physics
Interest
OpenGL
Interest
Overlay
Interest
Overrides
Interest
Performance
Interest
Physics
Interest
Pipeline, Assets & IO
Interest
Platforms, Builds & Tests
Interest
Python API
Interest
Render & Cycles
Interest
Render Pipeline
Interest
Sculpt, Paint & Texture
Interest
Text Editor
Interest
Translations
Interest
Triaging
Interest
Undo
Interest
USD
Interest
User Interface
Interest
UV Editing
Interest
VFX & Video
Interest
Video Sequencer
Interest
Virtual Reality
Interest
Vulkan
Interest
Wayland
Interest
Workbench
Interest: X11
Legacy
Blender 2.8 Project
Legacy
Milestone 1: Basic, Local Asset Browser
Legacy
OpenGL Error
Meta
Good First Issue
Meta
Papercut
Meta
Retrospective
Meta
Security
Module
Animation & Rigging
Module
Core
Module
Development Management
Module
EEVEE & Viewport
Module
Grease Pencil
Module
Modeling
Module
Nodes & Physics
Module
Pipeline, Assets & IO
Module
Platforms, Builds & Tests
Module
Python API
Module
Render & Cycles
Module
Sculpt, Paint & Texture
Module
Triaging
Module
User Interface
Module
VFX & Video
Platform
FreeBSD
Platform
Linux
Platform
macOS
Platform
Windows
Priority
High
Priority
Low
Priority
Normal
Priority
Unbreak Now!
Status
Archived
Status
Confirmed
Status
Duplicate
Status
Needs Info from Developers
Status
Needs Information from User
Status
Needs Triage
Status
Resolved
Type
Bug
Type
Design
Type
Known Issue
Type
Patch
Type
Report
Type
To Do
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: blender/blender#105484
No description provided.