GPU: Add PROFILE_DEBUG_GROUPS #116304
Reference: blender/blender#116304
Add an option to profile GPU, CPU and Latency timings of GPU debug groups.
This is what the printed info looks like:
These are way more accurate than the timings provided by RenderDoc.
And, while all GPU vendors provide their own profilers, I think a built-in option like this can be convenient.
(I've personally not been able to get the Nvidia Nsight profiler to work correctly, despite the debugger working fine)
At the moment this is enabled at compile time by setting the PROFILE_DEBUG_GROUPS macro to 1, but I think a command line option would make more sense. (?)
(This only includes the OpenGL implementation.)
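As a rough illustration of the approach (not the patch's actual code), the bookkeeping for nested debug groups can be sketched with CPU clocks standing in for GL_TIMESTAMP queries; `Profiler`, `GroupTiming` and the member names below are made up for this sketch:

```cpp
#include <chrono>
#include <string>
#include <vector>

/* Illustrative analogue of the GL_TIMESTAMP approach: record absolute
 * begin/end times per group so nested groups can overlap freely (a single
 * GL_TIME_ELAPSED query per group could not nest). */
struct GroupTiming {
  std::string name;
  int depth = 0;
  std::chrono::steady_clock::time_point begin, end;
};

struct Profiler {
  std::vector<GroupTiming> groups; /* All groups, finished and in-flight. */
  std::vector<size_t> stack;       /* Indices of currently open groups. */

  void group_begin(const std::string &name)
  {
    GroupTiming g;
    g.name = name;
    g.depth = int(stack.size());
    g.begin = std::chrono::steady_clock::now();
    stack.push_back(groups.size());
    groups.push_back(g);
  }

  void group_end()
  {
    groups[stack.back()].end = std::chrono::steady_clock::now();
    stack.pop_back();
  }
};
```

With real GL queries, group_begin/group_end would each issue a glQueryCounter(handle, GL_TIMESTAMP) instead of reading the CPU clock, and the results would be read back later.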
@ -101,0 +107,4 @@
float cpu_time;
};
struct FrameQueries {
Vector<TimeQuery> queries;
Use blender::Stack<TimeQuery>. I believe this would simplify the implementation.
Keep in mind it wouldn't be possible to just pop TimeQueries from the stack on debug_group_end. They need to stay there until the query result is actually available.
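The reviewer's caveat is the asynchronous nature of GPU queries: a result only becomes readable some frames later, so ended groups must stay queued. A hedged sketch of that drain loop, with a plain bool standing in for the GL_QUERY_RESULT_AVAILABLE check (all names here are illustrative):

```cpp
#include <deque>
#include <string>

/* GPU query results arrive asynchronously, so a TimeQuery cannot simply be
 * popped on debug_group_end(); it must stay queued until the driver reports
 * the result available (glGetQueryObjectiv(.., GL_QUERY_RESULT_AVAILABLE, ..)).
 * The `available` flag stands in for that GL call in this sketch. */
struct TimeQuery {
  std::string name;
  bool ended = false;     /* debug_group_end() was called. */
  bool available = false; /* GPU result ready (set by the driver later). */
};

/* Drain finished queries from the front; stop at the first one whose result
 * is not ready yet, since results become available in submission order. */
int process_finished(std::deque<TimeQuery> &pending)
{
  int processed = 0;
  while (!pending.empty() && pending.front().ended && pending.front().available) {
    /* The real code would read both timestamps here and print/accumulate. */
    pending.pop_front();
    processed++;
  }
  return processed;
}
```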
@ -367,2 +368,4 @@
* \{ */
#define PROFILE_DEBUG_GROUPS 0
#define MAX_DEBUG_GROUPS_STACK_DEPTH 8
Why use a hardcoded max stack depth?
I did this initially for Workbench, so I added the max depth to remove the per-texture debug groups.
This doesn't work that well for EEVEE-Next, though, since per-material sub-passes have different depths depending on the pass.
I think it may make more sense to be able to set some kind of per-sub-pass debug granularity level, so those can be skipped?
Maybe we could tag sub-passes differently in the debug stack and these would be excluded from the stats tree. I think that would be a more elegant solution.
I think a compile time option is fine. Using a startup flag would be useful for performance measurements on devices that we don't have access to and for some cases it can quickly point to issues and platform differences.
Using a command line argument, e.g. --debug-gpu-timings, would not be useful, as you need to know what the user has been doing together with the specific frame timings, what is actually running in the background, etc.

@Jeroen-Bakker Aren't startup flags and command line arguments the same?
Yes, sorry for the confusion.
I agree with Jeroen. This should be available on release builds with either a debug option in the UI or a launch argument.
@ -100,1 +100,4 @@
struct TimeQuery {
std::string name;
GLuint handles[2];
Call it start and end. Took me a bit to understand.

@ -377,0 +391,4 @@
glGetInteger64v(GL_TIMESTAMP, &query.cpu_start);
/* Use GL_TIMESTAMP instead of GL_ELAPSED_TIME to support nested debug groups */
glGenQueries(2, query.handles);
I'm wondering if generating queries in bulk would be a better idea for performance.
But given it's at most a hundred of these, I think it is fine.
@ -389,0 +482,4 @@
<< "\n";
}
std::string print = result.str();
Can't you output everything to std::cout instead? Creating a std::string for that seems quite convoluted just to use printf.

I've tried to use stringstream.rdbuf() but it turned out to be massively slow, so while it's now using std::cout, I think we will have to keep the string conversion.

eba4e6d5e7 to 3fefa782e1
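The pattern discussed above (build the whole report in memory, then emit it once) can be sketched like this; format_report and its parameter shape are illustrative, not the patch's actual signature:

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

/* Build the whole report in a stringstream and convert it to one string,
 * which can then be written with a single output call instead of many
 * small interleaved stream writes. */
std::string format_report(const std::vector<std::pair<std::string, double>> &timings)
{
  std::stringstream result;
  for (const auto &entry : timings) {
    result << entry.first << ": " << entry.second << " ms\n";
  }
  return result.str();
}
```

Usage would be a single `std::cout << format_report(timings);` (or one printf of `.c_str()`), which is the string conversion the comment says has to stay.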
I've updated the PR with several changes:
The PROFILE_DEBUG_GROUPS define has been replaced with a new --profile-gpu startup flag.
The MAX_STACK_DEPTH compile time option has been replaced by a profile_gpu_level, which can be optionally specified with --profile-gpu <level>. Higher levels mean more detail. If no level is specified, the value is set to INT_MAX and every debug group is profiled.
The level of a debug group can be set in GPU_debug_group_begin, but I've set it up in a way that it's handled almost automatically. There are 4 levels, set from GPU_debug_group_begin and DRW_stats_group/query_start.
It's not perfect, but I'd say it's ok enough and better than having to handle them manually. I'm open to feedback, though.
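The level filtering described above reduces to a simple cutoff check when a group is opened. A minimal sketch under the stated behavior (INT_MAX default, higher level = more detail); the struct and function names are made up for illustration:

```cpp
#include <climits>

/* If --profile-gpu is passed without a level, everything is profiled
 * (per the comment above, the default is INT_MAX). A debug group whose
 * level exceeds the configured granularity is skipped. */
struct ProfileConfig {
  bool enabled = false;
  int level = INT_MAX; /* Higher level = more detail profiled. */
};

bool should_profile_group(const ProfileConfig &cfg, int group_level)
{
  return cfg.enabled && group_level <= cfg.level;
}
```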
I've put every change in a separate commit so it's easier to read.
Note that right now there's an issue with debug group begin/end mismatches in wm_draw_window_offscreen, so you'll see a lot of "Profile GPU error: Missing GPU_debug_group_end() call" messages. In reality the calls are not missing; they are done after swapping context in ED_region_do_draw, which triggers a context activation/deactivation.

Moving the process_frame_timings call to Context::end_frame fixes the issue, but then the draw manager context never runs the process_frame_timings function, so I'm not sure yet how to handle it.

I've moved process_frame_timings to end_frame and added calls to begin/end_frame in DRW_gpu_context_enable/disable_ex.
.It seems to be working fine, but I have no idea if there's any gotcha I should take into account.
Overall I am fine with this.
The Vulkan implementation is similar; it also uses query pools to track timings (vkCmdWriteTimestamp). The timestamp data type is platform-specific and might become a union at that point.
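One way to read the "might become a union" remark is a tagged union wrapping each backend's native query handle. This is purely speculative; the type and field names below are assumptions, not the actual plan:

```cpp
#include <cstdint>

/* Speculative sketch of a backend-agnostic timestamp handle: OpenGL uses
 * GLuint query object names from glGenQueries(), Vulkan an index into a
 * VkQueryPool written by vkCmdWriteTimestamp(). A tagged union would let
 * shared gpu/intern code pass either around uniformly. */
struct BackendTimestamp {
  enum class Backend { OPENGL, VULKAN } backend;
  union {
    uint32_t gl_query;     /* OpenGL query object name. */
    uint32_t vk_pool_slot; /* Slot in a VkQueryPool. */
  };
};
```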
@ -387,0 +473,4 @@
break;
}
std::stringstream result;
When we want to include this in other backends, this will require copying the reporting style as well. Eventually we should provide the reporting structure in gpu/intern and fill it from each backend. Yes, it will introduce another level, but it improves the chance that other tools can be written around it. There is still an idea to have a performance area in Blender.
@ -689,6 +689,7 @@ static void print_help(bArgs *ba, bool all)
PRINT("\n");
PRINT("GPU Options:\n");
BLI_args_print_arg_doc(ba, "--gpu-backend");
BLI_args_print_arg_doc(ba, "--profile-gpu");
--debug-gpu-profile-level would be more in line with the other options.

Is this a debug feature, though? "debug-gpu" kind of implies a debug GPU context, which this doesn't use (and shouldn't!).
For the level, I didn't include it in the name because the level is optional (blender --profile-gpu-level seems weird to me). Making it non-optional would make it more similar to other options, but IMO it is more useful this way.
f609be982d to 92fcbc4d9f
92fcbc4d9f to 77dc78e481
I've moved the formatting and printing to a separate class.
It would be nice to move more things to the gpu::Context itself (like the level checking and the begin/end mismatch handling), but I'm not sure how to do it in a clean way, especially without knowing the requirements of the other backends.

I did some more research on profiling and came to a different conclusion than the one I had last month. My take on this is that making a GPU profiler with meaningful values is very hard.
Without a good understanding of how a driver schedules the different tasks on the GPU, and of which tasks are already running, the numbers can be misleading. Depending on what is expected, I would like the task description to state what is actually being measured and how it is ensured to be correct.
For the Vulkan/Metal backends, where we have more control over the scheduling, it makes more sense to use a different timing method.
Metal has the option to serialize execution, so you know for sure what is being measured. For Vulkan we might need to introduce a different scheduling method to ensure that what is measured makes sense; which one will depend on the actual needs.
My current approach to getting timings is that the timers in RenderDoc (View -> Performance timers) give a good overview. If I want to dive deeper, I select one shader and go to Metal to get an overview of what is happening in the shader at a per-line level, or use RDP for in-depth analysis.
Due to the misunderstanding of what will actually be measured, I might not use this patch. One big benefit of this patch is to better track what is going on on a specific system that we don't have access to. I am just not sure if the cost of this patch outweighs the benefit, especially as GPU tasks are normally executed out of order; in-order execution already implies a performance loss.
Not in my experience. I often see timings that don't match the actual Blender runtime performance at all.
And it's not surprising, given that RenderDoc runs in a very different context (a single frame, in debug mode, without the application CPU overhead...).
The precision of performance queries may be far from perfect but, at least in my experience, they roughly match the real thing.
This patch has already been good enough to detect and fix several performance issues.
This also has the advantage of a really low iteration time for testing changes: you can just add the flag to your IDE, make a change, and hit "run" to see the results.
Sorry for taking so much time to review.
Given that the last point might require quite a bit more work, I would accept this patch without it.
@ -0,0 +18,4 @@
report << std::to_string(gpu_total_time).substr(0, 4) << " | ";
report << std::to_string(cpu_total_time).substr(0, 4) << " | \n";
}
void add_group(
Missing newline between functions.
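A side note on the snippet above: std::to_string(t).substr(0, 4) truncates rather than rounds, and for values of 1000 or more it can even cut off before the decimal point. A fixed-precision formatter (illustrative alternative, not part of the patch) behaves more predictably:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

/* Format a millisecond value with a fixed number of decimals, rounding
 * instead of chopping digits the way substr(0, 4) does. */
std::string format_ms(double ms)
{
  std::ostringstream out;
  out << std::fixed << std::setprecision(2) << ms;
  return out.str();
}
```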
#116304 (comment)
😑😑😑
I'll look into the flamegraph thing. Sounds like a good idea.