GPU: Add PROFILE_DEBUG_GROUPS #116304
Reference: blender/blender#116304
Add an option to profile GPU, CPU and Latency timings of GPU debug groups.
This is what the printed info looks like:
These are way more accurate than the timings provided by RenderDoc.
And, while all GPU vendors provide their own profilers, I think a built-in option like this can be convenient.
(I've personally not been able to get the Nvidia Nsight profiler to work correctly, despite the debugger working fine)
At the moment this is enabled at compile time by setting the PROFILE_DEBUG_GROUPS macro to 1, but I think a command line option would make more sense. (?)
(This only includes the OpenGL implementation.)
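As a rough illustration of the approach (not the patch's actual code), the bookkeeping for nested debug groups can be sketched with CPU clocks standing in for GL_TIMESTAMP queries; `Profiler`, `GroupTiming` and the member names below are made up for this sketch:

```cpp
#include <chrono>
#include <string>
#include <vector>

/* Illustrative analogue of the GL_TIMESTAMP approach: record absolute
 * begin/end times per group so nested groups can overlap freely (a single
 * GL_TIME_ELAPSED query per group could not nest). */
struct GroupTiming {
  std::string name;
  int depth = 0;
  std::chrono::steady_clock::time_point begin, end;
};

struct Profiler {
  std::vector<GroupTiming> groups; /* All groups, finished and in-flight. */
  std::vector<size_t> stack;       /* Indices of currently open groups. */

  void group_begin(const std::string &name)
  {
    GroupTiming g;
    g.name = name;
    g.depth = int(stack.size());
    g.begin = std::chrono::steady_clock::now();
    stack.push_back(groups.size());
    groups.push_back(g);
  }

  void group_end()
  {
    groups[stack.back()].end = std::chrono::steady_clock::now();
    stack.pop_back();
  }
};
```

With real GL queries, group_begin/group_end would each issue a glQueryCounter(handle, GL_TIMESTAMP) instead of reading the CPU clock, and the results would be read back later.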
@ -101,0 +107,4 @@
float cpu_time;
};
struct FrameQueries {
Vector<TimeQuery> queries;
Use blender::Stack<TimeQuery>. I believe this would simplify the implementation.
Keep in mind it wouldn't be possible to just pop TimeQueries from the stack on debug_group_end. They need to stay there until the query result is actually available.
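The reviewer's caveat is the asynchronous nature of GPU queries: a result only becomes readable some frames later, so ended groups must stay queued. A hedged sketch of that drain loop, with a plain bool standing in for the GL_QUERY_RESULT_AVAILABLE check (all names here are illustrative):

```cpp
#include <deque>
#include <string>

/* GPU query results arrive asynchronously, so a TimeQuery cannot simply be
 * popped on debug_group_end(); it must stay queued until the driver reports
 * the result available (glGetQueryObjectiv(.., GL_QUERY_RESULT_AVAILABLE, ..)).
 * The `available` flag stands in for that GL call in this sketch. */
struct TimeQuery {
  std::string name;
  bool ended = false;     /* debug_group_end() was called. */
  bool available = false; /* GPU result ready (set by the driver later). */
};

/* Drain finished queries from the front; stop at the first one whose result
 * is not ready yet, since results become available in submission order. */
int process_finished(std::deque<TimeQuery> &pending)
{
  int processed = 0;
  while (!pending.empty() && pending.front().ended && pending.front().available) {
    /* The real code would read both timestamps here and print/accumulate. */
    pending.pop_front();
    processed++;
  }
  return processed;
}
```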
@ -367,2 +368,4 @@
* \{ */
#define PROFILE_DEBUG_GROUPS 0
#define MAX_DEBUG_GROUPS_STACK_DEPTH 8
Why use a hardcoded max stack depth?
I did this initially for Workbench, so I added the max depth to remove the per-texture debug groups.
This doesn't work that well for EEVEE-Next, though, since per-material sub-passes have different depths depending on the pass.
I think it may make more sense to be able to set some kind of per-sub-pass debug granularity level, so those can be skipped?
Maybe we could tag sub-passes differently in the debug stack and these would be excluded from the stats tree. I think that would be a more elegant solution.
I think a compile time option is fine. Using a startup flag would be useful for performance measurements on devices that we don't have access to and for some cases it can quickly point to issues and platform differences.
Using a command line argument, e.g. --debug-gpu-timings, would not be useful, as you need to know what the user has been doing together with the specific frame timings, what is actually running in the background, etc.

@Jeroen-Bakker Aren't startup flags and command line arguments the same?
Yes, sorry for the confusion.
I agree with Jeroen. This should be available on release builds with either a debug option in the UI or a launch argument.
@ -100,1 +100,4 @@
struct TimeQuery {
std::string name;
GLuint handles[2];
Call it start and end. Took me a bit to understand.

@ -377,0 +391,4 @@
glGetInteger64v(GL_TIMESTAMP, &query.cpu_start);
/* Use GL_TIMESTAMP instead of GL_ELAPSED_TIME to support nested debug groups */
glGenQueries(2, query.handles);
I'm wondering if generating queries in bulk would be a better idea for performance.
But given it's at most a hundred of these, I think it is fine.
@ -389,0 +482,4 @@
<< "\n";
}
std::string print = result.str();
Can't you output everything to std::cout instead? Creating a std::string for that seems quite convoluted just to use printf.

I've tried to use stringstream.rdbuf() but it turned out to be massively slow, so while it's now using std::cout, I think we will have to keep the string conversion.

eba4e6d5e7 to 3fefa782e1
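The pattern discussed above (build the whole report in memory, then emit it once) can be sketched like this; format_report and its parameter shape are illustrative, not the patch's actual signature:

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

/* Build the whole report in a stringstream and convert it to one string,
 * which can then be written with a single output call instead of many
 * small interleaved stream writes. */
std::string format_report(const std::vector<std::pair<std::string, double>> &timings)
{
  std::stringstream result;
  for (const auto &entry : timings) {
    result << entry.first << ": " << entry.second << " ms\n";
  }
  return result.str();
}
```

Usage would be a single `std::cout << format_report(timings);` (or one printf of `.c_str()`), which is the string conversion the comment says has to stay.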
I've updated the PR with several changes:
The PROFILE_DEBUG_GROUPS define has been replaced with a new --profile-gpu startup flag.
The MAX_STACK_DEPTH compile time option has been replaced by a profile_gpu_level, which can be optionally specified with --profile-gpu <level>. Higher levels mean more detail. If no level is specified, the value is set to INT_MAX and every debug group is profiled.
The level of a debug group can be set in GPU_debug_group_begin, but I've set it up in a way that it's handled almost automatically. There are 4 levels, set from GPU_debug_group_begin and DRW_stats_group/query_start.
It's not perfect, but I'd say it's ok enough and better than having to handle them manually. I'm open to feedback, though.
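The level filtering described above reduces to a simple cutoff check when a group is opened. A minimal sketch under the stated behavior (INT_MAX default, higher level = more detail); the struct and function names are made up for illustration:

```cpp
#include <climits>

/* If --profile-gpu is passed without a level, everything is profiled
 * (per the comment above, the default is INT_MAX). A debug group whose
 * level exceeds the configured granularity is skipped. */
struct ProfileConfig {
  bool enabled = false;
  int level = INT_MAX; /* Higher level = more detail profiled. */
};

bool should_profile_group(const ProfileConfig &cfg, int group_level)
{
  return cfg.enabled && group_level <= cfg.level;
}
```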
I've put every change in a separate commit so it's easier to read.
Note that right now there's an issue with debug group begin/end mismatches in wm_draw_window_offscreen, so you'll see a lot of "Profile GPU error: Missing GPU_debug_group_end() call" messages. In reality the calls are not missing; they are done after swapping context in ED_region_do_draw, which triggers a context activation/deactivation.

Moving the process_frame_timings call to Context::end_frame fixes the issue, but then the draw manager context never runs the process_frame_timings function, so I'm not sure yet how to handle it.

I've moved process_frame_timings to end_frame and added calls to begin/end_frame in DRW_gpu_context_enable/disable_ex.
.It seems to be working fine, but I have no idea if there's any gotcha I should take into account.
Overall I am fine with this.
The Vulkan implementation is similar; it also uses query pools to track timings (vkCmdWriteTimestamp). The timestamp data type is platform-specific and might become a union at that point.
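One way to read the "might become a union" remark is a tagged union wrapping each backend's native query handle. This is purely speculative; the type and field names below are assumptions, not the actual plan:

```cpp
#include <cstdint>

/* Speculative sketch of a backend-agnostic timestamp handle: OpenGL uses
 * GLuint query object names from glGenQueries(), Vulkan an index into a
 * VkQueryPool written by vkCmdWriteTimestamp(). A tagged union would let
 * shared gpu/intern code pass either around uniformly. */
struct BackendTimestamp {
  enum class Backend { OPENGL, VULKAN } backend;
  union {
    uint32_t gl_query;     /* OpenGL query object name. */
    uint32_t vk_pool_slot; /* Slot in a VkQueryPool. */
  };
};
```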
@ -387,0 +473,4 @@
break;
}
std::stringstream result;
When we want to include this in other backends, this will require copying the reporting style as well. Eventually we should provide the reporting structure in gpu/intern and fill it from each backend. Yes, it will introduce another level, but it improves the chance that other tools can be written around it. There is still an idea to have a performance area in Blender.
@ -689,6 +689,7 @@ static void print_help(bArgs *ba, bool all)
PRINT("\n");
PRINT("GPU Options:\n");
BLI_args_print_arg_doc(ba, "--gpu-backend");
BLI_args_print_arg_doc(ba, "--profile-gpu");
--debug-gpu-profile-level would be more in line with the other options.

Is this a debug feature, though? "debug-gpu" kind of implies a debug GPU context, which this doesn't use (and shouldn't!).
For the level, I didn't include it in the name because the level is optional (blender --profile-gpu-level seems weird to me). Making it non-optional would make it more similar to other options, but IMO it is more useful this way.
f609be982d to 92fcbc4d9f
92fcbc4d9f to 77dc78e481
I've moved the formatting and printing to a separate class.
It would be nice to move more things to the gpu::Context itself (like the level checking and the begin/end mismatch handling), but I'm not sure how to do it in a clean way, especially without knowing the requirements of the other backends.

I did some more research on profiling and came to a different conclusion than the one I had last month. My take on this is that making a GPU profiler with meaningful values is very hard.
Without a good understanding of how a driver schedules the different tasks on the GPU, and of which tasks are already running, the numbers can be misleading. Depending on what is expected, I would like the task description to state what is actually being measured and how it is ensured to be correct.
For the Vulkan/Metal backends, where we have more control over the scheduling, it makes more sense to use a different timing method.
Metal has the option to serialize execution, so you know for sure what is being measured. For Vulkan we might need to introduce a different scheduling method to ensure that what is measured makes sense; which one will depend on the actual needs.
My current approach to getting timings is that the timers in RenderDoc (View -> Performance timers) give a good overview. If I want to dive deeper, I select one shader and go to Metal to get an overview of what is happening in the shader at a per-line level, or use RDP for in-depth analysis.
Due to the misunderstanding of what will actually be measured, I might not use this patch. One big benefit of this patch is to better track what is going on on a specific system that we don't have access to. I am just not sure if the cost of this patch outweighs the benefit, especially as GPU tasks are normally executed out of order; in-order execution already implies a performance loss.
Not in my experience. I often see timings that don't match the actual Blender runtime performance at all.
And it's not surprising, given that RenderDoc runs in a very different context (a single frame, in debug mode, without the application CPU overhead...).
The precision of performance queries may be far from perfect but, at least in my experience, they roughly match the real thing.
This patch has already been good enough to detect and fix several performance issues.
This also has the advantage of a really low iteration time for testing changes: you can just add the flag to your IDE, make a change, and hit "run" to see the results.
Sorry for taking so much time to review.
Given that the last point might require quite a bit more work, I would accept this patch without it.
@ -0,0 +18,4 @@
report << std::to_string(gpu_total_time).substr(0, 4) << " | ";
report << std::to_string(cpu_total_time).substr(0, 4) << " | \n";
}
void add_group(
Missing newline between functions.
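A side note on the snippet above: std::to_string(t).substr(0, 4) truncates rather than rounds, and for values of 1000 or more it can even cut off before the decimal point. A fixed-precision formatter (illustrative alternative, not part of the patch) behaves more predictably:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

/* Format a millisecond value with a fixed number of decimals, rounding
 * instead of chopping digits the way substr(0, 4) does. */
std::string format_ms(double ms)
{
  std::ostringstream out;
  out << std::fixed << std::setprecision(2) << ms;
  return out.str();
}
```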
#116304 (comment)
😑😑😑
I'll look into the flamegraph thing. Sounds like a good idea.