WIP: PROFILE_DEBUG_GROUPS #116304

Closed
Miguel Pozo wants to merge 15 commits from pragma37/blender:pull-gpu-profile into main

Member

There are currently no plans to merge this PR, but I'm leaving it open since it's useful to have this code available and "semi-up-to-date" for performance testing.

Add an option to profile GPU, CPU and Latency timings of GPU debug groups.
This is what the printed info looks like:

 Group                          | GPU  | CPU  | Latency
--------------------------------|------|------|--------
 Total                          | 9.15 | 3.62 |
. Manager.end_sync              | 0.16 | 0.12 | 1.29
. Workbench                     | 5.16 | 1.26 | 1.35
.. View.compute_visibility      | 0.04 | 0.04 | 3.91
.. DrawMultiBuf.bind            | 0.04 | 0.02 | 3.90
.. Opaque.Gbuffer               | 0.33 | 0.24 | 3.90
... MeshMaterial                | 0.31 | 0.04 | 3.87
... MeshTexture                 | 0.00 | 0.01 | 4.13
... CurvesMaterial              | 0.00 | 0.01 | 4.10
... CurvesTexture               | 0.00 | 0.01 | 4.08
... PointCloudMaterial          | 0.00 | 0.01 | 4.05
... PointCloudTexture           | 0.00 | 0.01 | 4.02
.. Opaque.Deferred              | 0.25 | 0.03 | 3.96
.. Workbench.Outline            | 0.23 | 0.02 | 4.17
.. TAA.Accumulation             | 0.47 | 0.02 | 4.36
.. SMAA.Resolve                 | 0.31 | 0.02 | 4.98
. Overlay                       | 0.87 | 0.97 | 5.24
.. psl->background_ps           | 0.20 | 0.03 | 5.22
.. psl->fade_ps[i]              | 0.00 | 0.01 | 5.38
.. psl->facing_ps[i]            | 0.00 | 0.01 | 5.35
.. psl->extra_blend_ps          | 0.00 | 0.02 | 5.32
.. psl->wireframe_ps            | 0.00 | 0.02 | 5.30
.. *p_armature_trans_ps         | 0.00 | 0.02 | 5.26
.. *p_armature_ps               | 0.00 | 0.04 | 5.23
.. psl->particle_ps             | 0.00 | 0.01 | 5.18
.. psl->metaball_ps[i]          | 0.00 | 0.01 | 5.14
.. *p_extra_ps                  | 0.00 | 0.03 | 5.12
.. psl->fade_ps[i]              | 0.00 | 0.01 | 5.08
.. psl->facing_ps[i]            | 0.00 | 0.01 | 5.05
.. psl->wireframe_xray_ps       | 0.00 | 0.01 | 5.02
.. *p_armature_trans_ps         | 0.00 | 0.01 | 4.99
.. *p_armature_ps               | 0.00 | 0.03 | 4.96
.. *p_extra_ps                  | 0.00 | 0.03 | 4.91
.. psl->metaball_ps[i]          | 0.00 | 0.03 | 4.85
.. psl->motion_paths_ps         | 0.00 | 0.02 | 4.79
.. psl->extra_grid_ps           | 0.00 | 0.01 | 4.75
.. psl->extra_centers_ps        | 0.00 | 0.03 | 4.71
.. psl->antialiasing_ps         | 0.54 | 0.03 | 4.66
. DebugDraw                     | 2.56 | 0.90 | 5.29
.. Lines                        | 2.52 | 0.68 | 5.26
... GPU                         | 0.02 | 0.04 | 7.17
... CPU                         | 0.02 | 0.02 | 7.12
.. Prints                       | 0.03 | 0.13 | 7.08
... GPU                         | 0.02 | 0.02 | 7.05
... CPU                         | 0.00 | 0.02 | 7.01
. RegionInfo                    | 0.00 | 0.03 | 6.85

These are way more accurate than the timings provided by RenderDoc.
And, while all GPU vendors provide their own profilers, I think a built-in option like this can be convenient.
(I've personally not been able to get the Nvidia Nsight profiler to work correctly, despite the debugger working fine)

At the moment, this is enabled at compile time by setting the `PROFILE_DEBUG_GROUPS` macro to 1,
but I think a command line option would make more sense. (?)

(This only includes the OpenGL implementation).

Miguel Pozo added the
Module
EEVEE & Viewport
label 2023-12-18 17:13:20 +01:00
Miguel Pozo requested review from Jeroen Bakker 2023-12-18 17:14:00 +01:00
Miguel Pozo requested review from Clément Foucault 2023-12-18 17:14:00 +01:00
Clément Foucault requested changes 2023-12-18 22:11:30 +01:00
Dismissed
@ -101,0 +107,4 @@
float cpu_time;
};
struct FrameQueries {
Vector<TimeQuery> queries;

Use `blender::Stack<TimeQuery>`. I believe this would simplify the implementation.
Author
Member

Keep in mind it wouldn't be possible to just pop `TimeQueries` from the stack on `debug_group_end`.
They need to stay there until the query is actually available.
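The constraint above can be sketched as follows. All names here (`FrameQueries`, `process_finished_frames`, the `gpu_available` flag) are illustrative, not the PR's actual code; in the real backend the availability check would be a `glGetQueryObjectiv(..., GL_QUERY_RESULT_AVAILABLE, ...)` call.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

/* A finished debug group cannot be popped immediately: its TimeQuery must
 * stay in the frame's storage until the driver reports the GPU result as
 * available (possibly several frames later). */
struct TimeQuery {
  std::string name;
  bool gpu_available; /* In GL: glGetQueryObjectiv(end, GL_QUERY_RESULT_AVAILABLE, ...). */
  int64_t gpu_time_ns;
};

struct FrameQueries {
  std::vector<TimeQuery> queries;
};

/* Pop and report only the oldest frames whose queries are all available;
 * newer frames keep waiting. Returns the number of frames processed. */
int process_finished_frames(std::deque<FrameQueries> &frames)
{
  int processed = 0;
  while (!frames.empty()) {
    bool ready = true;
    for (const TimeQuery &query : frames.front().queries) {
      if (!query.gpu_available) {
        ready = false;
        break;
      }
    }
    if (!ready) {
      break; /* The GPU hasn't caught up yet; retry on a later frame. */
    }
    /* Here the real code would read the results and print the report. */
    frames.pop_front();
    processed++;
  }
  return processed;
}
```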
fclem marked this conversation as resolved
@ -367,2 +368,4 @@
* \{ */
#define PROFILE_DEBUG_GROUPS 0
#define MAX_DEBUG_GROUPS_STACK_DEPTH 8

Why use a hardcoded max stack depth?

Author
Member

I did this initially for Workbench, so I added the max depth to remove the per-texture debug groups.
This doesn't work that well for EEVEE-Next, though, since per-material sub-passes have different depths depending on the pass.
I think it may make more sense to be able to set some kind of per-sub-pass debug granularity level, so those can be skipped?


Maybe we could tag sub-passes differently in the debug stack, so they would be excluded from the stats tree. I think that would be a more elegant solution.
mano-wii marked this conversation as resolved
Jeroen Bakker reviewed 2023-12-19 12:42:37 +01:00
Jeroen Bakker left a comment
Member

I think a compile time option is fine. Using a startup flag would be useful for performance measurements on devices that we don't have access to, and in some cases it can quickly point to issues and platform differences.

Using a command line argument, e.g. `--debug-gpu-timings`, would not be useful, as you need to know what the user has been doing, together with the specific frame timings, what is actually running in the background, etc.
Author
Member

@Jeroen-Bakker Aren't startup flags and command line arguments the same?

Member

Yes, sorry for the confusion.

Clément Foucault requested changes 2023-12-24 22:52:46 +01:00
Dismissed
Clément Foucault left a comment
Member

I agree with Jeroen. This should be available on release builds with either a debug option in the UI or a launch argument.

@ -100,1 +100,4 @@
struct TimeQuery {
std::string name;
GLuint handles[2];

Call it `start` and `end`. It took me a bit to understand.
pragma37 marked this conversation as resolved
@ -377,0 +391,4 @@
glGetInteger64v(GL_TIMESTAMP, &query.cpu_start);
/* Use GL_TIMESTAMP instead of GL_ELAPSED_TIME to support nested debug groups */
glGenQueries(2, query.handles);

I'm wondering if generating queries in bulk would be a better idea for performance.
But given it's at most a hundred of these, I think it is fine.
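For context on the `GL_TIMESTAMP` comment in the diff above: since each group stores its own absolute start/end timestamps (plus a CPU-side `GL_TIMESTAMP` at begin), nested groups don't interfere with each other, unlike `GL_TIME_ELAPSED` queries, which cannot overlap. A sketch of the resulting arithmetic, with hypothetical field and function names:

```cpp
#include <cassert>
#include <cstdint>

/* Absolute timestamps recorded for one debug group. All values share the GL
 * server clock: the CPU one via glGetInteger64v(GL_TIMESTAMP, ...), the GPU
 * ones via glQueryCounter(query, GL_TIMESTAMP). */
struct GroupTiming {
  int64_t cpu_start_ns; /* When the CPU issued the group. */
  int64_t gpu_start_ns; /* When the GPU entered the group. */
  int64_t gpu_end_ns;   /* When the GPU left the group. */
};

/* GPU time spent inside the group, in milliseconds. */
double gpu_time_ms(const GroupTiming &t)
{
  return double(t.gpu_end_ns - t.gpu_start_ns) / 1e6;
}

/* Latency: how far GPU execution lags behind CPU submission. */
double latency_ms(const GroupTiming &t)
{
  return double(t.gpu_start_ns - t.cpu_start_ns) / 1e6;
}
```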
@ -389,0 +482,4 @@
<< "\n";
}
std::string print = result.str();

Can't you output everything to `std::cout` instead? Creating a `std::string` for that seems quite convoluted just to use `printf`.
Author
Member

I've tried to use `stringstream.rdbuf()` but it turned out to be massively slow, so while now it's using `std::cout`, I think we will have to keep the string conversion.
Miguel Pozo force-pushed pull-gpu-profile from eba4e6d5e7 to 3fefa782e1 2024-01-16 17:41:11 +01:00 Compare
Author
Member

I've updated the PR with several changes:

  • The `PROFILE_DEBUG_GROUPS` define has been replaced with a new `--profile-gpu` startup flag.

  • The `MAX_STACK_DEPTH` compile time option has been replaced by a `profile_gpu_level` which can be optionally specified with `--profile-gpu <level>`. Higher levels mean more detail. If no level is specified, the value is set to `INT_MAX` and every debug group is profiled.

  • The level of a debug group can be set in `GPU_debug_group_begin`, but I've set it up in a way that it's handled almost automatically. There are 4 levels:

    • ROOT (0): Used by default by direct calls to `GPU_debug_group_begin` and `DRW_stats_group/query_start`.
    • PASS (1): Used by default by the Draw Manager passes.
    • SUBPASS (2): Used by default by the Draw Manager sub-passes.
    • RESOURCE_SUBPASS (3): Meant to be used by per-object/mesh/material specific sub-passes, and must be set manually.

    It's not perfect, but I'd say it's ok enough and better than having to handle them manually. I'm open to feedback, though.
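The level scheme above boils down to a simple threshold check. A minimal sketch (the enumerator names follow the list; `should_profile` is a hypothetical helper, not the PR's actual code):

```cpp
#include <cassert>
#include <climits>

/* A group is profiled only when its level does not exceed the threshold
 * passed via --profile-gpu <level>; with no explicit level the threshold
 * defaults to INT_MAX, so every debug group is profiled. */
enum DebugGroupLevel {
  ROOT = 0,             /* Direct GPU_debug_group_begin / DRW_stats calls. */
  PASS = 1,             /* Draw Manager passes. */
  SUBPASS = 2,          /* Draw Manager sub-passes. */
  RESOURCE_SUBPASS = 3, /* Per-object/mesh/material sub-passes (set manually). */
};

bool should_profile(int group_level, int profile_gpu_level)
{
  return group_level <= profile_gpu_level;
}
```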

I've put every change in a separate commit so it's easier to read.

Note that right now there's an issue with debug group begin/end mismatches in `wm_draw_window_offscreen`, so you'll see a lot of `Profile GPU error: Missing GPU_debug_group_end() call` messages. In reality the calls are not missing, but they are done after swapping contexts in `ED_region_do_draw`, which triggers a context activation/deactivation.
Moving the `process_frame_timings` call to `Context::end_frame` fixes the issue, but then the draw manager context never runs the `process_frame_timings` function, so I'm not sure how to handle it yet.

Miguel Pozo closed this pull request 2024-01-16 18:21:29 +01:00
Miguel Pozo reopened this pull request 2024-01-16 18:21:37 +01:00
Author
Member

I've moved `process_frame_timings` to `end_frame` and added calls to `begin/end_frame` in `DRW_gpu_context_enable/disable_ex`.
It seems to be working fine, but I have no idea if there's any gotcha I should take into account.
Jeroen Bakker reviewed 2024-01-18 10:06:05 +01:00
Jeroen Bakker left a comment
Member

Overall I am fine with this.

  • This is an OpenGL-only feature, so we should hide this option in Apple builds for now.
  • Reporting is currently a backend-specific implementation. I would like to introduce something in gpu/intern. For now it is fine to add a GPUProfileReport class in gpu/intern which only does the printing (add_profiling_row, print_report). add_row would already update the stringstream; report would only print the stringstream to the console.

The Vulkan implementation is similar; it also uses query pools to track timings (vkCmdWriteTimestamp). The timestamp data type is platform-specific, and might become a union at that point.
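To make the suggested split concrete, here is a hedged sketch of what such a printing-only helper could look like. The class idea follows the suggestion above, but the names and body are illustrative; the formatting mimics the `substr(0, 4)` column truncation visible in the diff and the dotted-indentation table from the PR description.

```cpp
#include <cassert>
#include <sstream>
#include <string>

/* Illustrative backend-agnostic report helper: backends feed it rows, it
 * only formats. The real class would also print totals and flush to the
 * console. */
class ProfileReport {
  std::stringstream report_;

  /* Truncate a millisecond value to 4 characters, e.g. "5.16". */
  static std::string format_time(double ms)
  {
    return std::to_string(ms).substr(0, 4);
  }

 public:
  void add_group(
      const std::string &name, int depth, double gpu_ms, double cpu_ms, double latency_ms)
  {
    /* One leading dot per nesting level, as in the table above. */
    std::string label = std::string(depth, '.') + " " + name;
    label.resize(31, ' '); /* Pad so the columns line up. */
    report_ << label << " | " << format_time(gpu_ms) << " | " << format_time(cpu_ms) << " | "
            << format_time(latency_ms) << "\n";
  }

  std::string str() const
  {
    return report_.str();
  }
};
```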
@ -387,0 +473,4 @@
break;
}
std::stringstream result;
Member

When we want to include this in other backends, this will require copying the reporting style as well.
Eventually we should provide the reporting structure in gpu/intern and fill it.
Yes, it will introduce another level, but it makes it easier for other tools to be written around it.

There is still an idea to have a performance area in Blender.
@ -689,6 +689,7 @@ static void print_help(bArgs *ba, bool all)
PRINT("\n");
PRINT("GPU Options:\n");
BLI_args_print_arg_doc(ba, "--gpu-backend");
BLI_args_print_arg_doc(ba, "--profile-gpu");
Member

`--debug-gpu-profile-level` would be more in line with the other options.
Author
Member

Is this a debug feature, though? "debug-gpu" kind of implies a debug GPU context, which this doesn't use (and shouldn't!).
For the level, I didn't include it in the name because the level is optional (`blender --profile-gpu-level` seems weird to me).
Making it non-optional would make it more similar to the other options, but IMO it is more useful this way.
Miguel Pozo force-pushed pull-gpu-profile from f609be982d to 92fcbc4d9f 2024-01-22 16:58:29 +01:00 Compare
Miguel Pozo force-pushed pull-gpu-profile from 92fcbc4d9f to 77dc78e481 2024-01-22 16:59:49 +01:00 Compare
Miguel Pozo added 1 commit 2024-01-22 17:07:22 +01:00
Author
Member

I've moved the formatting and printing to a separate class.
It would be nice to move more things to the `gpu::Context` itself (like the level checking and the begin/end mismatches), but I'm not sure how to do it in a clean way, especially not knowing the requirements of other backends.
Miguel Pozo added 1 commit 2024-02-08 20:50:23 +01:00
Miguel Pozo added 1 commit 2024-02-16 16:12:45 +01:00
Miguel Pozo requested review from Clément Foucault 2024-02-26 12:54:44 +01:00
Member

I did some more research on profiling and came to a different conclusion compared to last month. My take on this is that making a GPU profiler with meaningful values is very hard.
Without a good understanding of how a driver schedules the different tasks to the GPU, and of which tasks are already running, the numbers could provide less information than expected. Depending on what is expected, I would like to see in the task description what is actually being measured and how it is ensured to be correct.

For the Vulkan/Metal backends, where we have more control over the scheduling, it makes more sense to use a different timing method.
Metal has the option to use serialized execution, so you know for sure what is being measured. For Vulkan we might need to introduce a different scheduling method to ensure that what is measured makes sense. Which one will depend on the actual needs.

My current approach to getting timings is that the timings from RenderDoc (View -> Performance timers) give a good overview. If I want to dive deeper, I select one shader and go to Metal to get an overview of what is happening in the shader on a per-line level, or use RDP for in-depth analysis.

Due to this uncertainty about what will actually be measured, I might not use this patch. One big benefit of this patch is to better track what is going on on a specific system that we don't have access to. I am just not sure if the cost of this patch outweighs the benefit, especially as GPU tasks are normally executed out of order; in-order execution already implies a performance loss.
Author
Member

> My current approach on getting timings is that the Timings from renderdoc (view -> Performance timers) contains a good overview.

Not in my experience. I often see timings that don't match the actual Blender runtime performance at all.
And it's not surprising, given that RenderDoc runs in a very different context (a single frame, in debug mode, without the application CPU overhead...).

The precision of performance queries may be far from perfect but, at least in my experience, they roughly match the real thing.
This patch has already been good enough to [detect and fix several performance issues](https://projects.blender.org/blender/blender/issues/117246).

This also has the advantage of a really low iteration time for testing changes: you can just add the flag to your IDE, make a change, and hit "run" to see the results.
Clément Foucault requested changes 2024-03-18 18:19:28 +01:00
Dismissed
Clément Foucault left a comment
Member

Sorry for taking so much time to review.

  • I feel this is too invasive on the user-land code. I would remove the profile level and output the full graph.
  • I would suggest writing to a single file in the working directory. This would be much easier to search than the console output and surely more efficient to flush. It would also be easier for users to share on reports.
  • I would also suggest outputting something that can be fed into a flame-graph utility (I didn't do much research about this). This way you only have to record once and can easily search the frame with visual cues, and we don't have to care about adding much formatting. Although I am not sure which format is the simplest and which tool is the best and most widely available.

Given that the last point might require quite a bit more work, I would accept this patch without it.
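On the flame-graph point: one widely supported option (an assumption on my part, not something the patch does) is the Chrome trace-event JSON format, which chrome://tracing, Perfetto and speedscope can all load. Each debug group would emit a begin ("B") and an end ("E") event with microsecond timestamps; a minimal serializer could look like:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

/* Serialize one event in the Chrome trace-event format. phase is "B" (begin)
 * or "E" (end); ts_us is a timestamp in microseconds. A frame's events would
 * be joined with commas inside a top-level JSON array and written to a file. */
std::string trace_event(const char *phase, const std::string &name, int64_t ts_us)
{
  std::stringstream ss;
  ss << "{\"name\":\"" << name << "\",\"ph\":\"" << phase << "\",\"ts\":" << ts_us
     << ",\"pid\":1,\"tid\":1}";
  return ss.str();
}
```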
@ -0,0 +18,4 @@
report << std::to_string(gpu_total_time).substr(0, 4) << " | ";
report << std::to_string(cpu_total_time).substr(0, 4) << " | \n";
}
void add_group(

Missing newline between functions.
Author
Member

> I feel this is too invasive on the user-land code. I would remove the profile level and output the full graph.

https://projects.blender.org/blender/blender/pulls/116304#issuecomment-1090465

😑😑😑

I'll look into the flamegraph thing. Sounds like a good idea.
Miguel Pozo added 2 commits 2024-06-20 20:57:52 +02:00
Miguel Pozo removed review request for Jeroen Bakker 2024-07-15 16:44:00 +02:00
Miguel Pozo requested review from Clément Foucault 2024-07-15 16:44:20 +02:00
Miguel Pozo removed review request for Clément Foucault 2024-07-15 16:44:23 +02:00
Miguel Pozo changed title from GPU: Add PROFILE_DEBUG_GROUPS to WIP: PROFILE_DEBUG_GROUPS 2024-07-15 16:45:22 +02:00
Clément Foucault approved these changes 2024-07-15 16:48:11 +02:00
Clément Foucault left a comment
Member

Test

Miguel Pozo closed this pull request 2024-07-15 19:19:48 +02:00

Pull request closed
